What is Fyrefuse

Fyrefuse is a platform for building, orchestrating, running and observing data-intensive pipelines. Its powerful no-code/low-code web UI lets Data Engineers use Apache Spark interactively to process real-time data streams, operationalize AI and serve data-driven applications, using just SQL or Python. It is container-native and runs on all major cloud providers as well as on-premises installations. You can operate Fyrefuse in a self-managed deployment or with Kubernetes.

Fyrefuse can be understood through a few basic concepts. Familiarizing yourself with them will help you learn to use Fyrefuse (and its beautiful Web UI) and start having fun building your data pipelines with it.

Environment

The architecture is based on Apache Spark, Trino and the Fyrefuse Python back-end; these three components provide the basic dependencies used by the data pipelines. Every data pipeline is built against these dependencies, so they must be installed on the host system for a pipeline to be designed and executed correctly.

In summary, Apache Spark is used as the general-purpose data processing engine. Trino, with its federated SQL query engine, supports the Data Engineer while designing and debugging a pipeline. Fyrefuse’s Python back-end provides the programming logic to use and integrate these technologies.

The Environment relies heavily on containers to install the required technologies automatically from a YAML file, while their settings can be applied directly from the Web UI. You can deploy the Environment in self-managed mode (i.e. on virtual machines or bare metal) or delegate its automation to Kubernetes.

Projects and Teams

Fyrefuse lets you create isolated workspaces, each called a “Project”. A Project contains the resources assigned to its members, who together form its “Team”. A Team is a subset of the authorized Fyrefuse users, and its members can only access the contents of their assigned Project.

Access to data and to pipelines has to be explicitly granted at user level. Resources outside a user’s Project, such as data or pipelines in another Project, are deliberately inaccessible unless the user is also assigned to that other Project. A user may therefore be part of more than one Team.
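The access rules described above can be sketched in a few lines of Python. This is only an illustration of the model, not Fyrefuse code: the Project class, its fields and the can_access function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a Fyrefuse Project (an isolated workspace)."""
    name: str
    team: set[str] = field(default_factory=set)        # users assigned to the Project
    resources: set[str] = field(default_factory=set)   # data and pipelines it contains
    grants: dict[str, set[str]] = field(default_factory=dict)  # user -> granted resources


def can_access(user: str, resource: str, projects: list[Project]) -> bool:
    """Allow access only if the user is on the Team of a Project that holds
    the resource AND the resource was explicitly granted to that user."""
    for project in projects:
        if resource in project.resources and user in project.team:
            if resource in project.grants.get(user, set()):
                return True
    return False


# "ana" belongs to two Teams; "bo" belongs to one and has no explicit grant.
sales = Project("sales", team={"ana", "bo"}, resources={"orders_pipeline"},
                grants={"ana": {"orders_pipeline"}})
risk = Project("risk", team={"ana"}, resources={"fraud_pipeline"},
               grants={"ana": {"fraud_pipeline"}})
projects = [sales, risk]

print(can_access("ana", "orders_pipeline", projects))  # True: team member with a grant
print(can_access("bo", "orders_pipeline", projects))   # False: team member, no grant
print(can_access("bo", "fraud_pipeline", projects))    # False: not on the risk Team
```

The two checks are kept separate on purpose: Team membership scopes what a user could see, and the per-user grant decides what the user actually sees.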

For enterprise-level deployments, Fyrefuse provides integrations with external Identity Providers.

Data, Repositories and Layers

Fyrefuse uses “Data” as the umbrella term for the locations that pipelines can use as a source or as a destination.

Repositories

Fyrefuse can read data from and write data to external locations, which it calls “Repositories” (e.g. a database, a file folder, a message broker or a SaaS application exposing a REST API).

Layers

Fyrefuse also provides a fully managed data location called a “Layer”. You can create as many Layers as you need. A Layer represents a configurable partition on standard S3 object storage, where data can be accessed in any of the most popular Open Table Formats (Apache Iceberg, Delta Lake and, soon, Apache Hudi).

Data Exploration

Trino provides the runtime engine for a component called “Data Explorer”, a lightning-fast interface for exploring the data available at the source while designing a pipeline, or for verifying the correct execution of a pipeline at the destination, regardless of whether the Data is stored in a Repository or in a Layer.

Pipelines, Jobs and their Execution

Pipelines

In Fyrefuse, Pipelines provide a structured way to combine multiple transformation tasks (e.g. validation, SQL-like clauses, data masking) and modeling tasks (e.g. ML, Deep Learning or GenAI), ensuring that data collected from one or more sources is processed consistently. The platform provides a user-friendly drag-and-drop web UI that lets you draw the entire workflow of a pipeline visually. You can then easily run the pipeline on Apache Spark as a Spark application, optimising its execution and assuring its performance.
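To make the idea of combining transformation tasks concrete, here is a minimal plain-Python sketch of three tasks applied in a fixed order. It only illustrates the concept: in Fyrefuse the workflow is drawn in the web UI and executed on Apache Spark, and the task names, record fields and run_pipeline function below are assumptions made for this example.

```python
from typing import Callable

Record = dict
Task = Callable[[list[Record]], list[Record]]


def validate(records: list[Record]) -> list[Record]:
    """Validation task: drop records that have no customer id."""
    return [r for r in records if r.get("customer_id") is not None]


def mask_email(records: list[Record]) -> list[Record]:
    """Data-masking task: hide the local part of each email address."""
    masked = []
    for r in records:
        _, _, domain = r["email"].partition("@")
        masked.append({**r, "email": "***@" + domain})
    return masked


def only_large_orders(records: list[Record]) -> list[Record]:
    """SQL-like clause, equivalent to: WHERE amount >= 100."""
    return [r for r in records if r["amount"] >= 100]


def run_pipeline(records: list[Record], tasks: list[Task]) -> list[Record]:
    """Apply every task in order, so all records go through the same steps."""
    for task in tasks:
        records = task(records)
    return records


source = [
    {"customer_id": 1, "email": "ana@example.com", "amount": 250},
    {"customer_id": None, "email": "x@example.com", "amount": 900},
    {"customer_id": 2, "email": "bo@example.com", "amount": 40},
]

result = run_pipeline(source, [validate, mask_email, only_large_orders])
print(result)
# [{'customer_id': 1, 'email': '***@example.com', 'amount': 250}]
```

Because each task takes records in and hands records out, tasks can be reordered or swapped without touching the others, which is the property a visual workflow editor relies on.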

Jobs

In Fyrefuse you can combine three categories of Jobs. First, No Code / Low Code Jobs are backed by a simple web UI that helps you provide your own configurations. Second, you can script your own code directly in the Fyrefuse web IDE, in either Python/PySpark or SQL, to adapt a pipeline to your needs. Finally, you can inject external code and use it in a pipeline.

Injecting external code into a pipeline is particularly useful when you require code or dependencies that aren’t native to Fyrefuse, which gives you maximum flexibility when developing a pipeline. Fyrefuse supports the most popular Git services, such as GitLab, enabling users to store, manage, and seamlessly access their custom functions directly within the platform. Depending on the nature of the external code, its execution can be delegated to an external environment outside Fyrefuse, as long as that environment exposes APIs for retrieving the results. For instance, an open-source GenAI model like Llama can run as a standalone process within the pipeline workspace or be deployed in an optimized external environment, even in the cloud. Fyrefuse can then invoke it as needed, ensuring seamless integration and execution.
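As a rough illustration of delegating execution to an external environment that exposes an API, the sketch below sends a prompt to a model endpoint and reads back the result. Everything specific here is an assumption made for the example: the URL, the {"prompt": ...} request and {"text": ...} response shapes, and the function names are hypothetical and do not describe a Fyrefuse or Llama API. A fake transport is used so that the sketch runs offline.

```python
import json
from typing import Callable
from urllib import request

# A transport takes (url, request body) and returns the response body.
Transport = Callable[[str, bytes], bytes]


def http_transport(url: str, body: bytes) -> bytes:
    """Real transport: POST the JSON body using only the standard library."""
    req = request.Request(url, data=body, method="POST",
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=30) as resp:
        return resp.read()


def generate(prompt: str, url: str, transport: Transport = http_transport) -> str:
    """Send a prompt to an external model endpoint and return its text.

    The request and response shapes are assumptions for this sketch.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    reply = json.loads(transport(url, body).decode("utf-8"))
    return reply["text"]


def fake_transport(url: str, body: bytes) -> bytes:
    """Offline stand-in for the external service: echoes the prompt back."""
    prompt = json.loads(body)["prompt"]
    return json.dumps({"text": "echo: " + prompt}).encode("utf-8")


answer = generate("Summarise this record", "http://model.example/generate",
                  transport=fake_transport)
print(answer)  # echo: Summarise this record
```

Passing the transport in as a parameter keeps the calling code identical whether the model runs next to the pipeline or in a remote environment; only the URL and the transport change.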

Executions

The pipeline executions in Fyrefuse are called “Instances”. Pipelines are executed on Apache Spark, and Fyrefuse lets you schedule them on a time basis as well as on an event basis. An Instance is essentially a deployment of a Spark application made up of the following objects: a JSON file representing the pipeline workflow, the Fyrefuse library of Tasks and any external Task. All of these are pushed down to Apache Spark during the execution of an Instance.
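To give a feel for what a JSON file representing a pipeline workflow might contain, the sketch below parses a small workflow and derives an order in which its tasks could run. The JSON schema shown here (the "pipeline", "tasks", "id" and "depends_on" keys) is invented for this example and is not the actual Fyrefuse format; the ordering uses Python's standard graphlib module.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow document: each task lists the tasks it depends on.
workflow_json = """
{
  "pipeline": "orders_daily",
  "tasks": [
    {"id": "read_orders", "depends_on": []},
    {"id": "validate",    "depends_on": ["read_orders"]},
    {"id": "mask_emails", "depends_on": ["validate"]},
    {"id": "write_layer", "depends_on": ["mask_emails"]}
  ]
}
"""


def execution_order(workflow: dict) -> list[str]:
    """Return task ids ordered so that every task follows its dependencies."""
    graph = {task["id"]: set(task["depends_on"]) for task in workflow["tasks"]}
    return list(TopologicalSorter(graph).static_order())


workflow = json.loads(workflow_json)
order = execution_order(workflow)
print(order)
# ['read_orders', 'validate', 'mask_emails', 'write_layer']
```

If the workflow contained a cycle, static_order() would raise graphlib.CycleError, which is a convenient way to reject an invalid workflow before anything is submitted for execution.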

Instances can be monitored in real time. Fyrefuse presents a log of each execution as well as a dashboard showing the most important KPIs.

Next Steps

Now that you understand the basic concepts, you can move on to installing the software on your own infrastructure or with any cloud provider.

If Fyrefuse is already available to you, you may want to learn how to get started with your first pipeline, or tour the user guide for more in-depth knowledge of how Fyrefuse works.