Glossary
This glossary provides clear definitions of key terms and concepts fundamental to understanding and effectively using the Fyrefuse platform.
Administration
| Term | Description |
| --- | --- |
| Project | Virtual workspace enabling users from a data team to connect, manage and deliver enterprise data end to end by fostering automation and collaboration. Each Fyrefuse project is isolated, meaning it has its own team, data pipelines, metadata catalog, business glossary, data governance policies and dashboards. |
| Team | Unique set of users with various roles who belong to a specific Project. Each Team is linked to only one Project, and each Project has only one Team. |
| Permission | An authorization granted to a user with a specific role that enables them to access specific granular functionalities in Fyrefuse. The full list of permissions can be viewed here. |
| Role | A customisable set of specific user permissions that allows a user to participate in the data workflow. By default, Fyrefuse's standard user roles are Administrator, Data Manager, Data Owner and Team Leader. Additional, customised user roles can be created or edited through a permission management pane that lists over 70 atomised permissions. The Administrator role cannot be combined with other roles, while other users can hold multiple roles at once. |
Data
| Term | Description |
| --- | --- |
| Data Explorer | Fyrefuse Data Explorer provides a UI to explore and observe data contained in Repositories and Layers. |
| Query Builder | Wizard that helps users navigate Repository and Layer structures and prepare projection queries using only point-and-click. |
| Repository | A Fyrefuse abstraction in which an aggregation of data is kept and maintained in an organised way. In Fyrefuse, a Repository may be a DBMS, a documents folder (CSV, XML, JSON, etc.), streaming data (a Kafka queue, for example) or an app interfaced via API. A Repository needs to be configured with access parameters that allow Fyrefuse to connect and import the data. |
| Source type | A property that defines the nature of a Repository, such as streaming, SQL databases, NoSQL databases, or filesystems. |
| Technology | A specific tool or platform designed to interact with a particular source type, such as Kafka for streaming, PostgreSQL for SQL databases, MongoDB for NoSQL databases, or Hadoop for filesystems. |
| Entity | Lightweight persistence domain object that defines an element within a Repository. For example, in a relational database an entity represents a table, and each entity attribute corresponds to a column in that table. |
| Schema | Structured blueprint that defines the data structure of a Repository's entity. It specifies how data is stored, including attributes, data types and constraints. |
| Attribute | An attribute defines a piece of information about the entity that needs to be stored. If the entity is a table called Employee, its attributes could be the columns Name, Employee ID, Work location, etc. An entity can have zero or more attributes, and each attribute applies only to that entity. Typical attribute types include DATETIME, INTEGER, VARCHAR and NUMBER (see the sketch after this table). |
| Layers | In Fyrefuse, Layers refer to a method for organizing data in the Fyrefuse internal lakehouse within a flexible Medallion Architecture-like framework. They enable progressive improvement of data structure and quality as data transitions through the different levels of the architecture. |
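Entities, schemas and attributes map naturally onto Spark schemas. The following minimal sketch is not Fyrefuse code; it simply expresses the Employee entity from the Attribute definition above as a PySpark schema, with illustrative field names, types and nullability constraints:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, TimestampType
)

spark = SparkSession.builder.appName("schema-sketch").getOrCreate()

# Illustrative schema for the "Employee" entity mentioned above:
# each StructField is one attribute, with its data type and a constraint.
employee_schema = StructType([
    StructField("employee_id", IntegerType(), nullable=False),
    StructField("name", StringType(), nullable=False),
    StructField("work_location", StringType(), nullable=True),
    StructField("hired_at", TimestampType(), nullable=True),
])

# An empty DataFrame built from the schema stands in for the entity's structure.
employees = spark.createDataFrame([], schema=employee_schema)
employees.printSchema()
```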
Pipeline
| Term | Description |
| --- | --- |
| Pipeline | Structured sequence of processes that moves data through various stages, from ingestion to transformation and storage. It ensures efficient data flow, quality improvement, and readiness for analysis, supporting the architecture's progressive refinement (see the sketch after this table). |
| Pipeline Template | Configured pipeline created in the Pipeline Designer and saved to be (re)used or scheduled to run on a specific day or on a regular basis. |
| Pipeline Designer | User-friendly wizard to create and configure pipeline templates. It defines data sources and a target and may include a series of data transformation steps to be run on data before ingesting it into the target. |
| Instance | Pipeline execution record documented in detail in order to keep track of data operations. Every instance has an ID and a status: Running, Succeeded, Stopped, Failed, or Scheduled. |
| Jobs | Data preparation and transformation steps (UDFs or Custom steps) executed before delivering data into the target. Fyrefuse supports full-code custom jobs pulled from Job Stores. |
| Data Source (Batch / Stream) | Data Source Batch enables users to ingest and process data in batch mode. Data Source Stream provides the possibility to work with data streams that require immediate processing and analysis. |
| Data Target (Batch / Stream) | Data Target Batch allows users to segment data into distinct groups or batches based on predefined criteria, e.g. attributes. Data Target Stream focuses on specific data streams within Fyrefuse, allowing users to pinpoint and act upon real-time data as it flows in. |
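Fyrefuse assembles pipelines visually in the Pipeline Designer rather than through hand-written code, but the source → transform → target flow it orchestrates can be sketched in plain PySpark. All paths and column names below are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Ingestion: a batch data source (hypothetical CSV folder).
orders = (spark.read
          .option("header", True)
          .csv("/data/raw/orders"))

# Transformation: a simple cleaning step standing in for one or more jobs.
cleaned = (orders
           .filter(col("order_id").isNotNull())
           .withColumn("amount", col("amount").cast("double")))

# Delivery: a batch data target (hypothetical Parquet location).
cleaned.write.mode("overwrite").parquet("/data/curated/orders")
```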
Low-code Jobs
| Term | Description |
| --- | --- |
| SQL | Enables users to query data using Spark SQL and DataFrame APIs. |
| Script | Enables users to run any Python code during pipeline execution. |
| Variable | Creates a user-defined variable that is stored in Fyrefuse's variables map. These variables act as placeholders and can be referenced within SQL blocks, simplifying data processing and analysis (see the sketch after this table). |
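The exact syntax Fyrefuse uses to reference variables inside SQL blocks is not shown in this glossary; the sketch below uses plain Python string substitution to illustrate the placeholder idea behind the Variable job. Table and variable names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lowcode-sketch").getOrCreate()

df = spark.createDataFrame(
    [("EU", 120.0), ("US", 80.0), ("EU", 45.0)],
    ["region", "amount"],
)
df.createOrReplaceTempView("sales")

# A user-defined variable acting as a placeholder, like a Variable job would.
variables = {"target_region": "EU"}

# An SQL job querying the data with Spark SQL, referencing the variable.
result = spark.sql(
    "SELECT region, SUM(amount) AS total "
    "FROM sales WHERE region = '{target_region}' "
    "GROUP BY region".format(**variables)
)
result.show()
```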
No-code Jobs
| Term | Description |
| --- | --- |
| Validation | Applies predefined rules or conditions to data and filters out records that do not meet the specified criteria, ensuring the dataset's accuracy, consistency, and compliance with quality standards. |
| Anonymization | Applies anonymization techniques, such as masking, tokenization, or encryption, to protect sensitive information by removing or obscuring identifiable details while preserving the data's utility for analysis (see the sketch after this table). |
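Both no-code jobs are configured in the UI, but their effect can be approximated in PySpark. The sketch below assumes a hypothetical customers DataFrame; it filters out records that break the rules (validation) and hashes an identifying column (one of several anonymization techniques):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.appName("nocode-sketch").getOrCreate()

customers = spark.createDataFrame(
    [(1, "ada@example.com", 34), (2, None, 29), (3, "bob@example.com", -5)],
    ["id", "email", "age"],
)

# Validation: keep only records that satisfy the predefined rules.
valid = customers.filter(col("email").isNotNull() & (col("age") >= 0))

# Anonymization: replace the email with a SHA-256 hash, obscuring the
# identifiable value while keeping it usable as a join or grouping key.
anonymized = valid.withColumn("email", sha2(col("email"), 256))
anonymized.show(truncate=False)
```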
External Jobs
| Term | Description |
| --- | --- |
| UDF | Module for registering a Python User Defined Function (UDF) that allows users to apply non-native operations to DataFrame columns. It is used to transform, manipulate, or enrich data based on specific business logic or requirements not covered by standard operations (see the sketch after this table). |
| Custom step | Module for executing custom PySpark code within the full DataFrame domain. The Custom step's scope includes all of the pipeline's existing DataFrames and variables. |
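How a UDF module is registered inside Fyrefuse is not described in this glossary; the sketch below simply demonstrates a standard PySpark UDF applied to a DataFrame column, which is the mechanism the UDF job builds on. The name-normalization logic is a hypothetical stand-in for custom business logic:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

df = spark.createDataFrame([(" alice ",), ("BOB",)], ["name"])

# A plain Python function wrapped as a UDF: a stand-in for business
# logic that would be applied to a DataFrame column.
@udf(returnType=StringType())
def normalize_name(raw):
    return raw.strip().title() if raw is not None else None

# The UDF is applied column-wise, like any built-in function.
df.withColumn("name", normalize_name("name")).show()
```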
Settings
| Term | Description |
| --- | --- |
| Job Store | An integrated repository on a code-versioning platform (e.g. GitLab, Bitbucket) from which custom full-code jobs in multiple programming languages can be imported into Fyrefuse for use in pipelines. |
| Engines | Configuration panel for the connection and access parameters of the Exploration and Processing engines, allowing Fyrefuse services to operate. |