What is Fyrefuse?

Fyrefuse is a data processing platform designed for data teams. Whether a project runs in the cloud, on premises or in a hybrid setup, and whether it requires real-time data delivery for analytics, operationalizing AI, or self-service data provision for building applications at scale, Fyrefuse gives data teams the agility to deliver quickly and to start generating data-driven results in minutes, not weeks.

Data engineers and data scientists can design reusable data processing pipelines with native code-free jobs, or import their own code written in multiple programming languages, to process data in batches, micro-batches or streams. Pipelines run on Apache Spark on a managed Kubernetes cluster, which keeps execution fast.

Glossary

Administration

Project is a virtual work space enabling users from a data team to connect, manage and deliver enterprise data end to end by fostering automation and collaboration. Each Fyrefuse project has its own team, data pipelines, metadata catalog, business glossary, data governance policies and dashboards.

Team is a unique set of users that have various roles and belong to a specific Project. Each team is linked to only one project, and each Project has only one Team.

Permission is an authorization given to a user with a specific role that enables them to access specific granular functionalities in Fyrefuse. The full list of permissions can be viewed here.

Role in Fyrefuse is defined by a customisable set of specific user permissions allowing a user to participate in the data workflow. By default, Fyrefuse standard user roles are Administrator, Data Manager, Data Owner and Team Leader. Additional and customised user roles can be created or edited through a permission management pane that lists over 70 atomised permissions. The Administrator role is not compatible with other roles, while other users can possess multiple roles at once.
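
The rule that the Administrator role excludes every other role, while other roles can be combined, can be sketched as follows. This is only an illustrative Python model: the `User` class and its `assign_role` method are hypothetical names chosen for this example, not part of any Fyrefuse API.

```python
from dataclasses import dataclass, field
from typing import Set

# Role name taken from the glossary; everything else here is a hypothetical model.
ADMINISTRATOR = "Administrator"


@dataclass
class User:
    username: str
    roles: Set[str] = field(default_factory=set)

    def assign_role(self, role: str) -> None:
        """Add a role unless it would combine Administrator with another role."""
        combined = self.roles | {role}
        if ADMINISTRATOR in combined and len(combined) > 1:
            raise ValueError("Administrator cannot be combined with other roles")
        self.roles = combined


user = User("maria")
user.assign_role("Data Manager")
user.assign_role("Data Owner")  # allowed: non-administrator roles can be combined
```

Calling `user.assign_role("Administrator")` at this point raises `ValueError`, because the user already holds other roles.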

Data Owner is a user who shares access to data cores, or parts of them, with Fyrefuse, thereby populating the Data Explorer with metadata. Example: a Database Administrator.

Data Manager is a user who organises metadata and enriches it with business meaning by creating a common Business Glossary, defines which data governance policies apply to the data, and builds data pipelines to ingest data into a target system. Examples: a Data Engineer, an Architect, a Data Steward.

Data Consumer is a user who navigates the metadata catalog and adds datasets of interest to a Data Request, which is then forwarded to the Data Manager for approval.

Team Leader is a user who manages team members by adding an unlimited number of users to the team or removing users from it.

Administrator is a user responsible for creating Projects, assigning a Team Leader to a Project, creating Roles and managing user permissions in existing roles.

Enabling a user authorises an existing user to log into the Fyrefuse Portal using their valid credentials (username and password).

Disabling a user prevents an existing user from logging into the Fyrefuse Portal with their credentials.

Data Cores

Data core is a logical grouping of repositories - which may be scattered across the infrastructure layer - relating to a specific enterprise data domain, for example Customer Relationship Management (CRM), Project Management Tools, Marketing Automation, Enterprise Resource Planning (ERP) etc.

Repository is an object in which an aggregation of data is kept and maintained in an organised way. In Fyrefuse, a repository may be a DBMS, a Documents Folder (CSV, XML, JSON etc.), Streaming Data (a Kafka queue, for example) or an App interfaced via API. A repository needs to be configured using access parameters that allow Fyrefuse to connect and to import the metadata into the Data Catalog.

Data Connection in Fyrefuse consists of three alternative ways to import the metadata from a configured repository in order to populate the Data Explorer:

  1. Autodiscovery
  2. Importing a template:
    a) Importing a database schema
    b) Importing the Fyrefuse standard template

Autodiscovery is a functionality of Fyrefuse that reads the schema of a connected repository and selects all the available entities to be visualised in the Data Explorer.

Importing a database schema allows users to populate the Data Explorer with entities and their attributes by uploading a standard DB schema of the repository.

Importing Fyrefuse standard template allows users to populate the Data Explorer with entities and their attributes by manually filling in and uploading a standard .xlsx template downloadable from the Portal. The Data Owner decides whether to make the whole repository visible in the Data Explorer or to share only selected tables.

Technology is a parameter indicating a specific technology of the repository such as PostgreSQL, REST API, Kafka, SFTP, MySQL etc.

Model is a parameter indicating a broader type of repository technology such as SQL, NoSQL, JSON etc.
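
The relationship between a data core, its repositories, and the Technology and Model parameters can be pictured with a small Python sketch. The class names, field names, and repository names below are assumptions made for illustration and do not reflect Fyrefuse's internal data structures. The grouping at the end mirrors the kind of breakdown by technology and model that a dashboard might display.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Repository:
    name: str        # hypothetical repository name
    technology: str  # specific technology, e.g. "PostgreSQL"
    model: str       # broader type of technology, e.g. "SQL"


@dataclass
class DataCore:
    name: str  # enterprise data domain, e.g. "CRM"
    repositories: List[Repository] = field(default_factory=list)


crm = DataCore(
    "CRM",
    [
        Repository("crm_accounts", "PostgreSQL", "SQL"),
        Repository("crm_legacy", "MySQL", "SQL"),
    ],
)

# Count repositories per technology and per model.
by_technology = Counter(repo.technology for repo in crm.repositories)
by_model = Counter(repo.model for repo in crm.repositories)
```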

Data Explorer

Data Explorer is a section of Fyrefuse that showcases available data from all sources by organising metadata in two alternative catalogs: the Data Catalog, aimed at technical users, and the Business Glossary, aimed at business users. Data Explorer is strictly project-related, so different projects with different teams may have different explorers.

Data Explorer Dashboard is a visual summary of all entities in Data Explorer by data cores, technology, model, and accessibility levels.

Data Catalog

Data Catalog is a section of Fyrefuse providing positional exploration of the metadata contained in the data sources. Data Catalog organises metadata in a hierarchical structure according to the source of data provenance: the position of each attribute is defined in a technical hierarchy of attribute / entity / repository / data core. Data Catalog is populated automatically once data cores are created, and repositories are connected and configured. It is a tool designed for Data Consumers with a technical understanding of data schemas.

Entity is a lightweight persistence domain object. For example, in a relational database an entity represents a table, and each entity attribute corresponds to a column in that table.

Attribute defines the information about the entity that needs to be stored. If the entity is a table called Employee, attributes could be the columns Name, Employee ID, Work location etc. An entity can have zero or more attributes, and each of those attributes applies only to that entity. Some Attribute Types are DATETIME, INTEGER, VARCHAR, NUMBER etc.

Path in Data Catalog defines the position of each attribute as attribute / entity / repository / data core.
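
As an illustration of this positional path, the following Python sketch joins the four levels in the order given above. The `CatalogPosition` type and the repository name `hr_db` are hypothetical; the Employee and Name values reuse the example from the Attribute definition.

```python
from typing import NamedTuple


class CatalogPosition(NamedTuple):
    """Position of an attribute in the technical hierarchy."""
    attribute: str
    entity: str
    repository: str
    data_core: str

    def path(self) -> str:
        # Join the levels as attribute / entity / repository / data core.
        return " / ".join(self)


position = CatalogPosition("Name", "Employee", "hr_db", "ERP")
print(position.path())  # prints "Name / Employee / hr_db / ERP"
```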

Business Glossary

Business Glossary is a section of Fyrefuse that offers a semantic view on the metadata by using keywords. Organised in a semantic context, entities fall under a hierarchy of business terms, making data more comprehensible. The Business Glossary is designed for Data Consumers looking for specific business-driven insights.

Keyword is a meaningful business term used to define a specific data domain. Keywords can be structured in a hierarchy to define sub-domains at a more granular level.

Path in Business Glossary is the logical sequence of the keywords leading to the entity that contains the attribute. Thus, the position of each attribute is defined in a hierarchy of attribute / entity / Keyword Z / Keyword n / Keyword A.

Root keyword (root node) is the primary, leftmost keyword. It has no parent and is linked only to zero or more child keywords.

Parent keyword (parent node) is a keyword that has one or more child keywords.

Child keyword (child node) is a keyword linked to the parent keyword. All keywords have exactly one parent, except the root keywords, which have none.
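
The keyword hierarchy behaves like a tree in which each keyword points to at most one parent. The sketch below is a minimal Python model under that reading: the `Keyword` class, the `glossary_path` function, and the sample keywords are all hypothetical, and it assumes that the path runs from the most specific keyword up to the root keyword.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Keyword:
    name: str
    parent: Optional["Keyword"] = None  # None marks a root keyword

    def is_root(self) -> bool:
        return self.parent is None


def glossary_path(attribute: str, entity: str, keyword: Keyword) -> str:
    """Build attribute / entity / keyword / ... / root keyword."""
    parts = [attribute, entity]
    node: Optional[Keyword] = keyword
    while node is not None:  # walk up through the parents to the root
        parts.append(node.name)
        node = node.parent
    return " / ".join(parts)


sales = Keyword("Sales")                       # root keyword
customers = Keyword("Customers", parent=sales)  # child keyword of Sales
print(glossary_path("email", "contacts", customers))
# prints "email / contacts / Customers / Sales"
```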

Pipelines

Pipelines is a core set of functionalities allowing Data Managers to perform ingestions, i.e. to move structured and unstructured data from one or more data sources to a data target in batches and streams.

Pipeline Template is a configured pipeline created in the Pipeline Designer and saved to be (re)used or scheduled to run on a specific day or on a regular basis. A Pipeline Template can be created either ad-hoc or in association with an approved Data Request.

Pipeline Designer is a user-friendly five-step wizard to create and configure pipeline templates. It defines data sources and a target and may include a series of data transformation steps (jobs) to be run on data before ingesting it into the target.

Job is a data preparation or transformation step to be executed before delivering data into the target. Fyrefuse supports full-code custom jobs pulled from Job Stores, as well as native Spark jobs embedded in Fyrefuse.

Instance is a record of pipeline execution documented in detail in order to keep track of Data Operations. Every instance has an ID and a status: Running, Succeeded, Stopped, Failed, or Scheduled.
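
How a pipeline template, its jobs, and an instance relate to one another can be sketched in plain Python. This is a conceptual model only: the class names, the `run` method, and the two sample jobs are hypothetical, the jobs are ordinary functions rather than Spark jobs, and only the status names come from the definition above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List
from uuid import uuid4

Row = Dict[str, object]
Job = Callable[[List[Row]], List[Row]]  # a job transforms a batch of rows


class Status(Enum):
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    STOPPED = "Stopped"
    FAILED = "Failed"


@dataclass
class Instance:
    instance_id: str
    status: Status


@dataclass
class PipelineTemplate:
    name: str
    jobs: List[Job] = field(default_factory=list)

    def run(self, rows: List[Row], target: List[Row]) -> Instance:
        """Apply every job in order, write to the target, record the outcome."""
        instance = Instance(str(uuid4()), Status.RUNNING)
        try:
            for job in self.jobs:
                rows = job(rows)
            target.extend(rows)
            instance.status = Status.SUCCEEDED
        except Exception:
            instance.status = Status.FAILED
        return instance


def drop_missing_names(rows: List[Row]) -> List[Row]:
    return [row for row in rows if row.get("name")]


def uppercase_names(rows: List[Row]) -> List[Row]:
    return [{**row, "name": str(row["name"]).upper()} for row in rows]


template = PipelineTemplate("employees", [drop_missing_names, uppercase_names])
target: List[Row] = []
instance = template.run([{"name": "ada"}, {"name": None}], target)
```

Because the template holds only the job list, the same template can be run again on a different batch, which matches the idea of a reusable, schedulable configuration.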

Pipeline Dataset is a set of attributes selected from Data Explorer to be ingested via a pipeline.

Settings

Broker is a software tool (e.g. Kafka) that enables applications, systems, and services to communicate with each other and exchange data.

Job Store is an integrated repository of a code-versioning platform (e.g. GitLab, BitBucket etc.) from which custom full-code jobs in multiple programming languages can be imported into Fyrefuse to be used in pipelines.

Notifications

Standard notification is a notification that does not include updates of information already present in the frontend, but appears in the dedicated section.

Notification with update is a notification that provides for updates of information already present in the frontend and appears in the dedicated section.

Silent notification is a notification that does not appear in the dedicated section, but provides for an update of information already present in the frontend. Unlike other notifications, it does not include the notification field.
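
The three notification types differ along two axes: whether the notification appears in the dedicated section and whether it updates information already present in the frontend. The Python sketch below encodes that reading; the enum, the `classify` function, and its parameter names are hypothetical and are not taken from the Fyrefuse codebase.

```python
from enum import Enum


class NotificationKind(Enum):
    STANDARD = "Standard notification"
    WITH_UPDATE = "Notification with update"
    SILENT = "Silent notification"


def classify(shown_in_section: bool, updates_frontend: bool) -> NotificationKind:
    """Map the two properties described in the glossary to a notification type."""
    if shown_in_section and updates_frontend:
        return NotificationKind.WITH_UPDATE
    if shown_in_section:
        return NotificationKind.STANDARD
    if updates_frontend:
        return NotificationKind.SILENT
    # Neither visible nor updating anything: not one of the three defined types.
    raise ValueError("a notification must appear in the section or update the frontend")
```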