Custom Installation

Unlike the default installation, which bundles all Fyrefuse dependencies in a single Docker Compose file with the simplest possible configuration, the custom installation assumes that you already have an Apache Spark cluster (and, optionally, a Trino cluster) set up, along with object storage and a metastore.

Prerequisites

On the machine or VM where Fyrefuse is deployed, you need the following:

  • Red Hat-based or Debian-based OS
  • 4 vCPUs
  • 8 GB RAM
  • 160 GB of storage
  • Internet access
  • Docker
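
As a rough aid, the host requirements above can be checked with a small shell sketch like the following. The function names are hypothetical, and it assumes a Linux host with `nproc`, `awk`, `df`, and `/proc/meminfo` available. It reports the total size of the filesystem holding the current directory, which may come out slightly below the nominal disk size, so treat the output as a guide rather than a verdict.

```shell
# Minimum values taken from the prerequisites list.
MIN_CPUS=4
MIN_RAM_GB=8
MIN_DISK_GB=160

# meets_minimum ACTUAL REQUIRED -> succeeds when ACTUAL >= REQUIRED
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# report LABEL ACTUAL REQUIRED -> prints one "ok"/"LOW" line
report() {
    if meets_minimum "$2" "$3"; then
        echo "ok   $1: $2 (minimum $3)"
    else
        echo "LOW  $1: $2 (minimum $3)"
    fi
}

# check_host -> prints one line per prerequisite for the current machine
check_host() {
    cpus=$(nproc)
    # MemTotal is in kB; round up, since usable RAM is a little below the nominal size.
    ram_gb=$(awk '/^MemTotal:/ { printf "%d", ($2 + 1048575) / 1048576 }' /proc/meminfo)
    # Total size (in GB, rounded down) of the filesystem holding the current directory.
    disk_gb=$(df -Pk . | awk 'NR == 2 { printf "%d", $2 / 1048576 }')
    report "vCPUs" "$cpus" "$MIN_CPUS"
    report "RAM (GB)" "$ram_gb" "$MIN_RAM_GB"
    report "Disk (GB)" "$disk_gb" "$MIN_DISK_GB"
    if command -v docker >/dev/null 2>&1; then
        echo "ok   Docker: $(docker --version)"
    else
        echo "LOW  Docker: not found in PATH"
    fi
}

# Usage: run `check_host` on the machine where Fyrefuse will be deployed.
```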

Additionally, you need the following services already configured and running:

  • Apache Spark cluster (Standalone or K8s)
  • (optional) Trino cluster
  • (optional) S3-compatible storage
  • (optional) Hive Metastore
  • (optional) PostgreSQL 14
  • (optional) GitLab (cloud or self-managed)

If you haven't set up these components yet, follow their official documentation.

Before you start

Fyrefuse provides a Docker Compose file along with an env file for the initial configuration.

More specifically, the Docker Compose file contains:

  • Fyrefuse Backend
  • Fyrefuse Frontend
  • Database
  • GitLab

Database Setup

Note: The database, powered by PostgreSQL 14, can be decoupled from the Docker Compose setup and installed separately using any method you prefer. If you already have a PostgreSQL instance, you can configure Fyrefuse to use it.

If you are using the provided database image, no additional configuration is required.

If you're using an existing PostgreSQL instance, you must manually create two empty databases (one for the Fyrefuse backend and one for the Hive Metastore) along with a dedicated user that has all privileges on them.
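
A minimal sketch of that manual setup is shown below. It assumes the default names from the environment variable table (`fyrefuse_user`, `fyrefuse_database`, `metastore`); the helper name `print_db_setup_sql`, the placeholder host, and the `postgres` superuser are illustrative assumptions, not part of Fyrefuse. The function only prints SQL, so you can review it before piping it to `psql`.

```shell
# print_db_setup_sql USER PASSWORD BACKEND_DB METASTORE_DB
# Prints the SQL that creates the user and the two empty databases.
# Note: a password containing a single quote would need escaping first.
print_db_setup_sql() {
    cat <<SQL
CREATE USER $1 WITH PASSWORD '$2';
CREATE DATABASE $3 OWNER $1;
CREATE DATABASE $4 OWNER $1;
GRANT ALL PRIVILEGES ON DATABASE $3 TO $1;
GRANT ALL PRIVILEGES ON DATABASE $4 TO $1;
SQL
}

# Example (run as a PostgreSQL superuser; host and password are placeholders):
# print_db_setup_sql fyrefuse_user 'change-me' fyrefuse_database metastore \
#     | psql -h <postgres_host> -U postgres -v ON_ERROR_STOP=1
```

If you choose different names, remember to set the matching `DATABASE_*` and `METASTORE_DB_*` variables accordingly.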

GitLab Setup

Note: Fyrefuse relies on GitLab for integrating UDFs and Spark jobs. You can use either the GitLab image included in the Docker Compose file or GitLab cloud.

Create a new project and ensure the main branch exists; this is where Fyrefuse will store its files.

Generate a project access token with the following scopes:

  • api
  • read_repository
  • write_repository
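
Before continuing, it can be useful to confirm that the token works and that the main branch exists. The sketch below uses the GitLab REST API branch endpoint through `curl`; the function names are hypothetical, and the base URL, project ID, and token are values you supply yourself.

```shell
# branch_api_url BASE_URL PROJECT_ID BRANCH
# Builds the GitLab REST API URL for a single branch (a trailing "/" on BASE_URL is dropped).
branch_api_url() {
    printf '%s/api/v4/projects/%s/repository/branches/%s\n' "${1%/}" "$2" "$3"
}

# check_main_branch BASE_URL PROJECT_ID TOKEN
# Succeeds only when the token can read the project and the main branch exists.
check_main_branch() {
    curl --fail --silent --show-error \
        --header "PRIVATE-TOKEN: $3" \
        "$(branch_api_url "$1" "$2" main)" >/dev/null
}

# Example (project ID and token are placeholders):
# check_main_branch https://gitlab.com <project_id> <project_access_token> && echo "token and branch OK"
```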

Initial Configuration

Fyrefuse uses a set of environment variables to configure the access parameters and general settings of the external services it connects to.

Warning: Change the environment variables only if you are sure of what you are doing!

| Variable Name | Description | Default |
| --- | --- | --- |
| BE_VERSION | The Backend version to be deployed | latest |
| FE_VERSION | The Frontend version to be deployed | latest |
| DATABASE_USER | Specify the Fyrefuse database username | fyrefuse_user |
| DATABASE_PASSWORD | Specify the Fyrefuse database password | |
| DATABASE_HOST | Specify the Fyrefuse database hostname or IP | localhost |
| DATABASE_PORT | Specify the Fyrefuse database port | 5432 |
| DATABASE_NAME | Specify the Fyrefuse database name | fyrefuse_database |
| DATABASE_SCHEMA | Specify the Fyrefuse database schema | fyrefuse_schema |
| API_URL | The base API URL to be used by the Fyrefuse frontend | |
| API_LOGGER | The endpoint to be used by the Fyrefuse frontend to enable notifications | |
| WS_URL | The endpoint to be used by the Fyrefuse frontend to enable WebSockets | |
| PIPELINE_LOGGER_LOGGER_URL | The API URL to be used by the FEM to send pipeline logs | |
| PIPELINE_LOGGER_REPORT_URL | The API URL to be used by the FEM to send pipeline reports | |
| GITLAB_PRIVATE_ACCESS_TOKEN | Default access token for retrieving the FEM from GitLab (in case of the Spark standalone deployment engine) | |
| FEM_BUILD_NAME | FEM version that will be used for running the Fyrefuse pipelines | |
| FEM_BUILD_LINK | FEM build link | |
| METASTORE_DB_NAME | Specify the Metastore database name | metastore |
| METASTORE_DB_SCHEMA | Specify the Metastore database schema | public |
| METASTORE_URI | Specify the Metastore connection URI | thrift://<metastore_hostname> |
| METASTORE_WAREHOUSE_DIR | Specify the Metastore warehouse directory | s3a://datalake/warehouse/ |
| METASTORE_DB_CONNECTION_STRING | Specify the Metastore database connection string | jdbc:postgresql://localhost:5432/metastore |
| METASTORE_DB_DRIVER | Specify the Metastore database driver | org.postgresql.Driver |
| METASTORE_DB_USERNAME | Specify the Metastore database username | fyrefuse_user |
| METASTORE_DB_PASSWORD | Specify the Metastore database password | |
| DATALAYER_HTTP_MODE | Specify the S3 object storage HTTP mode | http |
| DATALAYER_ENDPOINT | Specify the S3 object storage endpoint | |
| DATALAYER_ACCESS_KEY | Specify the S3 object storage access key | |
| DATALAYER_SECRET_KEY | Specify the S3 object storage secret key | |
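
Several of these variables have no default, so it may help to check the env file for missing values before starting the stack. The sketch below is one way to do that. The helper name `missing_env_vars` and the file name `.env` are assumptions, and the list of variables to check is only an example that you should adapt to the services you actually use.

```shell
# missing_env_vars FILE NAME...
# Prints every NAME that is absent from FILE or set to an empty value.
missing_env_vars() {
    file=$1
    shift
    for name in "$@"; do
        # A variable counts as set when "NAME=" is followed by at least one
        # character that is neither a quote nor whitespace (an optional
        # opening quote is allowed), so NAME= and NAME="" are both reported.
        grep -Eq "^${name}=[\"']?[^\"'[:space:]]" "$file" || echo "$name"
    done
}

# Example (assumes the env file is named .env):
# missing_env_vars .env DATABASE_PASSWORD METASTORE_DB_PASSWORD \
#     DATALAYER_ENDPOINT DATALAYER_ACCESS_KEY DATALAYER_SECRET_KEY
```

An empty output means every listed variable has a value.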

Installation Steps

1. Get the Docker Compose File

Download the Docker Compose file and navigate to its directory in the shell.

2. Customize the environment variables

Configure the environment variables according to your setup.

3. Log in to the Fyrefuse Registry

Log in to the Fyrefuse registry with your credentials to get access to the Fyrefuse images.

docker login -u <username> -p <personal_access_token> registry.gitlab.com
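
As an alternative sketch, Docker's `--password-stdin` option reads the token from standard input, which keeps it out of the shell history and the process list. The function name `registry_login` and the token file path are hypothetical; the file is assumed to contain only the personal access token.

```shell
# registry_login USERNAME TOKEN_FILE
# Logs in to the registry, passing the token on stdin instead of on the command line.
registry_login() {
    docker login -u "$1" --password-stdin registry.gitlab.com < "$2"
}

# Example (username and file path are placeholders):
# registry_login <username> ~/.fyrefuse_token
```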

4. Set up the datalayer bucket on the S3 storage

Access your configured S3 object storage and create a bucket named "datalake". Inside this bucket, create a folder named "warehouse".
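
If your storage can be reached with the AWS CLI, the bucket and folder can also be created from the shell. This is only a sketch: using the AWS CLI is an assumption (any S3 client or web console works equally well), the function name `create_datalake` is hypothetical, and the endpoint URL is a placeholder. Credentials are expected in the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```shell
# create_datalake ENDPOINT_URL
# Creates the "datalake" bucket, then an empty "warehouse/" key,
# which S3 tools display as a folder.
create_datalake() {
    aws --endpoint-url "$1" s3 mb s3://datalake &&
    aws --endpoint-url "$1" s3api put-object --bucket datalake --key warehouse/
}

# Example (endpoint is a placeholder):
# create_datalake http://<s3_host>:<port>
```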

5. Run Docker Compose

Run the Docker Compose file:

docker compose up -d
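
To confirm that the stack came up, you can list the containers with `docker compose ps` and then poll the frontend until it answers. The following is a minimal sketch: the function name `wait_for_http` is hypothetical, and it assumes the frontend at localhost:8001 returns a successful HTTP status on its root path once it is ready.

```shell
# wait_for_http URL ATTEMPTS DELAY_SECONDS
# Polls URL until curl gets a successful response or the attempts run out.
wait_for_http() {
    attempt=1
    while [ "$attempt" -le "$2" ]; do
        if curl --silent --fail --output /dev/null "$1"; then
            echo "up after $attempt attempt(s): $1"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$3"
    done
    echo "not reachable: $1" >&2
    return 1
}

# Example:
# docker compose ps
# wait_for_http http://localhost:8001 30 2
```

If the check fails, `docker compose logs` usually shows which service did not start.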

Next Steps

Once the installation process is complete, you can start using Fyrefuse (available at localhost:8001) by creating your first project.

For instructions, see Create Your First Project.