Scystream

The Scystream project is an open-source data-science pipeline toolkit containing all necessary tools to create and execute data-science workflows.

Through an easy-to-use frontend, users can schedule and deploy custom workflows composed of different data-processing tasks.

Architecture

Short Description

The frontend is a Next.js application that communicates with the backend (“core”) via HTTP. Authentication and authorization are handled through Keycloak.
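As a rough illustration of the request flow described above, the following sketch builds an HTTP request to the core that carries a Keycloak-issued access token as a bearer token. The base URL, the `/projects` path, and the helper name `build_core_request` are assumptions made for illustration and are not taken from the project's actual API. Obtaining the token from Keycloak is left out.

```python
from urllib.request import Request


def build_core_request(base_url: str, path: str, access_token: str) -> Request:
    """Build (but do not send) a GET request to the core with a bearer token."""
    # Join base URL and path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Hypothetical values: replace with your own core URL and a real token.
    request = build_core_request("http://localhost:8000", "/projects", "<access-token>")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

The request is only constructed here. Sending it (for example with `urllib.request.urlopen`) requires a running core and a valid token.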

The backend is built with FastAPI and consists of two primary services:

Workflow Service

Responsible for all workflow-related logic, including:

project creation and management
adding and configuring compute blocks
starting and stopping workflows
workflow orchestration

The workflow service integrates with Apache Airflow, which is responsible for scheduling and executing compute blocks.
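Airflow models a workflow as a directed acyclic graph of tasks and runs each task only after its upstream tasks have finished. The sketch below illustrates that ordering idea with Python's standard library. It is not how Scystream or Airflow implement scheduling, and the block names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order for blocks given their upstream dependencies.

    `dependencies` maps a block name to the set of blocks that must finish first.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical three-block workflow: load -> clean -> plot.
    workflow = {
        "load": set(),
        "clean": {"load"},
        "plot": {"clean"},
    }
    print(execution_order(workflow))
```

For a linear chain like this one there is only one valid order. For workflows with independent branches, several orders are valid and the function returns one of them.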

Superset Service

Handles integration with Apache Superset, including:

dashboard configuration
linking dashboards to workflows and projects

Compute blocks are implemented using the scystream-sdk.

Each compute block is packaged as a Docker container and includes a cbc.yaml file that defines:

configuration options
expected inputs
produced outputs
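To make the three parts of a compute block definition more concrete, here is a minimal Python sketch of an in-memory model. The real `cbc.yaml` schema is defined by the scystream-sdk and is not reproduced here. The class name, the field names, and the helper `missing_inputs` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeBlockDefinition:
    """Hypothetical in-memory view of what a cbc.yaml describes."""

    name: str
    config: dict[str, str] = field(default_factory=dict)  # configuration options
    inputs: list[str] = field(default_factory=list)  # expected inputs
    outputs: list[str] = field(default_factory=list)  # produced outputs


def missing_inputs(block: ComputeBlockDefinition, available: set[str]) -> list[str]:
    """Return the expected inputs that are not present in `available`."""
    return [name for name in block.inputs if name not in available]


if __name__ == "__main__":
    # Hypothetical block that expects two inputs and produces one output.
    block = ComputeBlockDefinition(
        name="clean-text",
        config={"LANGUAGE": "en"},
        inputs=["raw_documents", "stopwords"],
        outputs=["clean_documents"],
    )
    print(missing_inputs(block, available={"raw_documents"}))
```

A check like this could run before a workflow starts, so that a block with unconnected inputs is reported early rather than failing during execution.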

Workflows can be described declaratively using the project's Template Schema (see the corresponding template repository on GitLab for more details).

The system uses three primary data sources:

core-postgres

Stores all application-related metadata and state required by the core platform.

data-postgres

Stores structured workflow and compute data processed by compute blocks.

data-minio

Object storage used for files and larger datasets accessed by compute blocks.

Compute blocks can read from and write to both data-postgres and data-minio during execution.

Quickstart

It is recommended to use Docker and Docker Compose.

Docker

To start all services, run the following command in the project root directory:

docker compose -f docker-compose.dev.yml up -d

You might need to set up the Keycloak environment correctly.

For development, run the frontend and backend locally:

npm run dev

uvicorn main:app --reload

Make sure to configure the frontend and backend correctly using their respective .env files.
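A quick way to catch an incomplete configuration before starting a service is to check the .env file for the variables you expect. The sketch below parses simple KEY=VALUE lines using only the standard library. The variable names in the example are hypothetical, so consult the frontend and backend sources for the ones that are actually required.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    # Hypothetical variable names, used only to show the check.
    required = ["DATABASE_URL", "KEYCLOAK_URL"]
    if env_file.exists():
        found = parse_env(env_file.read_text(encoding="utf-8"))
        print("missing:", missing_keys(found, required))
    else:
        print("no .env file found in", Path.cwd())
```

This parser deliberately ignores advanced .env features such as multi-line values and variable expansion.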

Working with Compute Blocks

When compute blocks are first pulled, they are stored in core/repos/. During development, whenever a compute block changes, pull those changes into your core/repos/ directory as well. Don't forget to update the image to the correct tag (e.g. pr-14).

The Airflow container uses the Docker images available on your local machine, so make sure to keep them up to date.

