🚀 Introduction to the Accelerator Platform

The Accelerator platform is an integrated system for managing, orchestrating, validating, versioning, and visualizing scientific computations and data workflows — at scale.

It is designed to serve scientific communities, data engineers, and modeling teams who need to:

Develop and run complex jobflows
Validate and transform datasets
Combine models into larger workflows
Share reproducible computation with collaborators and the public
Manage data versioning alongside code
Provide interactive services like RStudio, APIs, and notebooks

Accelerator provides a modern alternative to older systems such as:

Batch computing schedulers (HTCondor, SLURM, PBS)
Pure Kubernetes-based setups (which require significant manual wiring)
Data pipeline tools (Airflow, Luigi) that lack built-in data validation, visualization, or scientific focus
Ad hoc scripting and one-off compute setups

It bridges the gap between workflow orchestration and scientific data lifecycle management.

🎯 Key Capabilities

🧱 Routines and Jobflow

The core abstraction is the Routine: a unit of computation packaged with its environment
Routines can be run independently as Jobs or composed into Jobflows (acyclic pipelines)
Jobflows support a hierarchical structure (group nodes / spanner nodes)
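
To make the idea of an acyclic pipeline concrete, here is a minimal Python sketch. It is not Accelerator's API: the `Routine` and `Jobflow` classes, their fields, and the image names are hypothetical stand-ins. It only illustrates how routines with "runs after" dependencies resolve to an execution order, and why a cycle cannot be scheduled.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Routine:
    """Hypothetical stand-in: a named unit of computation plus its environment."""
    name: str
    image: str  # illustrative container image that provides the environment


@dataclass
class Jobflow:
    """Hypothetical acyclic pipeline: routines plus 'runs after' dependencies."""
    routines: dict[str, Routine] = field(default_factory=dict)
    depends_on: dict[str, set[str]] = field(default_factory=dict)

    def add(self, routine: Routine, after: tuple[str, ...] = ()) -> None:
        self.routines[routine.name] = routine
        self.depends_on[routine.name] = set(after)

    def execution_order(self) -> list[str]:
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        return list(TopologicalSorter(self.depends_on).static_order())


flow = Jobflow()
flow.add(Routine("prepare", "python:3.12"))
flow.add(Routine("simulate", "r-base:4.4"), after=("prepare",))
flow.add(Routine("report", "python:3.12"), after=("simulate",))

order = flow.execution_order()
print(order)  # ['prepare', 'simulate', 'report']
```

Each routine in the sketch could also be run on its own, which mirrors the distinction between a single Job and a Jobflow.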

📦 Code and Dependency Management

Routines can take code, configuration, and stacks from:
- Local folders
- Git repositories
- Container images
Supported environment options:
- Pre-defined base stacks (managed environment)
- User-provided container files (automatically built)
- Pre-built container images

🔄 Data Mapping

Built-in data mapping moves data between three locations:
- Cloud storage (acc://)
- The jobflow volume (/mnt/tmp)
- The routine container
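
The three locations can be told apart by the shape of the path alone. The Python sketch below illustrates that idea under stated assumptions: the `Mapping` class, the `classify` helper, and the example paths are hypothetical and do not reflect the platform's actual mapping syntax. Only the `acc://` scheme and the `/mnt/tmp` volume path come from the text above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlsplit

JOBFLOW_VOLUME = PurePosixPath("/mnt/tmp")  # volume path named in the text


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping: copy data from `source` to `target`."""
    source: str
    target: str


def classify(location: str) -> str:
    """Return which of the three locations a path or URL refers to."""
    if urlsplit(location).scheme == "acc":
        return "cloud"
    path = PurePosixPath(location)
    if path == JOBFLOW_VOLUME or JOBFLOW_VOLUME in path.parents:
        return "jobflow-volume"
    return "container"


# Illustrative paths only; the project and file names are made up.
mappings = [
    Mapping("acc://demo-project/inputs/yield.csv", "/workspace/inputs/yield.csv"),
    Mapping("/workspace/outputs/result.csv", "/mnt/tmp/stage1/result.csv"),
]
for m in mappings:
    print(f"{classify(m.source)} -> {classify(m.target)}")
```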

✅ Data Validation

Schema-driven data validation out of the box
Supported dataset types:
- CSV Timeseries
- Regional Timeseries
- Raster Timeseries
Templates managed via UI and API
Fully configurable validation rules
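
As a rough illustration of schema-driven validation for a CSV timeseries, the Python sketch below checks rows against a small template. The template format, the column names, and the `validate` function are assumptions made for this example; they are not Accelerator's template schema or API.

```python
import csv
import io

# Hypothetical template: required columns and one rule set per column.
TEMPLATE = {
    "region": {"type": str, "allowed": {"EU", "US"}},
    "year": {"type": int, "min": 1990, "max": 2100},
    "value": {"type": float, "min": 0.0},
}


def validate(csv_text: str, template: dict) -> list[str]:
    """Return human-readable errors; an empty list means the data passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(template) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, rule in template.items():
            raw = row[column]
            try:
                value = rule["type"](raw)
            except (TypeError, ValueError):
                errors.append(f"line {line_no}: {column}={raw!r} is not {rule['type'].__name__}")
                continue
            if "allowed" in rule and value not in rule["allowed"]:
                errors.append(f"line {line_no}: {column}={value!r} is not an allowed value")
            if "min" in rule and value < rule["min"]:
                errors.append(f"line {line_no}: {column}={value!r} is below {rule['min']}")
            if "max" in rule and value > rule["max"]:
                errors.append(f"line {line_no}: {column}={value!r} is above {rule['max']}")
    return errors


good = "region,year,value\nEU,2020,1.5\nUS,2021,2.0\n"
bad = "region,year,value\nEU,1800,1.5\nXX,2021,-2\n"
print(validate(good, TEMPLATE))
for problem in validate(bad, TEMPLATE):
    print(problem)
```

Keeping the rules in data rather than in code is what makes them configurable: changing a bound or an allowed value means editing the template, not the validator.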

📊 Data Visualization

Built-in support for:
- Bar, stacked bar, line charts
- High-resolution raster explorer
Embeddable widgets for external sites
Designed for scientific collaboration and review

🔄 Data Versioning

Built-in Git + DVC-based versioning
Push outputs of jobs or jobflows to version control
Supports reproducibility and trust
Extensible to other version control systems

🌐 Hosted Models and Interactive Services

Routines and Jobflows can be hosted publicly or privately
Support for interactive routines (long-running services):
- RStudio server
- SSH sessions
- Custom APIs
Managed through web UI

🖥️ UI and CLI

Full-featured Web GUI:
- Create and manage routines
- Build jobflows visually
- View logs, outputs, validation results
Powerful CLI (Command Line Interface):
- Dispatch routines as jobs and jobflows
- Inspect jobs
- Manage data mappings

📝 Reproducibility and Auditability

All runs are logged with:
- Exact code version (Git hash, container image)
- Exact data inputs (mapped paths, DVC version if used)
- Parameter configuration
- Logs and outputs
Designed to meet reproducibility requirements of scientific publishing and open science.
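
One way to picture such a run log is as a record whose fields, taken together, identify a run. The Python sketch below is an assumption-laden illustration, not the platform's storage format: the `RunRecord` class, its field names, and the sample values are all hypothetical. It shows that two runs with the same code, data, and parameters yield the same fingerprint, while any change yields a different one.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of what a run used."""
    git_hash: str
    container_image: str
    inputs: dict = field(default_factory=dict)      # mapped path -> data version
    parameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) keeps the hash stable
        # regardless of the order in which dictionary entries were added.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Sample values are made up for illustration.
run_a = RunRecord("3f2a9c1", "demo/model:1.0", {"/workspace/in.csv": "v12"}, {"scenario": "base", "years": 30})
run_b = RunRecord("3f2a9c1", "demo/model:1.0", {"/workspace/in.csv": "v12"}, {"years": 30, "scenario": "base"})
run_c = RunRecord("3f2a9c1", "demo/model:1.0", {"/workspace/in.csv": "v13"}, {"scenario": "base", "years": 30})

print(run_a.fingerprint() == run_b.fingerprint())  # True: same code, data, parameters
print(run_a.fingerprint() == run_c.fingerprint())  # False: the input data version differs
```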

🛠️ Extensibility

Users can define:
- New routines (Python, R, GAMS, Julia, etc.)
- New jobflows
- New data validation schemas
- New visualization components
- New versioning adapters
Accelerator is designed as a framework — not a black box.

🚀 Advantages Over Existing Systems

Accelerator vs...	Key Differences
HTCondor / SLURM / PBS	Built-in data validation, visualization, versioning, UI
Airflow / Luigi	Scientific data focus, data mapping, validation, reproducibility
Pure Kubernetes setups	Higher-level abstractions, no need for manual operator creation
Ad hoc scripting (bash, Python)	Formalized, reusable, auditable workflows
Classic data lakes	Active compute & jobflows, not just passive storage

Accelerator unifies what would otherwise require multiple disconnected tools.

🌍 Who Is It For?

Scientists running computational experiments and jobflows
Modelers building multi-stage models (agriculture, climate, biodiversity, etc.)
Data engineers supporting data jobflows with validation and versioning
Collaborative scientific projects needing reproducibility and transparency
Institutions hosting shared scientific models and datasets

🧬 Architectural Philosophy

Routine is the universal unit of computation
Data is a first-class citizen:
- Validated
- Versioned
- Visualized
Workflows are modular and reproducible
Hosting enables shared research and integrated processing
Extensibility is fundamental → you are not locked in

🚀 Summary

Accelerator is a modern, flexible platform for:

Research computation orchestration
Data validation and engineering
Data versioning and visualization
Reproducible, shareable jobflows
Interactive services on compute clusters

By integrating these capabilities in one system, Accelerator helps research communities move faster, with greater trust and collaboration.

📚 Next Steps

Welcome to the Accelerator ecosystem! 🚀
