
🚀 Introduction to the Accelerator Platform

The Accelerator platform is an integrated system for managing, orchestrating, validating, versioning, and visualizing scientific computations and data workflows — at scale.

It is designed to serve scientific communities, data engineers, and modeling teams who need to:

  • Develop and run complex jobflows
  • Validate and transform datasets
  • Combine models into larger workflows
  • Share reproducible computation with collaborators and the public
  • Manage data versioning alongside code
  • Provide interactive services like RStudio, APIs, and notebooks

Accelerator provides a modern alternative to older systems such as:

  • Batch computing schedulers (HTCondor, SLURM, PBS)
  • Pure Kubernetes-based setups (which require significant manual wiring)
  • Data pipeline tools (Airflow, Luigi) that lack built-in data validation, visualization, or scientific focus
  • Ad hoc scripting and one-off compute setups

It bridges the gap between workflow orchestration and scientific data lifecycle management.


🎯 Key Capabilities

🧱 Routines and Jobflows

  • Core abstraction is a Routine → a unit of computation, packaged with its environment
  • Routines can be run independently as Jobs or composed into Jobflows (acyclic pipelines)
  • Jobflows support a hierarchical structure (group nodes / spanner nodes)
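
To make the "acyclic pipeline" idea concrete, here is a minimal sketch in plain Python. It does not use Accelerator's API: the routine names, the dictionary layout, and the `execution_order` helper are all assumptions made for illustration. It only shows how a set of dependent routines can be ordered and how a cycle would be rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical jobflow: each key is a routine name, each value is the set of
# routines whose outputs it depends on. The names are illustrative only.
jobflow = {
    "download": set(),
    "validate": {"download"},
    "model_a": {"validate"},
    "model_b": {"validate"},
    "merge": {"model_a", "model_b"},
}


def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle.
        raise ValueError(f"jobflow is not acyclic: {err.args[1]}") from err


order = execution_order(jobflow)
print(order)
```

In this sketch `model_a` and `model_b` have no dependency on each other, so either may come first in the returned order; a real scheduler could run them in parallel.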

📦 Code and Dependency Management

  • Routines can take code, configuration, and stacks from:
  • Supports:
    • Pre-defined base stacks (managed environment)
    • User-provided container files (automatically built)
    • Pre-built container images

🔄 Data Mapping

  • Built-in data mapping system that moves data between:
    • Cloud storage (acc://)
    • The jobflow volume (/mnt/pipe)
    • The routine container
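
As a rough illustration of what a mapping entry might look like, the sketch below models one in plain Python. Only the `acc://` scheme and the `/mnt/pipe` path come from the text above; the `DataMapping` class, the `pipe_path` helper, the example URI, and the container paths are hypothetical and are not Accelerator's actual configuration format.

```python
import posixpath
from dataclasses import dataclass

PIPE_ROOT = "/mnt/pipe"  # jobflow volume path named in the text above


@dataclass(frozen=True)
class DataMapping:
    """Hypothetical record linking a storage location to a container path."""
    source: str  # e.g. an acc:// URI (the layout after the scheme is assumed)
    target: str  # absolute path inside the routine container


def pipe_path(relative: str) -> str:
    """Build a path on the shared jobflow volume, rejecting escapes via '..'."""
    joined = posixpath.normpath(posixpath.join(PIPE_ROOT, relative))
    if joined != PIPE_ROOT and not joined.startswith(PIPE_ROOT + "/"):
        raise ValueError(f"{relative!r} escapes {PIPE_ROOT}")
    return joined


mappings = [
    # Cloud storage -> routine container (illustrative URI).
    DataMapping("acc://example-project/inputs/yields.csv", "/data/yields.csv"),
    # Jobflow volume -> routine container.
    DataMapping(pipe_path("stage1/result.csv"), "/data/stage1.csv"),
]
```

Normalizing the path before checking the prefix is a deliberate choice here: it keeps one routine from addressing files outside the shared volume by accident.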

✅ Data Validation

  • Schema-driven data validation out of the box
  • Supports:
    • CSV Timeseries
    • Regional Timeseries
    • Raster Timeseries
  • Templates managed via UI and API
  • Fully configurable validation rules
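
The following sketch shows the general shape of schema-driven validation for a CSV timeseries. It is not Accelerator's template format: the `SCHEMA` dictionary, the column names, and the `validate_csv` function are assumptions chosen to keep the example self-contained.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> callable that parses a cell
# or raises ValueError when the cell is invalid.
SCHEMA = {
    "date": date.fromisoformat,
    "region": str,
    "value": float,
}


def validate_csv(text, schema):
    """Return a list of (line_number, message) problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in schema if col not in (reader.fieldnames or [])]
    if missing:
        return [(1, f"missing columns: {', '.join(missing)}")]
    problems = []
    for row in reader:
        for col, parse in schema.items():
            try:
                parse(row[col])
            except (ValueError, TypeError):
                problems.append((reader.line_num, f"bad {col}: {row[col]!r}"))
    return problems


sample = "date,region,value\n2020-01-01,AT,1.5\n2020-13-01,AT,abc\n"
print(validate_csv(sample, SCHEMA))
```

Collecting every problem instead of stopping at the first one tends to suit data review, since a contributor can fix a whole file in one pass.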

📊 Data Visualization

  • Built-in support for:
    • Bar, stacked bar, line charts
    • High-resolution raster explorer
  • Embeddable widgets for external sites
  • Designed for scientific collaboration and review

🔄 Data Versioning

  • Built-in Git + DVC-based versioning
  • Push outputs of jobs or jobflows to version control
  • Supports reproducibility and trust
  • Extensible to other version control systems

🌐 Hosted Models and Interactive Services

  • Routines and Jobflows can be hosted publicly or privately
  • Support for interactive routines (long-running services):
    • RStudio server
    • SSH sessions
    • Custom APIs
  • Managed through web UI

🖥️ UI and CLI

  • Full-featured Web GUI:
    • Create and manage routines
    • Build jobflows visually
    • View logs, outputs, validation results
  • Powerful CLI (Command Line Interface):
    • Dispatch routines as jobs and jobflows
    • Inspect jobs
    • Manage data mappings

📝 Reproducibility and Auditability

  • All runs are logged with:
    • Exact code version (Git hash, container image)
    • Exact data inputs (mapped paths, DVC version if used)
    • Parameter configuration
    • Logs and outputs
  • Designed to meet reproducibility requirements of scientific publishing and open science.
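
A run record of this kind could be sketched as follows. The `RunRecord` class and its field names are hypothetical and only mirror the items listed above; they do not describe how Accelerator stores its logs. The example also shows one way to derive a stable fingerprint so that two runs with identical code, inputs, and parameters can be recognized as equivalent.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit record; field names are illustrative only."""
    git_hash: str
    container_image: str
    inputs: dict = field(default_factory=dict)      # mapped path -> data version
    parameters: dict = field(default_factory=dict)  # parameter name -> value

    def fingerprint(self) -> str:
        """Stable SHA-256 over the record, independent of dict key order."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = RunRecord("3f2c1ab", "registry.example/model:1.0",
              {"/data/in.csv": "v1"}, {"year": 2030, "scenario": "base"})
b = RunRecord("3f2c1ab", "registry.example/model:1.0",
              {"/data/in.csv": "v1"}, {"scenario": "base", "year": 2030})
print(a.fingerprint() == b.fingerprint())
```

Serializing with sorted keys is what makes the fingerprint insensitive to the order in which parameters were supplied.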

🛠️ Extensibility

  • Users can define:

    • New routines (Python, R, GAMS, Julia, etc.)
    • New jobflows
    • New data validation schemas
    • New visualization components
    • New versioning adapters
  • Accelerator is designed as a framework — not a black box.


🚀 Advantages Over Existing Systems

| Accelerator vs... | Key Differences |
| --- | --- |
| HTCondor / SLURM / PBS | Built-in data validation, visualization, versioning, UI |
| Airflow / Luigi | Scientific data focus, data mapping, validation, reproducibility |
| Pure Kubernetes setups | Higher-level abstractions, no need for manual operator creation |
| Ad hoc scripting (bash, Python) | Formalized, reusable, auditable workflows |
| Classic data lakes | Active compute & jobflows, not just passive storage |

Accelerator unifies what would otherwise require multiple disconnected tools.


🌍 Who Is It For?

  • Scientists running computational experiments and jobflows
  • Modelers building multi-stage models (agriculture, climate, biodiversity, etc.)
  • Data engineers supporting data jobflows with validation and versioning
  • Collaborative scientific projects needing reproducibility and transparency
  • Institutions hosting shared scientific models and datasets

🧬 Architectural Philosophy

  • Routine is the universal unit of computation
  • Data is a first-class citizen:
    • Validated
    • Versioned
    • Visualized
  • Workflows are modular and reproducible
  • Hosting enables research sharing and integrated processing
  • Extensibility is fundamental → you are not locked in

🚀 Summary

Accelerator is a modern, flexible platform for:

  • Research computation orchestration
  • Data validation and engineering
  • Data versioning and visualization
  • Reproducible, shareable jobflows
  • Interactive services on compute clusters

By integrating these capabilities in one system, Accelerator helps research communities move faster, with greater trust and collaboration.


📚 Next Steps

Welcome to the Accelerator ecosystem! 🚀