🚀 Introduction to the Accelerator Platform
The Accelerator platform is an integrated system for managing, orchestrating, validating, versioning, and visualizing scientific computations and data workflows — at scale.
It is designed to serve scientific communities, data engineers, and modeling teams who need to:
- Develop and run complex jobflows
- Validate and transform datasets
- Combine models into larger workflows
- Share reproducible computation with collaborators and the public
- Manage data versioning alongside code
- Provide interactive services like RStudio, APIs, and notebooks
Accelerator provides a modern alternative to existing approaches such as:
- Batch computing schedulers (HTCondor, SLURM, PBS)
- Pure Kubernetes-based setups (which require significant manual wiring)
- Data pipeline tools (Airflow, Luigi) that lack built-in data validation, visualization, or scientific focus
- Ad hoc scripting and one-off compute setups
It bridges the gap between workflow orchestration and scientific data lifecycle management.
🎯 Key Capabilities
🧱 Routines and Jobflows
- The core abstraction is a Routine → a unit of computation packaged with its environment
- Routines can be run independently as Jobs or composed into Jobflows (acyclic pipelines)
- Jobflows support a hierarchical structure (group nodes / spanner nodes)
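To make the "acyclic pipeline" idea concrete, here is a minimal sketch in plain Python. It is not Accelerator's API: the function `execution_order` and the routine names are hypothetical, and the only assumption is that a jobflow can be described as a mapping from each routine to the routines it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order for a jobflow.

    `dependencies` maps each routine name to the names it depends on.
    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical three-step jobflow: prepare -> simulate -> report
jobflow = {
    "prepare": set(),
    "simulate": {"prepare"},
    "report": {"simulate"},
}

print(execution_order(jobflow))

# A cyclic graph is rejected, which is what makes a jobflow a pipeline.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a valid jobflow")
```

Each routine can start only after everything it depends on has finished, so any order returned here is a legal schedule for the jobflow.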
📦 Code and Dependency Management
- Routines can take code, configuration, and stacks from:
  - Local folders
  - Git repositories
  - Container images
- Supports:
  - Pre-defined base stacks (managed environment)
  - User-provided container files (automatically built)
  - Pre-built container images
🔄 Data Mapping
- Built-in data mapping system that maps data between:
  - Cloud storage (acc://)
  - The jobflow volume (/mnt/pipe)
  - The routine container
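The three locations above can be told apart by path alone. The sketch below illustrates that idea and is not Accelerator's mapping syntax: the `Mapping` class, the `classify` function, and the bucket name `my-bucket` are hypothetical, while the `acc://` prefix and the `/mnt/pipe` root come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    CLOUD = "cloud storage"
    JOBFLOW_VOLUME = "jobflow volume"
    CONTAINER = "routine container"


CLOUD_PREFIX = "acc://"    # from the text above
VOLUME_ROOT = "/mnt/pipe"  # from the text above


def classify(path: str) -> Location:
    """Decide which of the three locations a path refers to."""
    if path.startswith(CLOUD_PREFIX):
        return Location.CLOUD
    if path == VOLUME_ROOT or path.startswith(VOLUME_ROOT + "/"):
        return Location.JOBFLOW_VOLUME
    # Assumption: any other path is local to the routine container.
    return Location.CONTAINER


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping of one source path onto one target path."""
    source: str
    target: str

    def describe(self) -> str:
        return f"{classify(self.source).value} -> {classify(self.target).value}"


mapping = Mapping("acc://my-bucket/inputs/yield.csv", "/mnt/pipe/inputs/yield.csv")
print(mapping.describe())  # cloud storage -> jobflow volume
```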
✅ Data Validation
- Schema-driven data validation out of the box
- Supports:
  - CSV Timeseries
  - Regional Timeseries
  - Raster Timeseries
- Templates managed via UI and API
- Fully configurable validation rules
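As an illustration of what schema-driven validation means for a CSV timeseries, the following sketch checks each row against a declared column schema. It uses only the Python standard library and assumes a made-up schema format (column name mapped to a type); Accelerator's actual templates and rules are managed through its UI and API and may look quite different.

```python
import csv
import io

# Hypothetical schema: column name -> type the value must parse as.
SCHEMA = {"region": str, "year": int, "value": float}


def validate_csv(text: str, schema: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the data is valid."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in schema if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column, cast in schema.items():
            try:
                cast(row[column])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {column}={row[column]!r} "
                    f"is not a valid {cast.__name__}"
                )
    return problems


sample = "region,year,value\nAT,2020,1.5\nDE,twenty,2.0\n"
for problem in validate_csv(sample, SCHEMA):
    print(problem)
```

Collecting every problem before reporting, rather than stopping at the first one, tends to suit review workflows where a dataset owner wants to fix all issues in one pass.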
📊 Data Visualization
- Built-in support for:
  - Bar, stacked bar, line charts
  - High-resolution raster explorer
  - Embeddable widgets for external sites
- Designed for scientific collaboration and review
🔄 Data Versioning
- Built-in Git + DVC-based versioning
- Push outputs of jobs or jobflows to version control
- Supports reproducibility and trust
- Extensible to other version control systems
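For readers unfamiliar with how Git and DVC divide the work, the sketch below builds the usual command sequence for versioning one output file by hand. It shows the generic Git + DVC workflow, not what Accelerator runs internally; the function name `versioning_commands` and the example path are hypothetical, and the commands are only assembled, not executed.

```python
from pathlib import PurePosixPath


def versioning_commands(output_path: str, message: str) -> list[list[str]]:
    """Build (but do not run) the Git + DVC commands that version one output file."""
    path = PurePosixPath(output_path)
    pointer = f"{path}.dvc"                      # small pointer file written by `dvc add`
    gitignore = str(path.parent / ".gitignore")  # `dvc add` lists the data file here
    return [
        ["dvc", "add", str(path)],           # track the data file with DVC
        ["git", "add", pointer, gitignore],  # Git tracks only the pointer and ignore file
        ["git", "commit", "-m", message],
        ["dvc", "push"],                     # upload the data to the DVC remote
    ]


for command in versioning_commands("outputs/results.csv", "Add run outputs"):
    print(" ".join(command))
```

The large data file goes to DVC storage, while Git holds a small pointer to it, so a Git commit identifies both the code and the exact data version.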
🌐 Hosted Models and Interactive Services
- Routines and Jobflows can be hosted publicly or privately
- Support for interactive routines (long-running services):
  - RStudio server
  - SSH sessions
  - Custom APIs
- Managed through the web UI
🖥️ UI and CLI
- Full-featured Web GUI:
  - Create and manage routines
  - Build jobflows visually
  - View logs, outputs, validation results
- Powerful CLI (Command Line Interface):
  - Dispatch routines as jobs and jobflows
  - Inspect jobs
  - Manage data mappings
📝 Reproducibility and Auditability
- All runs are logged with:
  - Exact code version (Git hash, container image)
  - Exact data inputs (mapped paths, DVC version if used)
  - Parameter configuration
  - Logs and outputs
- Designed to meet the reproducibility requirements of scientific publishing and open science.
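One way to picture such a run log is as a record whose fields mirror the list above, plus a fingerprint that changes whenever code, data, or parameters change. This is a sketch under assumptions: the `RunRecord` class, its field names, and all example values are hypothetical and do not describe Accelerator's storage format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit record for one run."""
    git_hash: str
    container_image: str
    inputs: dict      # mapped path -> DVC version, or None if not versioned
    parameters: dict  # parameter configuration used for the run

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that equal
        # records always hash to the same value.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# All values below are placeholders for illustration.
run = RunRecord(
    git_hash="0123abc",
    container_image="example/model:1.0",
    inputs={"/mnt/pipe/inputs/yield.csv": None},
    parameters={"scenario": "baseline", "years": 10},
)
rerun = RunRecord(run.git_hash, run.container_image, dict(run.inputs), dict(run.parameters))
changed = RunRecord(run.git_hash, run.container_image, run.inputs, {"scenario": "baseline", "years": 20})

print(run.fingerprint() == rerun.fingerprint())    # True
print(run.fingerprint() == changed.fingerprint())  # False
```

Two runs with the same fingerprint used the same code, inputs, and parameters, which is the property a reviewer needs in order to trust that a result can be reproduced.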
🛠️ Extensibility
Users can define:
- New routines (Python, R, GAMS, Julia, etc.)
- New jobflows
- New data validation schemas
- New visualization components
- New versioning adapters
Accelerator is designed as a framework — not a black box.
🚀 Advantages Over Existing Systems
| Accelerator vs... | Key Differences |
|---|---|
| HTCondor / SLURM / PBS | Built-in data validation, visualization, versioning, UI |
| Airflow / Luigi | Scientific data focus, data mapping, validation, reproducibility |
| Pure Kubernetes setups | Higher-level abstractions, no need for manual operator creation |
| Ad hoc scripting (bash, Python) | Formalized, reusable, auditable workflows |
| Classic data lakes | Active compute & jobflows, not just passive storage |
Accelerator unifies what would otherwise require multiple disconnected tools.
🌍 Who Is It For?
- Scientists running computational experiments and jobflows
- Modelers building multi-stage models (agriculture, climate, biodiversity, etc.)
- Data engineers supporting data jobflows with validation and versioning
- Collaborative scientific projects needing reproducibility and transparency
- Institutions hosting shared scientific models and datasets
🧬 Architectural Philosophy
- Routine is the universal unit of computation
- Data is a first-class citizen:
  - Validated
  - Versioned
  - Visualized
- Workflows are modular and reproducible
- Hosting enables sharing research and integrated processing
- Extensibility is fundamental → you are not locked in
🚀 Summary
Accelerator is a modern, flexible platform for:
- Research computation orchestration
- Data validation and engineering
- Data versioning and visualization
- Reproducible, shareable jobflows
- Interactive services on compute clusters
By integrating these capabilities in one system, Accelerator helps research communities move faster, with greater trust and collaboration.
📚 Next Steps
- Getting Started with Routine
- Understanding Jobflows
- Data Mapping Guide
- Data Validation Guide
- Data Versioning Use Case
Welcome to the Accelerator ecosystem! 🚀