✅ Data Validation in Accelerator
The Accelerator platform includes a built-in data validation system that checks datasets against expected formats, standards, and quality rules before they are used in computational workflows.
🧩 Supported Data Types
1. CSV Timeseries
- Tabular data representing values over time
- Required columns: time, variable, value
2. Regional Timeseries
- Tabular data with spatial breakdown
- Required columns: region, variable, value, time
3. Raster Timeseries
- Spatial datasets (e.g., GeoTIFF) representing time-indexed grids
- One file per timestep, with appropriate metadata (CRS, nodata)
Note: Vector datasets (e.g., polygons) are not directly validated. They are typically used as supporting spatial layers (e.g., via GeoJSON or PMTiles) and integrated within routines that consume regional timeseries.
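The required columns listed above for the two tabular types can be checked with a few lines of code. The following is a minimal sketch, not the platform's implementation: the type keys `csv_timeseries` and `regional_timeseries` and the function `missing_columns` are hypothetical names chosen for illustration.

```python
import csv
import io

# Required columns per tabular data type, taken from the lists above.
# The dictionary keys are hypothetical labels, not platform identifiers.
REQUIRED_COLUMNS = {
    "csv_timeseries": {"time", "variable", "value"},
    "regional_timeseries": {"region", "variable", "value", "time"},
}


def missing_columns(csv_text: str, data_type: str) -> set:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return REQUIRED_COLUMNS[data_type] - present


sample = "time,variable,value\n2020,temperature,14.2\n"
print(missing_columns(sample, "csv_timeseries"))       # nothing is missing
print(missing_columns(sample, "regional_timeseries"))  # the region column is missing
```

Column order is deliberately ignored here, on the assumption that only the presence of each column matters.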
🔍 Validation Layers
🧱 Type Validation (Built-in)
Each data type includes a set of core validation rules:
- File type and format checks
- Structural column requirements (for CSV)
- Metadata validation (for raster)
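As an illustration of what metadata validation for a raster timeseries might involve, the sketch below checks plain metadata dictionaries, one per timestep. It assumes the metadata has already been read from each file. The function name `check_raster_metadata`, the keys `crs` and `nodata`, and the rule that all timesteps share one CRS are assumptions made for this example, not documented platform behavior.

```python
def check_raster_metadata(timesteps):
    """Check per-timestep raster metadata.

    timesteps maps a timestep label to that file's metadata dict.
    Returns a list of error strings (empty when everything passes).
    """
    errors = []
    for label, meta in timesteps.items():
        for key in ("crs", "nodata"):
            if meta.get(key) is None:
                errors.append(f"{label}: missing {key}")

    # Assumption: every timestep in one dataset should use the same CRS.
    crs_values = {m["crs"] for m in timesteps.values() if m.get("crs") is not None}
    if len(crs_values) > 1:
        errors.append(f"inconsistent CRS across timesteps: {sorted(crs_values)}")
    return errors


stack = {
    "2020": {"crs": "EPSG:4326", "nodata": -9999},
    "2021": {"crs": "EPSG:3857", "nodata": None},
}
for problem in check_raster_metadata(stack):
    print(problem)
```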
⚙️ Custom Validation Rules
Users can define additional JSON-based rule sets to validate content-specific expectations:
- Required or allowed variable names
- Value ranges (e.g., temperatures between -50 and 60)
- Allowed units or categories
- Logical checks (e.g., monotonic time, no missing values)
These rules enhance quality control and enforce domain-specific standards.
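To make the rule categories above concrete, here is a minimal sketch of a JSON rule set and a function that applies it to timeseries rows. The JSON key names (`allowed_variables`, `value_ranges`, `monotonic_time`, `no_missing_values`) and the function `validate_rows` are hypothetical; the platform's actual rule format may differ. "Monotonic time" is interpreted here as strictly increasing per variable, which is also an assumption.

```python
import json

# Hypothetical rule set; key names are illustrative, not the platform's format.
RULES_JSON = """
{
  "allowed_variables": ["temperature", "precipitation"],
  "value_ranges": {"temperature": {"min": -50, "max": 60}},
  "monotonic_time": true,
  "no_missing_values": true
}
"""


def validate_rows(rows, rules):
    """Return a list of violations for a list of row dicts."""
    errors = []
    last_time = {}  # last time seen per variable, for the monotonic check
    for i, row in enumerate(rows):
        variable, value, time = row.get("variable"), row.get("value"), row.get("time")
        # Rows with missing fields are reported (if the rule is on) and skipped.
        if any(field in (None, "") for field in (variable, value, time)):
            if rules.get("no_missing_values"):
                errors.append(f"row {i}: missing value")
            continue
        if "allowed_variables" in rules and variable not in rules["allowed_variables"]:
            errors.append(f"row {i}: variable '{variable}' not allowed")
        bounds = rules.get("value_ranges", {}).get(variable)
        if bounds and not (bounds["min"] <= float(value) <= bounds["max"]):
            errors.append(f"row {i}: value {value} outside [{bounds['min']}, {bounds['max']}]")
        if rules.get("monotonic_time"):
            if variable in last_time and time <= last_time[variable]:
                errors.append(f"row {i}: time {time} is not increasing")
            last_time[variable] = time
    return errors


rules = json.loads(RULES_JSON)
rows = [
    {"time": 2020, "variable": "temperature", "value": 14.2},
    {"time": 2021, "variable": "temperature", "value": 75.0},  # out of range
    {"time": 2021, "variable": "temperature", "value": 15.0},  # repeated time
    {"time": 2022, "variable": "wind", "value": 3.1},          # variable not allowed
]
errors = validate_rows(rows, rules)
for e in errors:
    print(e)
```

Collecting every violation instead of stopping at the first one gives a complete report per dataset, which suits a quality-control step.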
📦 Validation Schemas
Validation rules can be bundled into schemas — reusable JSON templates registered on the platform.
- Each schema has a unique identifier
- Can be referenced in:
  - Routines
  - Pipelines
  - Manual dataset validation processes
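The idea of registering a schema once and referencing it by identifier can be sketched with a small in-memory registry. Everything here is hypothetical: the classes `ValidationSchema` and `SchemaRegistry`, the identifier `climate-regional-v1`, and the `input_schema` key stand in for whatever the platform actually provides.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationSchema:
    schema_id: str   # unique identifier
    data_type: str   # e.g. a regional timeseries
    rules: dict = field(default_factory=dict)


class SchemaRegistry:
    """In-memory stand-in for a platform-side schema registry."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema):
        # Identifiers must be unique, so re-registering is an error.
        if schema.schema_id in self._schemas:
            raise ValueError(f"schema id already registered: {schema.schema_id}")
        self._schemas[schema.schema_id] = schema

    def get(self, schema_id):
        try:
            return self._schemas[schema_id]
        except KeyError:
            raise KeyError(f"unknown schema id: {schema_id}") from None


registry = SchemaRegistry()
registry.register(
    ValidationSchema(
        schema_id="climate-regional-v1",
        data_type="regional_timeseries",
        rules={"allowed_variables": ["temperature"]},
    )
)

# A routine or pipeline only needs to store the identifier.
routine_config = {"name": "regional-summary", "input_schema": "climate-regional-v1"}
schema = registry.get(routine_config["input_schema"])
print(schema.data_type, schema.rules)
```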
Benefits
- Reuse: Apply the same validation across multiple datasets
- Share: Collaborate with teams using common standards
- Enforce: Automate checks before workflows consume data
💡 Use Cases
🔁 Harmonizing Datasets
- Apply validation schemas to datasets from different sources
- Standardize structure before ingestion
- Improve interoperability across workflows
🔄 Reusable Computational Modules
- Declare validation schema requirements in a routine
- Ensure routines only accept datasets with the expected shape
- Avoid hidden data assumptions and simplify reuse
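One way to picture a routine that declares its schema requirement is a decorator that validates the input before the routine body runs. This is a sketch under stated assumptions: `requires_schema`, `DatasetRejected`, the `VALIDATORS` mapping, and the schema identifier `regional-timeseries-basic` are all hypothetical and do not describe the platform's API.

```python
import functools


class DatasetRejected(Exception):
    """Raised when a dataset does not satisfy a routine's declared schema."""


def requires_schema(schema_id, validators):
    """Validate the dataset against schema_id before running the routine.

    validators maps schema ids to callables that return a list of error strings.
    """
    def decorator(routine):
        @functools.wraps(routine)
        def wrapper(dataset, *args, **kwargs):
            errors = validators[schema_id](dataset)
            if errors:
                raise DatasetRejected(f"{schema_id}: {'; '.join(errors)}")
            return routine(dataset, *args, **kwargs)

        wrapper.required_schema = schema_id  # exposed as routine metadata
        return wrapper
    return decorator


def has_region_field(dataset):
    """Toy validator: every row must carry a region."""
    return [] if all("region" in row for row in dataset) else ["missing 'region' field"]


VALIDATORS = {"regional-timeseries-basic": has_region_field}


@requires_schema("regional-timeseries-basic", VALIDATORS)
def total_by_region(dataset):
    totals = {}
    for row in dataset:
        totals[row["region"]] = totals.get(row["region"], 0) + row["value"]
    return totals


good = [{"region": "A", "value": 1}, {"region": "A", "value": 2}, {"region": "B", "value": 5}]
print(total_by_region(good))
print(total_by_region.required_schema)
```

Because the requirement lives on the routine itself, a caller can inspect `required_schema` before supplying data, and a dataset that fails validation never reaches the routine body.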
🔗 Pluggable Workflows
- Define dataset requirements as part of routine metadata
- Allow upstream producers to align to the schema
- Enable modular, composable data pipelines
🛠️ Integration
Validation can be triggered as part of:
- Routine execution
- Manual validation tool
- Dataset registration step
Routines like Regional Timeseries Validator use this feature automatically.
🧭 Summary
| Feature | Description |
|---|---|
| Built-in Type Checks | Validate CSV and raster structure |
| User-Defined Rules | Create schemas with custom constraints |
| Schema Identifiers | Reuse validation rules across datasets and workflows |
| Integrated with Routines | Compatible with data loading and validation flows |
| Optional Vector Use | Vector data is not validated directly but is used to interpret region fields |
Data validation is not just about integrity: it is the foundation for **trustworthy, reproducible, and modular workflows** on the Accelerator platform.