✅ Data Validation in Accelerator

The Accelerator platform includes a built-in data validation system that checks datasets against expected formats, standards, and quality rules before they are used in computational workflows.

🧩 Supported Data Types

1. CSV Timeseries

Tabular data representing values over time
Required columns: time, variable, value

2. Regional Timeseries

Tabular data with spatial breakdown
Required columns: region, variable, value, time
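As a minimal sketch of the column requirements listed above, the check below compares a CSV header against the required columns for each tabular type. The keys `csv_timeseries` and `regional_timeseries` and the function `missing_columns` are hypothetical names chosen for illustration; they are not the platform's API.

```python
import csv
import io

# Required columns per data type, taken from the lists above.
# The dictionary keys are illustrative labels, not platform identifiers.
REQUIRED_COLUMNS = {
    "csv_timeseries": {"time", "variable", "value"},
    "regional_timeseries": {"region", "variable", "value", "time"},
}


def missing_columns(csv_text: str, data_type: str) -> set[str]:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return REQUIRED_COLUMNS[data_type] - header


sample = "time,variable,value\n2020,temperature,14.2\n"
print(missing_columns(sample, "csv_timeseries"))       # set()
print(missing_columns(sample, "regional_timeseries"))  # {'region'}
```

The same file passes as a CSV timeseries but fails as a regional timeseries because it has no `region` column.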

3. Raster Timeseries

Spatial datasets (e.g., GeoTIFF) representing time-indexed grids
One file per timestep, with appropriate metadata (CRS, nodata)
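The raster requirements can be pictured with a small sketch, assuming the metadata of each file has already been read into a plain record (in practice a raster library would supply it). `RasterInfo` and `check_raster_series` are hypothetical names, and the specific checks are an assumption about what "appropriate metadata" might cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RasterInfo:
    """Hypothetical summary of one raster file's metadata."""
    path: str
    timestep: str
    crs: Optional[str]
    nodata: Optional[float]


def check_raster_series(files: list[RasterInfo]) -> list[str]:
    """Return a list of problems; an empty list means the series passed."""
    problems = []
    seen_timesteps = set()
    for info in files:
        if info.crs is None:
            problems.append(f"{info.path}: missing CRS")
        if info.nodata is None:
            problems.append(f"{info.path}: missing nodata value")
        # One file per timestep: a repeated timestep is reported.
        if info.timestep in seen_timesteps:
            problems.append(f"{info.path}: duplicate timestep {info.timestep}")
        seen_timesteps.add(info.timestep)
    return problems


series = [
    RasterInfo("t2020.tif", "2020", "EPSG:4326", -9999.0),
    RasterInfo("t2021.tif", "2021", "EPSG:4326", None),
    RasterInfo("t2021b.tif", "2021", "EPSG:4326", -9999.0),
]
for problem in check_raster_series(series):
    print(problem)
```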

Note: Vector datasets (e.g., polygons) are not directly validated. They are typically used as supporting spatial layers (e.g., via GeoJSON or PMTiles) and integrated within routines that consume regional timeseries.

🔍 Validation Layers

🧱 Type Validation (Built-in)

Each data type includes a set of core validation rules:

File type and format checks
Structural column requirements (for CSV)
Metadata validation (for raster)

⚙️ Custom Validation Rules

Users can define additional JSON-based rule sets to validate content-specific expectations:

Required or allowed variable names
Value ranges (e.g., temperatures between -50 and 60)
Allowed units or categories
Logical checks (e.g., monotonic time, no missing values)

These rules enhance quality control and enforce domain-specific standards.
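To make the idea concrete, here is a minimal sketch of a JSON rule set and a function that applies it to timeseries rows. The rule keys (`allowed_variables`, `value_ranges`, `monotonic_time`, `no_missing_values`) are invented for this example and do not describe the platform's actual rule format; monotonic time is interpreted here as non-decreasing time across rows.

```python
import json

# Hypothetical rule set; the key names are assumptions for illustration only.
RULES_JSON = """
{
  "allowed_variables": ["temperature", "precipitation"],
  "value_ranges": {"temperature": {"min": -50, "max": 60}},
  "monotonic_time": true,
  "no_missing_values": true
}
"""


def validate_rows(rows: list[dict], rules: dict) -> list[str]:
    """Apply the rule set to each row and collect error messages."""
    errors = []
    allowed = rules.get("allowed_variables")
    ranges = rules.get("value_ranges", {})
    previous_time = None
    for i, row in enumerate(rows):
        if rules.get("no_missing_values") and any(
            row.get(key) is None for key in ("time", "variable", "value")
        ):
            errors.append(f"row {i}: missing value")
            continue
        variable, value, time = row.get("variable"), row.get("value"), row.get("time")
        if allowed is not None and variable not in allowed:
            errors.append(f"row {i}: variable '{variable}' not allowed")
        limits = ranges.get(variable)
        if limits and value is not None:
            if not limits["min"] <= value <= limits["max"]:
                errors.append(f"row {i}: value {value} out of range")
        if rules.get("monotonic_time") and time is not None:
            if previous_time is not None and time < previous_time:
                errors.append(f"row {i}: time {time} goes backwards")
            previous_time = time
    return errors


rules = json.loads(RULES_JSON)
rows = [
    {"time": 2020, "variable": "temperature", "value": 14.2},
    {"time": 2021, "variable": "temperature", "value": 75.0},
    {"time": 2019, "variable": "humidity", "value": 0.4},
]
for error in validate_rows(rows, rules):
    print(error)
```

Keeping the rules as data rather than code is what lets the same checks be stored, shared, and applied to other datasets.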

📦 Validation Schemas

Validation rules can be bundled into schemas — reusable JSON templates registered on the platform.

Each schema has a unique identifier and can be referenced in:
- Routines
- Pipelines
- Manual dataset validation processes
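The relationship between identifiers and rule sets can be sketched as a simple in-memory registry. This is an illustration only: `SCHEMA_REGISTRY`, `register_schema`, `get_schema`, and the identifier `regional-temperature-v1` are hypothetical, and the platform's real registration mechanism is not described here.

```python
# Hypothetical in-memory registry mapping schema identifiers to rule sets.
SCHEMA_REGISTRY: dict[str, dict] = {}


def register_schema(schema_id: str, rules: dict) -> None:
    """Store a rule set under a unique identifier."""
    if schema_id in SCHEMA_REGISTRY:
        raise ValueError(f"schema '{schema_id}' is already registered")
    SCHEMA_REGISTRY[schema_id] = rules


def get_schema(schema_id: str) -> dict:
    """Look up a rule set by identifier."""
    if schema_id not in SCHEMA_REGISTRY:
        raise KeyError(f"unknown schema '{schema_id}'")
    return SCHEMA_REGISTRY[schema_id]


register_schema(
    "regional-temperature-v1",
    {"allowed_variables": ["temperature"], "no_missing_values": True},
)

# A routine, a pipeline, and a manual check can all resolve the same identifier.
rules_for_routine = get_schema("regional-temperature-v1")
rules_for_manual_check = get_schema("regional-temperature-v1")
print(rules_for_routine is rules_for_manual_check)  # True
```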

Benefits

Reuse: Apply the same validation across multiple datasets
Share: Collaborate with teams using common standards
Enforce: Automate checks before workflows consume data

💡 Use Cases

🔁 Harmonizing Datasets

Apply validation schemas to datasets from different sources
Standardize structure before ingestion
Improve interoperability across workflows

🔄 Reusable Computational Modules

Declare validation schema requirements in a routine
Ensure routines only accept datasets with the expected shape
Avoid hidden data assumptions and simplify reuse
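A routine that declares its schema requirement might look like the sketch below. The classes `Dataset` and `Routine`, the field `passed_schemas`, and the schema identifier are hypothetical; the point is only that the requirement is stated up front and checked before any work happens.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset record tracking which schemas it has passed."""
    name: str
    passed_schemas: set[str] = field(default_factory=set)


@dataclass
class Routine:
    """Hypothetical routine that declares the schema its input must satisfy."""
    name: str
    required_schema: str

    def run(self, dataset: Dataset) -> str:
        # Refuse datasets that have not been validated against the declared schema.
        if self.required_schema not in dataset.passed_schemas:
            raise ValueError(
                f"{self.name} requires schema '{self.required_schema}', "
                f"which {dataset.name} has not passed"
            )
        return f"{self.name} processed {dataset.name}"


routine = Routine("aggregate_by_region", required_schema="regional-temperature-v1")
validated = Dataset("eu_temps", passed_schemas={"regional-temperature-v1"})
print(routine.run(validated))
```

Because the requirement lives on the routine rather than in its body, anyone reusing the routine can see what input it expects without reading the implementation.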

🔗 Pluggable Workflows

Define dataset requirements as part of routine metadata
Allow upstream producers to align to the schema
Enable modular, composable data pipelines
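One way to picture composability is to check that each step's output schema matches what the next step requires. The sketch below assumes each step carries a `requires` and a `produces` schema identifier in its metadata; `Step`, `find_mismatches`, and all identifiers are hypothetical.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    """Hypothetical pipeline step described only by its schema metadata."""
    name: str
    requires: Optional[str]  # schema expected on input; None for a source step
    produces: str            # schema the output is expected to satisfy


def find_mismatches(steps: list[Step]) -> list[str]:
    """Report adjacent steps whose produced and required schemas differ."""
    mismatches = []
    for upstream, downstream in zip(steps, steps[1:]):
        if downstream.requires is not None and downstream.requires != upstream.produces:
            mismatches.append(
                f"{upstream.name} produces '{upstream.produces}' "
                f"but {downstream.name} requires '{downstream.requires}'"
            )
    return mismatches


pipeline = [
    Step("load_station_data", None, "csv-timeseries-v1"),
    Step("aggregate_by_region", "csv-timeseries-v1", "regional-timeseries-v1"),
    Step("plot_regions", "regional-timeseries-v2", "report-v1"),
]
for mismatch in find_mismatches(pipeline):
    print(mismatch)
```

A check like this can run before any data moves, so a producer that drifts from the agreed schema is caught when the pipeline is assembled.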

🛠️ Integration

Validation can be triggered as part of:
- Routine execution
- The manual validation tool
- The dataset registration step

Routines like Regional Timeseries Validator use this feature automatically.

🧭 Summary

| Feature | Description |
| --- | --- |
| Built-in Type Checks | Validate CSV and raster structure |
| User-Defined Rules | Create schemas with custom constraints |
| Schema Identifiers | Reuse validation rules across datasets and workflows |
| Integrated with Routines | Compatible with data loading and validation flows |
| Optional Vector Use | Vector data is not validated directly, but is used to interpret region fields |

Data validation is not just about integrity; it is the foundation for **trustworthy, reproducible, and modular workflows** on the Accelerator platform.
