📂 Data Mapping in Routines

This document explains how to use the input_mappings and output_mappings features in a routine definition to manage data flow into and out of jobs running on the platform.

These mappings allow jobs to interact with various types of storage to read inputs, share data between jobs in a jobflow, and write outputs.

🔧 Where to Define Mappings

Data mappings are specified using the conf parameter in your WKubeTask definition inside wkube.py:

```python
task = WKubeTask(
    ...
    conf={
        "input_mappings": "<source1>:<destination1>;<source2>:<destination2>",
        "output_mappings": "<source3>:<destination3>"
    }
)
```

Each mapping is a string of the form "<source>:<destination>".
Multiple mappings are separated by semicolons (;).
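To make the string format concrete, here is a minimal sketch of splitting a mappings string into (source, destination) pairs and joining pairs back together. The helper names `parse_mappings` and `build_mappings` are hypothetical and not part of the platform. The sketch also assumes that a colon followed by `//` belongs to a prefix such as acc:// and is not a separator; the platform's own parser may behave differently.

```python
import re

# Assumption: a separator colon is any ":" NOT followed by "//",
# so the colon inside the "acc://" prefix is left alone.
_SEPARATOR = re.compile(r":(?!//)")


def parse_mappings(spec: str) -> list[tuple[str, str]]:
    """Split "<src1>:<dst1>;<src2>:<dst2>" into (source, destination) pairs."""
    pairs = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        parts = _SEPARATOR.split(entry)
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected '<source>:<destination>', got {entry!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


def build_mappings(pairs: list[tuple[str, str]]) -> str:
    """Join (source, destination) pairs into a single mappings string."""
    return ";".join(f"{source}:{destination}" for source, destination in pairs)
```

Building the string from pairs can be easier to maintain than one long hand-written string when a routine has many mappings.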

🗂️ Types of Storage Locations

Three storage systems can be involved in mapping:

1. Accelerator Repository

Identified using the acc:// prefix (e.g., acc://mybucket/path/to/file.csv)
Persistent cloud storage for inputs and outputs
Accessible from both terminal and web interfaces

2. Container (Sandbox)

The isolated runtime environment where the routine runs
If you use a base stack, your project folder is mounted at /code
Mapped paths inside the container should point to /code or one of its subpaths

3. Mounted Volume

Automatically mounted to /mnt/tmp inside the container
Shared between all jobs in a jobflow
Useful for passing intermediate data between jobs

🔄 Supported Mapping Directions

Input Mappings

From Accelerator Repository → Container
From Mounted Volume → Container

Output Mappings

From Container → Accelerator Repository
From Container → Mounted Volume

📌 Mappings must follow these directions or the platform will reject the configuration.
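The direction rules above can be checked locally before submitting a routine. The following is a rough sketch, not the platform's own validation: `classify` and `is_allowed` are hypothetical helpers, and treating every path that is neither acc:// nor under /mnt/tmp as a container path is an assumption.

```python
SPECIAL_KEYS = {"selected_files", "selected_folders"}


def classify(location: str) -> str:
    """Guess which storage system a mapping endpoint refers to."""
    if location.startswith("acc://") or location in SPECIAL_KEYS:
        # The GUI keys are converted to acc:// mappings at runtime.
        return "repository"
    if location == "/mnt/tmp" or location.startswith("/mnt/tmp/"):
        return "volume"
    # Assumption: anything else is a path inside the container.
    return "container"


# (source kind, destination kind) pairs allowed for each mapping type.
ALLOWED = {
    "input": {("repository", "container"), ("volume", "container")},
    "output": {("container", "repository"), ("container", "volume")},
}


def is_allowed(kind: str, source: str, destination: str) -> bool:
    """Return True if the mapping follows the supported directions."""
    return (classify(source), classify(destination)) in ALLOWED[kind]
```

For example, `is_allowed("input", "/code/data.csv", "acc://mybucket/data.csv")` returns `False`, because an input mapping cannot write into the Accelerator Repository.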

📁 File vs Folder Mappings

You can map both individual files and entire directories.

File-to-File Example:

```python
"acc://mybucket/data.csv:/code/inputs/data.csv"
```

Folder-to-Folder Example:

```python
"acc://mybucket/images/:/code/inputs/images/"
```

✅ For folder mappings, make sure both source and destination paths end with a / to indicate directory mapping.
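As a sketch of the trailing-slash rule, the hypothetical helper below reports whether a path-to-path mapping reads as a file or a folder mapping and flags a mismatch. It only covers plain path mappings. How the platform itself reacts to a mismatched pair is not specified here, so raising an error is an assumption of this sketch.

```python
def mapping_kind(source: str, destination: str) -> str:
    """Return "folder" or "file" based on trailing slashes on both sides."""
    source_is_dir = source.endswith("/")
    destination_is_dir = destination.endswith("/")
    if source_is_dir and destination_is_dir:
        return "folder"
    if not source_is_dir and not destination_is_dir:
        return "file"
    # One side looks like a directory and the other like a file.
    raise ValueError(
        f"trailing '/' mismatch between {source!r} and {destination!r}"
    )
```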

⚠️ Performance Note: Avoid mapping directories that contain thousands of files, as this may significantly slow down jobs.

🖼️ GUI-Based Special Keys

When launching a routine or jobflow from the platform’s web interface, you can use two special keys as the source of a mapping:

selected_files
selected_folders

These special keys allow users to interactively select files/folders from the Accelerator Repository.

Example mapping:

```python
"selected_files:/code/inputs/"
```

The platform automatically converts these at runtime into proper acc:// mappings.
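The exact expansion the platform performs is not described here. Purely as an illustration of the idea, the sketch below shows one plausible way a list of files picked in the web interface could become ordinary acc:// mappings. The function name, its inputs, and the choice to keep each file's base name are all assumptions.

```python
import posixpath


def expand_selected_files(selected: list[str], destination_dir: str) -> str:
    """Turn selected acc:// file URLs into a mappings string (illustrative only)."""
    entries = []
    for url in selected:
        # Assumption: each file keeps its base name inside the destination folder.
        file_name = posixpath.basename(url)
        entries.append(f"{url}:{posixpath.join(destination_dir, file_name)}")
    return ";".join(entries)
```

Under these assumptions, selecting `acc://mybucket/a.csv` with the mapping `selected_files:/code/inputs/` would correspond to `acc://mybucket/a.csv:/code/inputs/a.csv`.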

🧪 Use Case: Sharing Data in a Jobflow

Data mappings are critical in pipelines to:

Ingest initial data from cloud storage
Share outputs between routines via /mnt/tmp
Persist final outputs back to the Accelerator Repository

Example:

```python
conf = {
    "input_mappings": "acc://project/input.csv:/code/inputs/input.csv",
    "output_mappings": "/code/outputs/:/mnt/tmp/shared_output/"
}
```

Here, the routine pulls data from the cloud and makes its outputs available to downstream jobs through the mounted volume.
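To show the other half of the handoff, here is a hedged sketch of how a downstream job in the same jobflow might pick up those files and persist its results. The `downstream_conf` mapping, the /code/results/ folder, and the acc://project/results/ destination are hypothetical placeholders chosen for illustration.

```python
# Upstream job: reads from the Accelerator Repository, writes to the shared volume.
upstream_conf = {
    "input_mappings": "acc://project/input.csv:/code/inputs/input.csv",
    "output_mappings": "/code/outputs/:/mnt/tmp/shared_output/",
}

# Downstream job (hypothetical): reads the shared folder, persists final results.
downstream_conf = {
    "input_mappings": "/mnt/tmp/shared_output/:/code/inputs/",
    "output_mappings": "/code/results/:acc://project/results/",
}

# The upstream output destination doubles as the downstream input source.
_, _, handoff_path = upstream_conf["output_mappings"].partition(":")
```

Both folder mappings end with a / on each side, and each one follows the supported directions: Container → Mounted Volume for the upstream output, Mounted Volume → Container for the downstream input.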

✅ Summary

Use input_mappings and output_mappings to control data flow.
Map individual files, whole folders, or the GUI-driven special keys (selected_files, selected_folders).
Choose storage paths carefully based on your workflow.
Respect the direction rules and performance recommendations.

Data mappings give you control over where a job launched from your routine reads its inputs and writes its outputs.
