📂 Data Mapping in Routines

This document explains how to use the input_mappings and output_mappings features in a routine definition to manage data flow into and out of jobs running on the platform.

These mappings allow jobs to interact with various types of storage to read inputs, share data between jobs in a jobflow, and write outputs.


🔧 Where to Define Mappings

Data mappings are specified using the conf parameter in your WKubeTask definition inside wkube.py:

python
task = WKubeTask(
    ...
    conf={
        "input_mappings": "<source1>:<destination1>;<source2>:<destination2>",
        "output_mappings": "<source3>:<destination3>"
    }
)

  • Each mapping is a string of the form "<source>:<destination>"
  • Multiple mappings are separated by semicolons (;)
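
As a minimal sketch of the format described above, the mapping strings can be assembled from (source, destination) pairs rather than written by hand. The helper name `build_mappings` and the `acc://mybucket/results/` path are hypothetical and used only for illustration; they are not part of the platform.

```python
from typing import Iterable, Tuple


def build_mappings(pairs: Iterable[Tuple[str, str]]) -> str:
    """Join (source, destination) pairs into one mappings string.

    Each pair becomes "<source>:<destination>"; pairs are joined with ";".
    """
    return ";".join(f"{source}:{destination}" for source, destination in pairs)


# Two input mappings: one file and one folder.
input_mappings = build_mappings([
    ("acc://mybucket/data.csv", "/code/inputs/data.csv"),
    ("acc://mybucket/images/", "/code/inputs/images/"),
])

# The resulting dict is what would be passed as conf= to WKubeTask.
conf = {
    "input_mappings": input_mappings,
    # Hypothetical output location, chosen only for this example.
    "output_mappings": build_mappings([("/code/outputs/", "acc://mybucket/results/")]),
}
```

Building the strings this way keeps each source and destination visible as a separate value, which can make long mapping lists easier to review.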

🗂️ Types of Storage Locations

Three storage systems can be involved in a mapping:

1. Accelerator Repository

  • Identified using the acc:// prefix (e.g., acc://mybucket/path/to/file.csv)
  • Persistent cloud storage for inputs and outputs
  • Accessible from both terminal and web interfaces

2. Container (Sandbox)

  • The isolated runtime environment where the routine runs
  • If you use a base stack, your project folder is mounted at /code
  • Mapped paths inside the container should point to /code or one of its subpaths

3. Mounted Volume

  • Automatically mounted to /mnt/pipe inside the container
  • Shared between all jobs in a jobflow
  • Useful for passing intermediate data between jobs

🔄 Supported Mapping Directions

Input Mappings

  • From Accelerator Repository → Container
  • From Mounted Volume → Container

Output Mappings

  • From Container → Accelerator Repository
  • From Container → Mounted Volume

📌 Mappings must follow these directions or the platform will reject the configuration.
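
The direction rules above can be checked locally before a routine is submitted. The following is a rough sketch, not the platform's own validation: all function names are hypothetical, and it assumes that a path starting with acc:// is in the Accelerator Repository, a path under /mnt/pipe is on the mounted volume, and anything else is inside the container. It also assumes the GUI keys selected_files and selected_folders count as repository sources, since the platform converts them into acc:// mappings.

```python
from typing import Tuple

ACC_PREFIX = "acc://"
PIPE_ROOT = "/mnt/pipe"
SPECIAL_KEYS = {"selected_files", "selected_folders"}


def split_mapping(mapping: str) -> Tuple[str, str]:
    """Split "<source>:<destination>" without breaking the colon in "acc://"."""
    start = len(ACC_PREFIX) if mapping.startswith(ACC_PREFIX) else 0
    sep = mapping.find(":", start)
    if sep == -1:
        raise ValueError(f"not a '<source>:<destination>' mapping: {mapping!r}")
    return mapping[:sep], mapping[sep + 1:]


def location_kind(path: str) -> str:
    """Classify a path as 'repository', 'volume', or 'container' (assumed rules)."""
    if path.startswith(ACC_PREFIX) or path in SPECIAL_KEYS:
        return "repository"
    if path == PIPE_ROOT or path.startswith(PIPE_ROOT + "/"):
        return "volume"
    return "container"


def direction_ok(mapping: str, kind: str) -> bool:
    """Return True if the mapping follows the allowed direction for its kind."""
    source, destination = split_mapping(mapping)
    src, dst = location_kind(source), location_kind(destination)
    if kind == "input":
        return src in ("repository", "volume") and dst == "container"
    if kind == "output":
        return src == "container" and dst in ("repository", "volume")
    raise ValueError("kind must be 'input' or 'output'")
```

For example, `direction_ok("/code/outputs/:/mnt/pipe/shared_output/", "output")` returns True, while the same string checked as an input mapping returns False.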


📁 File vs Folder Mappings

You can map both individual files and entire directories.

File-to-File Example:

python
"acc://mybucket/data.csv:/code/inputs/data.csv"

Folder-to-Folder Example:

python
"acc://mybucket/images/:/code/inputs/images/"

✅ For folder mappings, make sure both the source and destination paths end with a / to indicate a directory mapping.

⚠️ Performance Note: Avoid mapping directories that contain thousands of files, because doing so may significantly slow down jobs.
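
The trailing-slash convention is easy to get wrong, so a small local check can help. This is an illustrative sketch only: the function name `mapping_kind` is hypothetical, and it applies just the rule stated above (both sides end with / for a folder, neither for a file). It does not cover the GUI special keys.

```python
def mapping_kind(source: str, destination: str) -> str:
    """Return 'folder' or 'file', or raise if the two sides disagree."""
    source_is_dir = source.endswith("/")
    destination_is_dir = destination.endswith("/")
    if source_is_dir != destination_is_dir:
        raise ValueError(
            "source and destination must both end with '/' (folder) "
            f"or both omit it (file): {source!r} -> {destination!r}"
        )
    return "folder" if source_is_dir else "file"


file_kind = mapping_kind("acc://mybucket/data.csv", "/code/inputs/data.csv")
folder_kind = mapping_kind("acc://mybucket/images/", "/code/inputs/images/")
```

A mismatch such as `mapping_kind("acc://mybucket/images/", "/code/inputs/images")` raises a ValueError in this sketch, which flags the missing trailing slash before the job is launched.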


🖼️ GUI-Based Special Keys

When launching a routine or jobflow from the platform's web interface, you can use:

  • selected_files
  • selected_folders

These special keys allow users to interactively select files/folders from the Accelerator Repository.

Example mapping:

python
"selected_files:/code/inputs/"

The platform automatically converts these at runtime into proper acc:// mappings.
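
To show where such a key sits in a routine definition, here is a minimal conf sketch that places the example mapping above in input_mappings. The output path `acc://mybucket/results/` is a hypothetical location chosen for illustration.

```python
# conf dict as it would be passed to WKubeTask(conf=...).
conf = {
    # The user picks files in the web interface; they land in /code/inputs/.
    "input_mappings": "selected_files:/code/inputs/",
    # Hypothetical repository folder for the results.
    "output_mappings": "/code/outputs/:acc://mybucket/results/",
}

# The special key is the source part of the mapping, before the first colon.
source_key = conf["input_mappings"].split(":", 1)[0]
```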


🧪 Use Case: Sharing Data in a Jobflow

In pipelines, data mappings are used to:

  • Ingest initial data from cloud storage
  • Share outputs between routines via /mnt/pipe
  • Persist final outputs back to the Accelerator Repository

Example:

python
conf = {
    "input_mappings": "acc://project/input.csv:/code/inputs/input.csv",
    "output_mappings": "/code/outputs/:/mnt/pipe/shared_output/"
}

Here, the routine pulls data from the cloud and makes its outputs available to downstream jobs through the mounted volume.
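
A downstream job in the same jobflow could then pick up those outputs from the mounted volume and persist its own results. The sketch below pairs the conf from the example above with a possible downstream conf; the variable names and the `acc://project/results/` path are assumptions made for illustration.

```python
# Upstream job: the conf from the example above.
upstream_conf = {
    "input_mappings": "acc://project/input.csv:/code/inputs/input.csv",
    "output_mappings": "/code/outputs/:/mnt/pipe/shared_output/",
}

# Downstream job: reads the shared folder, writes to a hypothetical repository path.
downstream_conf = {
    "input_mappings": "/mnt/pipe/shared_output/:/code/inputs/",
    "output_mappings": "/code/outputs/:acc://project/results/",
}

# The folder the upstream job writes to is the folder the downstream job reads from.
shared_folder = upstream_conf["output_mappings"].split(":", 1)[1]
```

Both mappings that touch /mnt/pipe are folder mappings, so each side ends with a /.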


✅ Summary

  • Use input_mappings and output_mappings to control data flow
  • Mappings support files, folders, and special GUI-driven keys (selected_files, selected_folders)
  • Choose storage paths carefully based on your workflow
  • Respect the direction rules and performance recommendations

Data mappings give you control over where a job launched from your routine reads its inputs and writes its outputs.