Skip to content

ADX-202 Support Copying of Assets via Pipeline DSL#1040

Merged
nitinbhojwani merged 3 commits into
masterfrom
nitin/ADX-202-support-assets-copying
May 14, 2026
Merged

ADX-202 Support Copying of Assets via Pipeline DSL#1040
nitinbhojwani merged 3 commits into
masterfrom
nitin/ADX-202-support-assets-copying

Conversation

@nitinbhojwani
Copy link
Copy Markdown
Contributor

@nitinbhojwani nitinbhojwani commented May 14, 2026

Why

Pipeline DSL steps can depend on local helper files or directories, but generated step directories only included the generated pipeline_step.py and requirements.txt. That made asset-backed steps incomplete after generation because helper modules, text files, or directories referenced by the step were not packaged into the output. This change adds first-class asset support so generated step directories are self-contained, while also preventing assets from overwriting generated runtime files.

What

  • Add an assets parameter to the @step(...) decorator and StepDefinition.
  • Resolve step asset paths relative to the source file that defines the step.
  • Copy declared asset files and directories into the generated step version directory.
  • Validate assets before generation and fail fast when:
    • an asset path does not exist
    • two assets resolve to the same basename
    • an asset basename collides with a reserved generated filename such as pipeline_step.py
  • Update the pipeline DSL example to use a helper module copied as an asset.
  • Add regression coverage for:
    • copying file and directory assets into generated output
    • missing asset failures
    • reserved filename collisions
    • CLI generation of the real example including copied assets

How

  • In clarifai.runners.pipelines.step, thread a new assets field through the @step decorator into StepDefinition.
  • In clarifai.runners.pipelines.codegen:
    • add _resolve_step_assets(...) to normalize, validate, and deduplicate asset paths
    • add _copy_step_assets(...) to copy files with shutil.copy2(...) and directories with shutil.copytree(...)
    • validate assets before creating output files so failures happen early and clearly
    • copy assets into the step 1/ version directory after generating pipeline_step.py
    • reserve generated filenames like pipeline_step.py so user assets cannot overwrite generated runtime files
  • Update examples/pipeline_dsl_text_pipeline.py to load text normalization from examples/text_utils.py and declare that helper as a step asset.
  • Extend runner and CLI tests to assert copied assets are present in generated output and that invalid asset configurations raise the expected errors.

Tests

  • pytest tests/runners/test_pipeline_dsl.py tests/cli/test_pipeline_dsl_cli.py
  • Manually tried generating pipeline using python examples/pipeline_dsl_text_pipeline.py --generate gen-copy-assets-tst-pl
Screenshot 2026-05-14 at 4 05 22 PM

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds asset support to Pipeline DSL-generated step directories so helper files and directories can be packaged with generated pipeline steps.

Changes:

  • Adds an assets argument to @step(...) and StepDefinition.
  • Resolves, validates, and copies declared step assets into generated step version directories.
  • Updates the example pipeline and tests to exercise copied helper assets and validation failures.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
clarifai/runners/pipelines/codegen.py Adds asset validation and copying during step directory generation.
clarifai/runners/pipelines/step.py Threads the new assets field through step definitions.
examples/pipeline_dsl_text_pipeline.py Updates the example step to depend on a copied helper asset.
examples/text_utils.py Adds the helper module used by the example pipeline.
tests/cli/test_pipeline_dsl_cli.py Verifies CLI generation copies the example helper asset.
tests/runners/test_pipeline_dsl.py Extends Pipeline DSL tests for copied assets and validation errors.
tests/runners/sample_module.py Adds a helper asset used by runner tests.
tests/runners/pipeline_step.py Adds a reserved-name asset fixture.
tests/runners/invalid_reserved_asset_pipeline.py Adds a pipeline fixture for reserved asset-name validation.

Comment thread clarifai/runners/pipelines/codegen.py Outdated
Comment thread clarifai/runners/pipelines/codegen.py Outdated
Comment thread clarifai/runners/pipelines/codegen.py
Comment thread clarifai/runners/pipelines/codegen.py Outdated
- Use dirs_exist_ok=True in copytree for idempotent reruns
- Cache resolved assets list to avoid double resolution
- Handle os.PathLike alongside str for single-asset values
- Add regression test for duplicate asset basename

Agent-Logs-Url: https://github.com/Clarifai/clarifai-python/sessions/d76e1973-72c2-47e6-81ef-99bd10a4e6e3

Co-authored-by: nitinbhojwani <9331380+nitinbhojwani@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

Code Coverage

Package Line Rate Health
clarifai 45%
clarifai.cli 62%
clarifai.cli.templates 67%
clarifai.cli.templates.toolkits 100%
clarifai.client 64%
clarifai.client.auth 67%
clarifai.constants 100%
clarifai.datasets 100%
clarifai.datasets.export 69%
clarifai.datasets.upload 75%
clarifai.datasets.upload.loaders 37%
clarifai.models 100%
clarifai.rag 0%
clarifai.runners 52%
clarifai.runners.models 61%
clarifai.runners.pipeline_steps 45%
clarifai.runners.pipelines 78%
clarifai.runners.utils 61%
clarifai.runners.utils.data_types 72%
clarifai.schema 100%
clarifai.urls 56%
clarifai.utils 62%
clarifai.utils.evaluation 16%
clarifai.workflows 95%
examples 50%
Summary 61% (12575 / 20662)

Minimum allowed line rate is 50%

@nitinbhojwani nitinbhojwani enabled auto-merge (squash) May 14, 2026 10:39
@nitinbhojwani nitinbhojwani requested a review from rizzip May 14, 2026 10:41
Copy link
Copy Markdown
Contributor

@rizzip rizzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR produces the intended behavior.

DevX could be better w.r.t. making use of the assets. I tested the import asset path directly and it works. The only other DevX issue I see related to this PR is upward cascading asset dependencies.

def normalize_text(value: str) -> str:
"""Small helper intentionally kept outside the step for codegen extraction."""
return " ".join(value.strip().split())
import importlib.util
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a clean Dev experience and introduces a dependency between this helper and any calling steps (e.g., all steps using this helper must include the text_utils asset.

Two options:

  1. avoid use of importlib directly, and just do import text_utils. This is cleaner, but does not fix the cascading dependency issue
  2. declare dependencies on this function, and have the step compiler collect assets from called functions

@nitinbhojwani nitinbhojwani merged commit 8dbe72b into master May 14, 2026
12 checks passed
@nitinbhojwani nitinbhojwani deleted the nitin/ADX-202-support-assets-copying branch May 14, 2026 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants