[QAIRT] Implement QAIRT ORT → Genie workflow #2358
qti-kromero wants to merge 15 commits into microsoft:main
Conversation
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Force-pushed from 5badb8e to fdaea52.
dev testing complete and unit testing added - pending reviewers

@jambayk @xiaoyu-work would it be possible to get a reviewer added to this PR?
```python
return {
    "backend": PassConfigParam(
        type_=str,
        default_value="CPU",
```
The PR mostly looks good to me, but could you use an enum with the options for backend and log_level, like in Olive/olive/passes/pytorch/rotate.py (line 40 in 85a754a)? This gives automatic validation of the allowed values. Thanks!
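A minimal sketch of the suggested pattern (names are illustrative; the actual enum base used in rotate.py may differ):

```python
from enum import Enum


class QairtBackend(str, Enum):
    """Hypothetical enum of allowed backend values for the pass config."""

    CPU = "CPU"
    HTP = "HTP"


def validate_backend(value: str) -> QairtBackend:
    # Enum construction rejects anything outside the allowed set,
    # giving the automatic validation the reviewer asks for.
    return QairtBackend(value)
```

Declaring `type_=QairtBackend` on the PassConfigParam would then make the config schema reject unknown backend strings instead of failing later at runtime.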
Pull request overview
Adds a new QAIRT pipeline to Olive to support an end-to-end ORT → QAIRT (Genie) workflow, including model preparation, QAIRT GenAIBuilder compilation, and ONNX encapsulation for onnxruntime-genai compatibility.
Changes:
- Introduces three new QAIRT passes: preparation (external script runner), GenAIBuilder (CPU/HTP backends), and encapsulation (EPContext ONNX wrapper + genai_config.json).
- Adds QAIRT model handlers and new Framework/ModelFileFormat enums for QAIRT artifacts.
- Adds a new pytest suite for the QAIRT passes and updates Olive’s pass registry configuration (olive_config.json) with QAIRT entries and extra dependencies.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| `olive/passes/qairt/preparation.py` | New pass to run external preparation scripts and emit QairtPreparedModelHandler. |
| `olive/passes/qairt/gen_ai_builder.py` | New pass to compile/build QAIRT artifacts via QAIRT GenAIBuilder for CPU/HTP targets. |
| `olive/passes/qairt/encapsulation.py` | New pass to export DLC and wrap it into an ONNX EPContext model, generating genai_config.json. |
| `olive/passes/qairt/__init__.py` | Adds the QAIRT passes package. |
| `olive/model/handler/qairt.py` | Adds QAIRT model handler types (QairtPreparedModelHandler, QairtModelHandler). |
| `olive/model/handler/__init__.py` | Exposes the new QAIRT model handlers from the handler package. |
| `olive/constants.py` | Adds the QAIRT framework and QAIRT model file formats. |
| `olive/olive_config.json` | Registers the new QAIRT passes and adds the QAIRT extra dependency mapping. |
| `test/passes/qairt/conftest.py` | Adds shared fixtures for mocking QAIRT modules and model handlers. |
| `test/passes/qairt/test_preparation.py` | Unit tests for QairtPreparation behavior and subprocess streaming. |
| `test/passes/qairt/test_gen_ai_builder.py` | Unit tests for GenAIBuilder CPU/HTP behavior and validation paths. |
| `test/passes/qairt/test_encapsulation.py` | Unit tests for encapsulation, DLC discovery, ONNX generation, and genai_config creation. |
```python
# Can only set target and transformation configurations if the BE is HTP
if config.backend == qairt.BackendType.HTP.value:
    # Device configs
    gen_ai_builder.set_targets([config.soc_details])
```
For HTP, set_targets([config.soc_details]) is called unconditionally, but soc_details defaults to None and validate_config doesn’t enforce it. Passing [None] into the QAIRT API is likely to fail at runtime. Add a validation/guard so HTP requires a non-empty soc_details (or skip set_targets when it’s unset and rely on QAIRT defaults).
Suggested change:

```python
if config.soc_details:
    gen_ai_builder.set_targets([config.soc_details])
```
```python
"log_level": PassConfigParam(
    type_=str,
    default_value=None,
    description="Log level to be used within underlying QAIRT components. "
    "Valid values: DEBUG, INFO, WARN, ERROR.",
),
"run_checker": PassConfigParam(
    type_=bool,
    default_value=False,
    description="Runs the onnx checker on the model before it is encapsulated.",
),
```
The log_level config option is defined but never used in this pass. Either wire it up (e.g., set the same QAIRT_LOG_LEVEL env var used in QairtGenAIBuilder) or remove it to avoid confusing users with a no-op parameter.
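One way to wire it up, sketched below with a hypothetical helper (`apply_qairt_log_level` is not an existing Olive function; `QAIRT_LOG_LEVEL` is the env var the review comment says QairtGenAIBuilder already uses):

```python
import os
from typing import Optional

# Allowed values, matching the PassConfigParam description above.
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}


def apply_qairt_log_level(log_level: Optional[str]) -> None:
    """Hypothetical helper: propagate the pass's log_level to QAIRT components."""
    if log_level is None:
        return  # leave QAIRT's default logging untouched
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid log_level: {log_level!r}")
    os.environ["QAIRT_LOG_LEVEL"] = log_level
```

Calling this at the start of the pass's run method would make the config option take effect instead of being a no-op.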
```python
genai_config["model"]["decoder"]["head_size"] = src_config.get("hidden_size", -1) // src_config.get(
    "num_attention_heads", -1
)
genai_config["model"]["decoder"]["hidden_size"] = src_config.get("hidden_size", -1)
```
head_size is computed as hidden_size // num_attention_heads using default values of -1 when keys are missing. This can silently produce incorrect values (e.g., -1 // -1 == 1) or raise if num_attention_heads is 0/non-int. Mirror the safer logic used in olive/passes/openvino/ov_utils.py:create_genai_config by validating both values are positive ints before dividing; otherwise set head_size to -1.
Suggested change:

```python
hidden_size = src_config.get("hidden_size", -1)
num_attention_heads = src_config.get("num_attention_heads", -1)
head_size = -1
if isinstance(hidden_size, int) and isinstance(num_attention_heads, int) and hidden_size > 0 and num_attention_heads > 0:
    head_size = hidden_size // num_attention_heads
genai_config["model"]["decoder"]["head_size"] = head_size
genai_config["model"]["decoder"]["hidden_size"] = hidden_size
```
```python
with subprocess.Popen(
    ["python", str(script_path), "--config", config_file_path],
    cwd=str(script_path.parent),
```
The subprocess is invoked via the literal executable name "python", which can pick up a different interpreter than the one running Olive (e.g., in venv/conda). Use sys.executable (or equivalent) so the preparation script runs under the same Python environment and has access to the same installed dependencies.
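A minimal sketch of the fix (the helper name is illustrative, not part of the PR):

```python
import sys


def build_preparation_cmd(script_path, config_file_path):
    """Hypothetical helper: build the command line with the current interpreter."""
    # sys.executable points at the Python binary running Olive (including
    # venv/conda interpreters), so the preparation script sees the same
    # installed dependencies as Olive itself.
    return [sys.executable, str(script_path), "--config", str(config_file_path)]
```

The pass would then call `subprocess.Popen(build_preparation_cmd(script_path, config_file_path), cwd=str(script_path.parent))` exactly as in the original snippet, only with the bare `"python"` replaced.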
Code scanning / lintrunner flagged warnings in the new test code:

```python
def test_preparation_uses_sys_executable_and_env(tmp_path, mock_hf_model, mock_qairt_modules):
    """Test that subprocess uses sys.executable and passes environment."""
    import os
```

- RUFF/format Warning (test)
- PYLINT/W0611 Warning (test)
- RUFF/F401 Warning (test)
Describe your changes
Implements a complete QAIRT workflow for converting ONNX Runtime models to Genie-compatible format through three new passes:
QairtPreparation: Executes external preparation scripts to quantize and prepare HuggingFace models for QAIRT, with configurable caching and script parameters.
QairtGenAIBuilder: Converts prepared models using the QAIRT GenAIBuilder API, with support for CPU and HTP backends.
QairtEncapsulation: Wraps QAIRT DLC models in ONNX protobuf format with EPContext nodes, generating genai_config.json for onnxruntime-genai compatibility.
This enables end-to-end optimization of generative AI models for Qualcomm hardware accelerators.
Checklist before requesting a review
- [ ] Run `lintrunner -a`

(Optional) Issue link