GenAI changes to support EPContext compilation and validation #1993
lnigam wants to merge 7 commits into microsoft:main
- New model_compile example with 5 configs and timings; default EP path.
- common.cpp: RegisterEP via OgaRegisterExecutionProviderLibrary so EP is on same OrtEnv as Model::ValidateCompiledModel (model.cpp). ValidateCompiledModel uses ONNX Runtime GetCompatibilityInfoFromModel, GetEpDevices(ort_env), GetModelCompatibilityForEpDevices; ort_env must have plugin registered.
- model.cpp: Compile reuses existing compiled model only if file exists and ValidateCompiledModel passes; otherwise compiles.
- CMake: MODEL_COMPILE target and option.
…p alignment with onnxruntime params
@microsoft-github-policy-service agree company="Nvidia"
| std::cout << "Registering execution provider: " << ep_path << std::endl; | ||
| auto env = Ort::Env(); | ||
| // Must register on GenAI's OrtEnv (via OgaRegisterExecutionProviderLibrary) so | ||
| // GetEpDevices() in ValidateCompiledModel sees the plugin; Ort::Env() is a different env. |
This seems to be an ongoing issue as there are some cases where registering on the Ort::Env() is preferred and other cases where registering on GenAI's OrtEnv is preferred. We should find a way to consolidate.
| // Licensed under the MIT License. | ||
| // | ||
| // Model Compile example: runs the same model under different EP and compile configurations | ||
| // (CPU, CPU+overlay, NvTensorRtRtx no-compile / 4 options / all options). Use -v for verbose, |
Besides appending some GenAI config options and using the overlay API, it seems to be the same logic as the existing examples.
Instead of creating a new standalone example that needs to be continually maintained, let's integrate the necessary logic into the common files. We should update the model-qa and/or model-chat examples in C/C++, C#, and Python to show this capability. In this way, these changes will be continually tested since the model-qa examples are built and tested in the CIs now.
We want to keep the examples consistent across language bindings and reduce maintenance by sharing and re-using logic as much as possible between examples.
| Gpt_Model::Gpt_Model(std::unique_ptr<Config> config, OrtEnv& ort_env) | ||
| : Model{std::move(config)} { | ||
| session_decoder_ = CreateSession(ort_env, config_->model.decoder.filename, session_options_.get()); | ||
| std::string decoder_model_path = CompileModel(ort_env, config_->model.decoder.filename, session_options_.get(), true, config_->model.decoder.compile_options); |
Can we simplify this by passing the config_->model.decoder object directly? Each object should have a filename and compile options to access. Then, the lines won't be as long and there will be one less parameter.
|
|
||
| Ort::Allocator& alloc = Ort::Allocator::GetWithDefaultOptions(); | ||
| char* compat_info = nullptr; | ||
| OrtStatus* st = Ort::api->GetCompatibilityInfoFromModel( |
Let's prefer using the ORT GenAI API wrappers over calling Ort::api->MethodName directly.
|
|
||
| bool Model::CheckCompiledModelExists(OrtEnv& ort_env, | ||
| const std::string& model_filename, | ||
| const Config::CompileOptions& compile_options_config, |
Let's fix the formatting of the parameters
| const std::string& model_filename, | ||
| const Config::CompileOptions& compile_options_config, | ||
| fs::path& out_compiled_model_path) { | ||
| if (compile_options_config.ep_context_file_path.has_value() && !compile_options_config.ep_context_file_path.value().empty()) { |
Could we use .value_or here instead to simplify obtaining the value?
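A minimal sketch of the value_or form, assuming ep_context_file_path is a std::optional<std::string> as the has_value()/value() calls in the diff suggest. CompileOptionsSketch and HasEpContextPath are hypothetical names used only for illustration; they are not the PR's types.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for Config::CompileOptions; only the field from the diff is modeled.
struct CompileOptionsSketch {
  std::optional<std::string> ep_context_file_path;
};

// True when a non-empty EP context path was configured.
// value_or("") folds the has_value() check and the empty() check into one expression.
inline bool HasEpContextPath(const CompileOptionsSketch& opts) {
  return !opts.ep_context_file_path.value_or("").empty();
}
```

Note that value_or returns the string by value, so this form copies the path once; for a one-time configuration check that cost is unlikely to matter.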
| } | ||
|
|
||
| const auto& comp_opts = compile_options.value(); | ||
| if (!comp_opts.enable_ep_context.has_value() || !comp_opts.enable_ep_context.value()) { |
Can these three if blocks be merged into one since they all have the same return value?
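One way the merged guard might look. Only one of the three if blocks is visible in the excerpt, so this sketch assumes the others check that compile options are present at all. CompileOptionsSketch and IsEpContextCompilationEnabled are hypothetical names.

```cpp
#include <optional>

// Hypothetical stand-in for Config::CompileOptions; only enable_ep_context is modeled.
struct CompileOptionsSketch {
  std::optional<bool> enable_ep_context;
};

// Single guard: compilation is requested only when compile options are present
// and enable_ep_context is explicitly set to true.
inline bool IsEpContextCompilationEnabled(
    const std::optional<CompileOptionsSketch>& compile_options) {
  return compile_options.has_value() &&
         compile_options->enable_ep_context.value_or(false);
}
```

A caller could then return early once, with `if (!IsEpContextCompilationEnabled(compile_options))`, instead of repeating the same return in separate blocks.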
| } | ||
|
|
||
| const auto& comp_opts = config_compilation_options.value(); | ||
| if (!comp_opts.enable_ep_context.has_value() || !comp_opts.enable_ep_context.value()) { |
Same question for the several if blocks here.
| // Ensure the output directory exists | ||
| fs::path output_dir = compiled_model_path.parent_path(); | ||
| if (!fs::exists(output_dir)) { | ||
| if (!fs::create_directories(output_dir)) { |
Can we merge these two if conditions into one?
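A sketch of the merged condition. It uses std::filesystem so that it is self-contained; the PR uses the project's own fs wrapper, whose behavior may differ. EnsureOutputDirectory and the error message are hypothetical, since the excerpt does not show what the original code does on failure.

```cpp
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Creates output_dir (and any missing parents) if it does not exist yet.
// The && short-circuits, so create_directories runs only when the directory is missing.
inline void EnsureOutputDirectory(const fs::path& output_dir) {
  if (!fs::exists(output_dir) && !fs::create_directories(output_dir)) {
    throw std::runtime_error("Failed to create output directory: " +
                             output_dir.string());
  }
}
```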
| } | ||
|
|
||
| // Compile the model | ||
| Ort::CompileModel(ort_env, *compilation_options); |
I know there are a lot of checks above, but what happens if model compilation fails here? Could compiled_model_path point to a non-existent model if a failure occurs?
| } | ||
|
|
||
| // Helper lambda to configure and compile a model | ||
| auto compile_model_helper = [this, &ort_env](OrtModelCompilationOptions* compilation_options, |
Given how big this helper is, I think it should be its own standalone method.
| // Additionally, compile all pipeline models that have compile_options (if primary session option) | ||
| // Use explicit pipeline session_options when present, otherwise fallback to main session_options_ | ||
| // (consistent with GetSessionOptions() at runtime). | ||
| if (is_primary_session_option) { |
Can you explain more on why this boolean is needed? Why can't we always go through each pipeline model and check its options?
| } | ||
|
|
||
| std::unique_ptr<OrtSession> Model::CreateSession(OrtEnv& ort_env, const std::string& model_path, OrtSessionOptions* session_options) { | ||
|
|
| * \param force_compile_if_needed If true, PREFER_RECOMPILATION is treated as invalid (recompile); if false, it is accepted as valid with a warning | ||
| * \return true if the compiled model is valid for the current EP (or validation not applicable) | ||
| */ | ||
| bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile_if_needed); |
| bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile_if_needed); | |
| bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile); |
| * \param env OrtEnv object | ||
| * \param model_compilation_options Compilation options for the model | ||
| */ | ||
| void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& model_compilation_options); |
| void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& model_compilation_options); | |
| void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& compilation_options); |
| * | ||
| * Wraps OrtCompileApi::ModelCompilationOptions_SetOutputModelExternalInitializersFile | ||
| */ | ||
| OrtModelCompilationOptions& SetOutputModelExternalInitializersFile(const ORTCHAR_T* external_initializers_file_path, |
Looks like there are some formatting issues with some of the newly added parameters in this header file.
| } | ||
|
|
||
| inline OrtModelCompilationOptions& OrtModelCompilationOptions::SetInputModelFromBuffer(const void* input_model_data, | ||
| size_t input_model_data_size) { |
Looks like there are some formatting issues with some of the newly added parameters in this file.
| v_.ep_context_file_path = JSON::Get<std::string_view>(value); | ||
| } else if (name == "ep_context_embed_mode") { | ||
| v_.ep_context_embed_mode = JSON::Get<bool>(value); | ||
| } else if (name == "flags") { |
What is the flags option? The name doesn't provide much information.
Pull request overview
This PR adds first-class support in GenAI for compiling ONNX models into EPContext-enabled “context models” (and validating/reusing existing compiled artifacts) so subsequent loads can skip recompilation and graph optimization overhead.
Changes:
- Introduces compile_options in genai_config.json parsing and plumbing to drive EPContext compilation behavior.
- Adds ORT Compile API wrappers (GetCompileApi, OrtModelCompilationOptions, CompileModel) and integrates compilation/validation into model session creation flows.
- Adds a new C example (model_compile) and build option to demonstrate compilation behavior and timing.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/models/model.h | Adds compilation/validation APIs and state for tracking compiled pipeline model paths. |
| src/models/model.cpp | Implements EPContext compile/validate/reuse flow and updates session creation to accept explicit model paths. |
| src/models/onnxruntime_api.h | Adds GetCompileApi and public declarations for compile helpers/types. |
| src/models/onnxruntime_inline.h | Implements inline wrappers for compile API and OrtModelCompilationOptions methods. |
| src/config.h | Adds Config::CompileOptions and attaches it to relevant model config sections. |
| src/config.cpp | Implements JSON parsing for compile_options across model components and pipeline entries. |
| src/models/decoder_only.cpp | Uses compiled-or-original model path from Model::CompileModel when creating decoder session. |
| src/models/gpt.cpp | Uses compiled-or-original model path from Model::CompileModel when creating decoder session. |
| src/models/whisper.cpp | Uses compile path selection for encoder/decoder models. |
| src/models/marian.cpp | Uses compile path selection for encoder/decoder models. |
| src/models/multi_modal.cpp | Uses compile path selection for vision/speech/embedding/decoder sessions. |
| src/models/decoder_only_pipeline.cpp | Attempts to use compiled pipeline model paths when creating pipeline sessions. |
| src/filesystem.h | Extends fs::path (filename extraction, const operator/, recursive directory creation). |
| examples/c/src/common.cpp | Registers plugin EPs via GenAI’s env so EP device enumeration/validation can see them. |
| examples/c/src/model_compile.cpp | New example showcasing EPContext compile vs reuse behavior and timing. |
| examples/c/CMakeLists.txt | Adds MODEL_COMPILE option and target. |
| : Model{std::move(config)} { | ||
| encoder_session_options_ = OrtSessionOptions::Create(); | ||
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false); | ||
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false); |
Same issue as in whisper.cpp: passing is_primary_session_options=false into CreateSessionOptionsFromConfig can lead to provider duplication (providers list is already derived from provider_options) and also bypasses the multi-provider device mismatch check because SetProviderSessionOptions won’t return a device for non-primary sessions. Suggest reverting this argument to true for encoder session options (or adjust SetProviderSessionOptions to avoid duplicates and still validate device consistency).
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false); | |
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false); |
| speech_session_options_ = OrtSessionOptions::Create(); | ||
| CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, true, true); | ||
| speech_session_ = CreateSession(ort_env, config_->model.speech.filename, speech_session_options_.get()); | ||
| CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, false, true); |
Same concern for the speech session options: is_primary_session_options=false can cause duplicate EP appends and prevents the non-decoder session from contributing to the “single provider/device” validation. Recommend keeping this as true (or dedupe + return a device type even for non-primary sessions).
| CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, false, true); | |
| CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, true, true); |
|
|
||
| embedding_session_options_ = OrtSessionOptions::Create(); | ||
| CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, true, true); | ||
| CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, false, true); |
Same issue for the embedding session options: passing is_primary_session_options=false can duplicate providers (providers list is already populated from provider_options) and can bypass the multi-provider guard because the device type isn’t returned for non-primary sessions. Consider using true here as well, or adjust SetProviderSessionOptions accordingly.
| CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, false, true); | |
| CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, true, true); |
| // Set external initializers file | ||
| if (comp_opts.external_initializers_file_path.has_value() && | ||
| comp_opts.external_initializers_size_threshold.has_value()) { | ||
| fs::path external_init_path = config_->config_path / comp_opts.external_initializers_file_path.value(); |
Similar to ep_context_file_path: external_initializers_file_path is always joined with config_->config_path. If callers provide an absolute path, this will generate an incorrect path and likely fail compilation. Consider respecting absolute paths (only prefix when the config value is relative).
| fs::path external_init_path = config_->config_path / comp_opts.external_initializers_file_path.value(); | |
| fs::path external_init_path = comp_opts.external_initializers_file_path.value(); | |
| if (!external_init_path.is_absolute()) { | |
| external_init_path = config_->config_path / external_init_path; | |
| } |
| inline bool create_directories(const path& p) { | ||
| #ifdef _WIN32 | ||
| // On Windows, create directory recursively using CreateDirectoryW | ||
| if (p.exists()) { | ||
| return true; // Already exists | ||
| } | ||
|
|
||
| // First create parent directory if needed | ||
| path parent = p.parent_path(); | ||
| if (!parent.string().empty() && !parent.exists()) { | ||
| if (!create_directories(parent)) { | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| // Create the directory | ||
| return CreateDirectoryW(p.c_str(), nullptr) != 0 || GetLastError() == ERROR_ALREADY_EXISTS; |
create_directories() returns true when p.exists() is true, but it doesn’t verify that the existing path is actually a directory. If the path exists as a file, this will incorrectly report success and downstream code will fail with a less actionable error. Consider checking p.is_directory() (or equivalent) and returning false/throwing when the existing path is not a directory.
| // Get the compiled model path if it was compiled, otherwise use full path from config + filename | ||
| std::string model_path = GetPipelineCompiledModelPath(model.model_id); | ||
| if (model_path.empty()) { | ||
| // Use full path to original model if not compiled | ||
| model_path = (config_->config_path / fs::path(model.filename)).string(); | ||
| } | ||
| sessions_.emplace_back(CreateSession(ort_env, model_path, GetSessionOptions(model.model_id))); |
GetPipelineCompiledModelPath() is only populated by Model::CompileModel(), but DecoderOnlyPipelineModel never calls CompileModel for its pipeline stages (it only creates sessions directly). As a result, this will always fall back to the original model files and any decoder.pipeline[*].compile_options in config won’t take effect for decoder-pipeline models. Consider compiling each pipeline stage here (or calling a shared compile step before session creation) when compile_options are enabled.
| : Model{std::move(config)} { | ||
| encoder_session_options_ = OrtSessionOptions::Create(); | ||
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false); | ||
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false); |
The third parameter to CreateSessionOptionsFromConfig was changed to false here. For encoder/decoder configs, Config::SessionOptions.providers is already populated from provider_options (see config.cpp), and SetProviderSessionOptions appends provider_options again when is_primary_session_options is false—so this can duplicate providers and append the same EP multiple times. It also prevents the encoder session from participating in the device consistency check (session_device stays null), potentially allowing mixed providers without throwing. Consider keeping is_primary_session_options as true for these non-pipeline sessions (or deduplicate providers + still return a device type for validation).
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false); | |
| CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false); |
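The comment also mentions deduplicating providers as an alternative to reverting the argument. A minimal sketch of that idea, assuming the provider list is a vector of names; AppendProviderOnce is hypothetical and does not address the device consistency check, which would still need a device type returned for non-primary sessions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Appends `name` to `providers` only if it is not already listed.
// Returns true when the provider was added, false when it was a duplicate.
inline bool AppendProviderOnce(std::vector<std::string>& providers,
                               const std::string& name) {
  if (std::find(providers.begin(), providers.end(), name) != providers.end()) {
    return false;
  }
  providers.push_back(name);
  return true;
}
```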
| vision_session_options_ = OrtSessionOptions::Create(); | ||
| CreateSessionOptionsFromConfig(config_->model.vision.session_options.has_value() ? config_->model.vision.session_options.value() : config_->model.decoder.session_options, *vision_session_options_, true, true); | ||
| vision_session_ = CreateSession(ort_env, config_->model.vision.filename, vision_session_options_.get()); | ||
| CreateSessionOptionsFromConfig(config_->model.vision.session_options.has_value() ? config_->model.vision.session_options.value() : config_->model.decoder.session_options, *vision_session_options_, false, true); | ||
| std::string vision_model_path = CompileModel(ort_env, config_->model.vision.filename, vision_session_options_.get(), false, config_->model.vision.compile_options); |
Using is_primary_session_options=false here risks appending the same execution provider more than once (because Config::SessionOptions.providers is already populated from provider_options) and can skip the provider/device consistency check for the vision session. Consider using is_primary_session_options=true (or deduplicating providers inside SetProviderSessionOptions and still returning a device type for validation).
| fs::path& out_compiled_model_path) { | ||
| if (compile_options_config.ep_context_file_path.has_value() && !compile_options_config.ep_context_file_path.value().empty()) { | ||
| // Single path: full path (relative to config path) including filename, e.g. "contexts/model_ctx.onnx" | ||
| out_compiled_model_path = config_->config_path / compile_options_config.ep_context_file_path.value(); |
ep_context_file_path is documented as a “full path (relative to config path)”, but the code always prefixes it with config_->config_path. If a user supplies an absolute path, config_path / absolute_path will produce an invalid joined path with this project’s fs::path implementation. Consider handling absolute paths explicitly (use the provided path as-is when it’s not relative).
| out_compiled_model_path = config_->config_path / compile_options_config.ep_context_file_path.value(); | |
| fs::path ep_context_path(compile_options_config.ep_context_file_path.value()); | |
| if (ep_context_path.is_absolute()) { | |
| // Use absolute paths as provided. | |
| out_compiled_model_path = ep_context_path; | |
| } else { | |
| // Resolve relative paths against the config path. | |
| out_compiled_model_path = config_->config_path / ep_context_path; | |
| } |
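Since the same absolute-versus-relative handling is suggested for both ep_context_file_path and external_initializers_file_path, a shared helper might avoid repeating the branch. This is a sketch using std::filesystem for self-containment; the project's fs::path would need its own is_absolute(), and ResolveAgainstConfigDir is a hypothetical name.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Absolute paths are used as given; relative paths are resolved against config_dir.
inline fs::path ResolveAgainstConfigDir(const fs::path& config_dir,
                                        const fs::path& configured) {
  return configured.is_absolute() ? configured : config_dir / configured;
}
```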
- Add support for EPContext compilation by default, if enabled in the config.
- Once the EPContext model is compiled, GenAI loads the context model instead of the original model, which avoids model recompilation and graph optimization.
- Also add support for an API that validates an existing EPContext model. It can also detect when the existing EPContext model may not run optimally and requires recompilation.
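The reuse rule from the commit notes (reuse only if the file exists and validation passes) and the force_compile_if_needed semantics from the header comment can be sketched together as follows. The enum and function names are hypothetical and the three outcomes are illustrative; they are not ONNX Runtime's compatibility enum.

```cpp
// Hypothetical compatibility outcomes, for illustration only.
enum class CompatibilitySketch { kOptimal, kPreferRecompilation, kUnsupported };

// A "prefer recompilation" result counts as invalid when force_compile_if_needed
// is true, and as valid otherwise (the PR logs a warning in that case).
inline bool IsCompiledModelValid(CompatibilitySketch compat,
                                 bool force_compile_if_needed) {
  switch (compat) {
    case CompatibilitySketch::kOptimal:
      return true;
    case CompatibilitySketch::kPreferRecompilation:
      return !force_compile_if_needed;
    case CompatibilitySketch::kUnsupported:
      return false;
  }
  return false;
}

// Reuse the compiled model only if it exists on disk and validation passes;
// in every other case the caller compiles again.
inline bool ShouldReuseCompiledModel(bool file_exists, CompatibilitySketch compat,
                                     bool force_compile_if_needed) {
  return file_exists && IsCompiledModelValid(compat, force_compile_if_needed);
}
```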