GenAI changes to support EPContext compilation and validation #1993

Open
lnigam wants to merge 7 commits into microsoft:main from lnigam:lnigam/feature_EPContext

lnigam commented Feb 27, 2026

- Adds support for EPContext compilation, which runs by default when enabled in the config.
- Once the EPContext model is compiled, GenAI loads the context model instead of the original model, avoiding model recompilation and graph optimization.
- Adds an API that validates an existing EPContext model. It also reports when the existing EPContext model may not run optimally and should be recompiled.

- New model_compile example with 5 configs and timings; default EP path.

- common.cpp: RegisterEP now goes through OgaRegisterExecutionProviderLibrary so the EP is registered on the same OrtEnv that Model::ValidateCompiledModel (model.cpp) uses. ValidateCompiledModel calls ONNX Runtime's GetCompatibilityInfoFromModel, GetEpDevices(ort_env), and GetModelCompatibilityForEpDevices, so ort_env must have the plugin registered.

- model.cpp: Compile reuses an existing compiled model only if the file exists and ValidateCompiledModel passes; otherwise it compiles the model.

- CMake: MODEL_COMPILE target and option.

lnigam commented Feb 27, 2026

@microsoft-github-policy-service agree company="Nvidia"

std::cout << "Registering execution provider: " << ep_path << std::endl;
auto env = Ort::Env();
// Must register on GenAI's OrtEnv (via OgaRegisterExecutionProviderLibrary) so
// GetEpDevices() in ValidateCompiledModel sees the plugin; Ort::Env() is a different env.

This seems to be an ongoing issue as there are some cases where registering on the Ort::Env() is preferred and other cases where registering on GenAI's OrtEnv is preferred. We should find a way to consolidate.

// Licensed under the MIT License.
//
// Model Compile example: runs the same model under different EP and compile configurations
// (CPU, CPU+overlay, NvTensorRtRtx no-compile / 4 options / all options). Use -v for verbose,

Besides appending some GenAI config options and using the overlay API, it seems to be the same logic as the existing examples.

Instead of creating a new standalone example that needs to be continually maintained, let's integrate the necessary logic into the common files. We should update the model-qa and/or model-chat examples in C/C++, C#, and Python to show this capability. In this way, these changes will be continually tested since the model-qa examples are built and tested in the CIs now.

We want to keep the examples consistent across language bindings and reduce maintenance by sharing and re-using logic as much as possible between examples.

Gpt_Model::Gpt_Model(std::unique_ptr<Config> config, OrtEnv& ort_env)
: Model{std::move(config)} {
session_decoder_ = CreateSession(ort_env, config_->model.decoder.filename, session_options_.get());
std::string decoder_model_path = CompileModel(ort_env, config_->model.decoder.filename, session_options_.get(), true, config_->model.decoder.compile_options);

Can we simplify this by passing the config_->model.decoder object directly? Each object should have a filename and compile options to access. Then, the lines won't be as long and there will be one less parameter.
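
To illustrate the shape of that suggestion, here is a small sketch. The types below are simplified, hypothetical stand-ins for the config structs (the real ones have more fields), and `WantsEpContext` is an invented name used only to show a callee reading both fields from one argument.

```cpp
#include <optional>
#include <string>

// Hypothetical, reduced versions of the config types discussed above.
struct ComponentCompileOptions {
  std::optional<bool> enable_ep_context;
};

struct ModelComponent {
  std::string filename;
  std::optional<ComponentCompileOptions> compile_options;
};

// The component is passed as a single argument; the callee reads the filename
// and the compile options from it instead of taking them as two parameters.
inline bool WantsEpContext(const ModelComponent& component) {
  return !component.filename.empty() &&
         component.compile_options.has_value() &&
         component.compile_options->enable_ep_context.value_or(false);
}
```

A call site then shrinks to something like `WantsEpContext(decoder)`, which also keeps the filename and its options from drifting apart.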


Ort::Allocator& alloc = Ort::Allocator::GetWithDefaultOptions();
char* compat_info = nullptr;
OrtStatus* st = Ort::api->GetCompatibilityInfoFromModel(

Let's prefer using the ORT GenAI API wrappers over calling Ort::api->MethodName directly.


bool Model::CheckCompiledModelExists(OrtEnv& ort_env,
const std::string& model_filename,
const Config::CompileOptions& compile_options_config,

Let's fix the formatting of the parameters

const std::string& model_filename,
const Config::CompileOptions& compile_options_config,
fs::path& out_compiled_model_path) {
if (compile_options_config.ep_context_file_path.has_value() && !compile_options_config.ep_context_file_path.value().empty()) {

Could we use .value_or here instead to simplify obtaining the value?
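
For reference, a `.value_or` version of that check might look like this. A minimal sketch on a plain `std::optional<std::string>`; the helper name `HasNonEmptyPath` is hypothetical.

```cpp
#include <optional>
#include <string>

// True when the optional path is set and non-empty.
// value_or("") folds the has_value() check and the empty() check into one
// expression: an unset optional behaves like an empty string.
inline bool HasNonEmptyPath(const std::optional<std::string>& path) {
  return !path.value_or("").empty();
}
```

Note that `value_or` returns the string by value, so this makes a copy; for a short config path that cost should be negligible.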

}

const auto& comp_opts = compile_options.value();
if (!comp_opts.enable_ep_context.has_value() || !comp_opts.enable_ep_context.value()) {

Can these three if blocks be merged into one since they all have the same return value?
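
As an illustration of the merge, assuming every early return yields the same value, the separate guards can collapse into one boolean expression. `EpContextOptions` and `ShouldCompile` are hypothetical names, and only one option field is modeled here.

```cpp
#include <optional>

// Hypothetical subset of the compile options.
struct EpContextOptions {
  std::optional<bool> enable_ep_context;
};

// One combined guard: compile only when options are present and
// enable_ep_context is explicitly set to true.
inline bool ShouldCompile(const std::optional<EpContextOptions>& options) {
  return options.has_value() && options->enable_ep_context.value_or(false);
}
```

Short-circuit evaluation guarantees `options->` is only reached when the optional holds a value.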

}

const auto& comp_opts = config_compilation_options.value();
if (!comp_opts.enable_ep_context.has_value() || !comp_opts.enable_ep_context.value()) {

Same question for the several if blocks here.

// Ensure the output directory exists
fs::path output_dir = compiled_model_path.parent_path();
if (!fs::exists(output_dir)) {
if (!fs::create_directories(output_dir)) {

Can we merge these two if conditions into one?
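
Merged into a single condition, the directory check might read as follows. This is a sketch using std::filesystem rather than the project's own fs helpers; the function name and error message are assumptions.

```cpp
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Throws if the output directory is missing and cannot be created.
inline void EnsureOutputDirectory(const fs::path& output_dir) {
  // The second operand only runs when the directory does not exist yet.
  if (!fs::exists(output_dir) && !fs::create_directories(output_dir)) {
    throw std::runtime_error("Failed to create directory: " + output_dir.string());
  }
}
```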

}

// Compile the model
Ort::CompileModel(ort_env, *compilation_options);

I know there are a lot of checks above, but what happens if model compilation fails here? Could compiled_model_path point to a non-existent model if a failure occurs?

}

// Helper lambda to configure and compile a model
auto compile_model_helper = [this, &ort_env](OrtModelCompilationOptions* compilation_options,

Given how big this helper is, I think it should be its own standalone method.

// Additionally, compile all pipeline models that have compile_options (if primary session option)
// Use explicit pipeline session_options when present, otherwise fallback to main session_options_
// (consistent with GetSessionOptions() at runtime).
if (is_primary_session_option) {

Can you explain more on why this boolean is needed? Why can't we always go through each pipeline model and check its options?

}

std::unique_ptr<OrtSession> Model::CreateSession(OrtEnv& ort_env, const std::string& model_path, OrtSessionOptions* session_options) {


Suggested change

* \param force_compile_if_needed If true, PREFER_RECOMPILATION is treated as invalid (recompile); if false, it is accepted as valid with a warning
* \return true if the compiled model is valid for the current EP (or validation not applicable)
*/
bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile_if_needed);

Suggested change
- bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile_if_needed);
+ bool ValidateCompiledModel(OrtEnv& ort_env, const fs::path& compiled_model_path, bool force_compile);
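
The flag's documented behavior can be pictured with a small decision function. This is only a sketch: the `Compatibility` enum is a hypothetical local mirror of the compatibility states, `IsCompiledModelUsable` is an invented name, and treating an unsupported model as invalid is an assumption not stated in the doc comment.

```cpp
// Hypothetical mirror of the compatibility states referenced in the doc comment.
enum class Compatibility {
  kNotApplicable,
  kOptimal,
  kPreferRecompilation,
  kUnsupported,
};

// Maps a compatibility status to "can the compiled model be reused?".
inline bool IsCompiledModelUsable(Compatibility status, bool force_compile) {
  switch (status) {
    case Compatibility::kNotApplicable:  // validation does not apply: treat as valid
    case Compatibility::kOptimal:
      return true;
    case Compatibility::kPreferRecompilation:
      // Rejected when force_compile is set; otherwise accepted (the PR logs a warning).
      return !force_compile;
    case Compatibility::kUnsupported:  // assumption: never reusable
      return false;
  }
  return false;
}
```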

* \param env OrtEnv object
* \param model_compilation_options Compilation options for the model
*/
void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& model_compilation_options);

Suggested change
- void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& model_compilation_options);
+ void CompileModel(OrtEnv& env, const OrtModelCompilationOptions& compilation_options);

*
* Wraps OrtCompileApi::ModelCompilationOptions_SetOutputModelExternalInitializersFile
*/
OrtModelCompilationOptions& SetOutputModelExternalInitializersFile(const ORTCHAR_T* external_initializers_file_path,

Looks like there are some formatting issues with some of the newly added parameters in this header file.

}

inline OrtModelCompilationOptions& OrtModelCompilationOptions::SetInputModelFromBuffer(const void* input_model_data,
size_t input_model_data_size) {

Looks like there are some formatting issues with some of the newly added parameters in this file.

v_.ep_context_file_path = JSON::Get<std::string_view>(value);
} else if (name == "ep_context_embed_mode") {
v_.ep_context_embed_mode = JSON::Get<bool>(value);
} else if (name == "flags") {

What is the flags option? The name doesn't provide much information.

Copilot AI left a comment

Pull request overview

This PR adds first-class support in GenAI for compiling ONNX models into EPContext-enabled “context models” (and validating/reusing existing compiled artifacts) so subsequent loads can skip recompilation and graph optimization overhead.

Changes:

  • Introduces compile_options in genai_config.json parsing and plumbing to drive EPContext compilation behavior.
  • Adds ORT Compile API wrappers (GetCompileApi, OrtModelCompilationOptions, CompileModel) and integrates compilation/validation into model session creation flows.
  • Adds a new C example (model_compile) and build option to demonstrate compilation behavior and timing.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/models/model.h | Adds compilation/validation APIs and state for tracking compiled pipeline model paths. |
| src/models/model.cpp | Implements EPContext compile/validate/reuse flow and updates session creation to accept explicit model paths. |
| src/models/onnxruntime_api.h | Adds GetCompileApi and public declarations for compile helpers/types. |
| src/models/onnxruntime_inline.h | Implements inline wrappers for compile API and OrtModelCompilationOptions methods. |
| src/config.h | Adds Config::CompileOptions and attaches it to relevant model config sections. |
| src/config.cpp | Implements JSON parsing for compile_options across model components and pipeline entries. |
| src/models/decoder_only.cpp | Uses compiled-or-original model path from Model::CompileModel when creating decoder session. |
| src/models/gpt.cpp | Uses compiled-or-original model path from Model::CompileModel when creating decoder session. |
| src/models/whisper.cpp | Uses compile path selection for encoder/decoder models. |
| src/models/marian.cpp | Uses compile path selection for encoder/decoder models. |
| src/models/multi_modal.cpp | Uses compile path selection for vision/speech/embedding/decoder sessions. |
| src/models/decoder_only_pipeline.cpp | Attempts to use compiled pipeline model paths when creating pipeline sessions. |
| src/filesystem.h | Extends fs::path (filename extraction, const operator/, recursive directory creation). |
| examples/c/src/common.cpp | Registers plugin EPs via GenAI’s env so EP device enumeration/validation can see them. |
| examples/c/src/model_compile.cpp | New example showcasing EPContext compile vs reuse behavior and timing. |
| examples/c/CMakeLists.txt | Adds MODEL_COMPILE option and target. |

: Model{std::move(config)} {
encoder_session_options_ = OrtSessionOptions::Create();
CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false);
CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false);
Copilot AI Mar 6, 2026

Same issue as in whisper.cpp: passing is_primary_session_options=false into CreateSessionOptionsFromConfig can lead to provider duplication (providers list is already derived from provider_options) and also bypasses the multi-provider device mismatch check because SetProviderSessionOptions won’t return a device for non-primary sessions. Suggest reverting this argument to true for encoder session options (or adjust SetProviderSessionOptions to avoid duplicates and still validate device consistency).

Suggested change
- CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false);
+ CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false);

Copilot uses AI. Check for mistakes.
speech_session_options_ = OrtSessionOptions::Create();
CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, true, true);
speech_session_ = CreateSession(ort_env, config_->model.speech.filename, speech_session_options_.get());
CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, false, true);
Copilot AI Mar 6, 2026

Same concern for the speech session options: is_primary_session_options=false can cause duplicate EP appends and prevents the non-decoder session from contributing to the “single provider/device” validation. Recommend keeping this as true (or dedupe + return a device type even for non-primary sessions).

Suggested change
- CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, false, true);
+ CreateSessionOptionsFromConfig(config_->model.speech.session_options.has_value() ? config_->model.speech.session_options.value() : config_->model.decoder.session_options, *speech_session_options_, true, true);

embedding_session_options_ = OrtSessionOptions::Create();
CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, true, true);
CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, false, true);
Copilot AI Mar 6, 2026

Same issue for the embedding session options: passing is_primary_session_options=false can duplicate providers (providers list is already populated from provider_options) and can bypass the multi-provider guard because the device type isn’t returned for non-primary sessions. Consider using true here as well, or adjust SetProviderSessionOptions accordingly.

Suggested change
- CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, false, true);
+ CreateSessionOptionsFromConfig(config_->model.embedding.session_options.has_value() ? config_->model.embedding.session_options.value() : config_->model.decoder.session_options, *embedding_session_options_, true, true);

// Set external initializers file
if (comp_opts.external_initializers_file_path.has_value() &&
comp_opts.external_initializers_size_threshold.has_value()) {
fs::path external_init_path = config_->config_path / comp_opts.external_initializers_file_path.value();
Copilot AI Mar 6, 2026

Similar to ep_context_file_path: external_initializers_file_path is always joined with config_->config_path. If callers provide an absolute path, this will generate an incorrect path and likely fail compilation. Consider respecting absolute paths (only prefix when the config value is relative).

Suggested change
- fs::path external_init_path = config_->config_path / comp_opts.external_initializers_file_path.value();
+ fs::path external_init_path = comp_opts.external_initializers_file_path.value();
+ if (!external_init_path.is_absolute()) {
+ external_init_path = config_->config_path / external_init_path;
+ }

Comment on lines +194 to +210
inline bool create_directories(const path& p) {
#ifdef _WIN32
// On Windows, create directory recursively using CreateDirectoryW
if (p.exists()) {
return true; // Already exists
}

// First create parent directory if needed
path parent = p.parent_path();
if (!parent.string().empty() && !parent.exists()) {
if (!create_directories(parent)) {
return false;
}
}

// Create the directory
return CreateDirectoryW(p.c_str(), nullptr) != 0 || GetLastError() == ERROR_ALREADY_EXISTS;
Copilot AI Mar 6, 2026

create_directories() returns true when p.exists() is true, but it doesn’t verify that the existing path is actually a directory. If the path exists as a file, this will incorrectly report success and downstream code will fail with a less actionable error. Consider checking p.is_directory() (or equivalent) and returning false/throwing when the existing path is not a directory.

Comment on lines +15 to +21
// Get the compiled model path if it was compiled, otherwise use full path from config + filename
std::string model_path = GetPipelineCompiledModelPath(model.model_id);
if (model_path.empty()) {
// Use full path to original model if not compiled
model_path = (config_->config_path / fs::path(model.filename)).string();
}
sessions_.emplace_back(CreateSession(ort_env, model_path, GetSessionOptions(model.model_id)));
Copilot AI Mar 6, 2026

GetPipelineCompiledModelPath() is only populated by Model::CompileModel(), but DecoderOnlyPipelineModel never calls CompileModel for its pipeline stages (it only creates sessions directly). As a result, this will always fall back to the original model files and any decoder.pipeline[*].compile_options in config won’t take effect for decoder-pipeline models. Consider compiling each pipeline stage here (or calling a shared compile step before session creation) when compile_options are enabled.

: Model{std::move(config)} {
encoder_session_options_ = OrtSessionOptions::Create();
CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false);
CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false);
Copilot AI Mar 6, 2026

The third parameter to CreateSessionOptionsFromConfig was changed to false here. For encoder/decoder configs, Config::SessionOptions.providers is already populated from provider_options (see config.cpp), and SetProviderSessionOptions appends provider_options again when is_primary_session_options is false—so this can duplicate providers and append the same EP multiple times. It also prevents the encoder session from participating in the device consistency check (session_device stays null), potentially allowing mixed providers without throwing. Consider keeping is_primary_session_options as true for these non-pipeline sessions (or deduplicate providers + still return a device type for validation).

Suggested change
- CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, false, false);
+ CreateSessionOptionsFromConfig(config_->model.encoder.session_options.has_value() ? config_->model.encoder.session_options.value() : config_->model.decoder.session_options, *encoder_session_options_, true, false);
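
If the deduplication route is taken instead of reverting the argument, the core of it could be as small as the helper below. This is a hypothetical sketch on a plain list of provider names; it does not reflect how SetProviderSessionOptions is actually structured.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Appends `name` only if it is not already in `providers`.
// Returns true when the provider was added, false when it was a duplicate.
inline bool AppendProviderOnce(std::vector<std::string>& providers, const std::string& name) {
  if (std::find(providers.begin(), providers.end(), name) != providers.end()) {
    return false;
  }
  providers.push_back(name);
  return true;
}
```

A linear search is fine here because provider lists are short; the device consistency check raised in the comment would still need to be handled separately.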

Comment on lines 67 to +69
vision_session_options_ = OrtSessionOptions::Create();
CreateSessionOptionsFromConfig(config_->model.vision.session_options.has_value() ? config_->model.vision.session_options.value() : config_->model.decoder.session_options, *vision_session_options_, true, true);
vision_session_ = CreateSession(ort_env, config_->model.vision.filename, vision_session_options_.get());
CreateSessionOptionsFromConfig(config_->model.vision.session_options.has_value() ? config_->model.vision.session_options.value() : config_->model.decoder.session_options, *vision_session_options_, false, true);
std::string vision_model_path = CompileModel(ort_env, config_->model.vision.filename, vision_session_options_.get(), false, config_->model.vision.compile_options);
Copilot AI Mar 6, 2026

Using is_primary_session_options=false here risks appending the same execution provider more than once (because Config::SessionOptions.providers is already populated from provider_options) and can skip the provider/device consistency check for the vision session. Consider using is_primary_session_options=true (or deduplicating providers inside SetProviderSessionOptions and still returning a device type for validation).

fs::path& out_compiled_model_path) {
if (compile_options_config.ep_context_file_path.has_value() && !compile_options_config.ep_context_file_path.value().empty()) {
// Single path: full path (relative to config path) including filename, e.g. "contexts/model_ctx.onnx"
out_compiled_model_path = config_->config_path / compile_options_config.ep_context_file_path.value();
Copilot AI Mar 6, 2026

ep_context_file_path is documented as a “full path (relative to config path)”, but the code always prefixes it with config_->config_path. If a user supplies an absolute path, config_path / absolute_path will produce an invalid joined path with this project’s fs::path implementation. Consider handling absolute paths explicitly (use the provided path as-is when it’s not relative).

Suggested change
- out_compiled_model_path = config_->config_path / compile_options_config.ep_context_file_path.value();
+ fs::path ep_context_path(compile_options_config.ep_context_file_path.value());
+ if (ep_context_path.is_absolute()) {
+ // Use absolute paths as provided.
+ out_compiled_model_path = ep_context_path;
+ } else {
+ // Resolve relative paths against the config path.
+ out_compiled_model_path = config_->config_path / ep_context_path;
+ }
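
Since the same absolute-versus-relative rule applies to more than one configured path, it could be factored into one helper. A minimal sketch with std::filesystem; `ResolveAgainstConfigDir` is a hypothetical name, and the project's own fs::path may need an equivalent `is_absolute()`.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Absolute paths are used as provided; relative paths are resolved against
// the directory that holds the config file.
inline fs::path ResolveAgainstConfigDir(const fs::path& config_dir, const fs::path& configured) {
  return configured.is_absolute() ? configured : config_dir / configured;
}
```

With std::filesystem, `operator/` already yields the right-hand side when it is absolute, but the explicit check keeps the intent visible and does not depend on how a custom path type implements joining.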


3 participants