FlagOpen/FlagEmbedding
Affected component
FlagEmbedding.inference.FlagAutoModel.from_finetuned
Affected versions
Versions containing FlagAutoModel.from_finetuned() with basename-only lookup against AUTO_EMBEDDER_MAPPING and registry-derived trust_remote_code defaults are affected. This has been confirmed in the current public master branch as of 2026-05-07.
A precise introduced version should be confirmed by the maintainers from release history. The vulnerable pattern is present when all of the following are true:
- model_name_or_path is reduced to os.path.basename(model_name_or_path);
- the resulting basename is looked up in AUTO_EMBEDDER_MAPPING;
- a trust_remote_code value of None is replaced with model_config.trust_remote_code;
- the resolved value is passed to Hugging Face AutoTokenizer.from_pretrained() and/or AutoModel.from_pretrained().
Summary
FlagAutoModel.from_finetuned(model_name_or_path) silently enables Hugging Face trust_remote_code=True for certain model names based only on the basename of the user-supplied path or Hub repository identifier.
The dispatcher extracts the basename of model_name_or_path, looks it up in AUTO_EMBEDDER_MAPPING, and, if the caller did not explicitly pass trust_remote_code, replaces the caller’s implicit default with the registry value. Several registry entries set trust_remote_code=True. As a result, any local directory or Hugging Face Hub repository whose basename collides with one of these trusted registry names can cause remote/custom code execution even though the user did not explicitly opt in to trust_remote_code=True.
This defeats the expected trust boundary of Hugging Face trust_remote_code: the decision to execute model-provided Python code is not based on a verified repository identity, but only on the final path component.
Technical details
The vulnerable dispatcher logic is conceptually:
model_name = os.path.basename(model_name_or_path)
if model_name.startswith("checkpoint-"):
    model_name = os.path.basename(os.path.dirname(model_name_or_path))
model_config = AUTO_EMBEDDER_MAPPING[model_name]
if trust_remote_code is None:
    trust_remote_code = model_config.trust_remote_code
The resolved trust_remote_code value is then passed into the selected embedder class and eventually into Hugging Face loading APIs such as:
AutoTokenizer.from_pretrained(
    model_name_or_path,
    trust_remote_code=trust_remote_code,
    ...
)
AutoModel.from_pretrained(
    model_name_or_path,
    trust_remote_code=trust_remote_code,
    ...
)
Hugging Face documents that trust_remote_code=True allows custom model code from the repository to execute on the local machine. Therefore, setting this value based only on a basename collision is unsafe.
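The resolution logic above can be reproduced without installing the library. The following is a minimal sketch, assuming a two-entry stand-in registry: RegistryEntry, REGISTRY, and resolve_trust_remote_code are hypothetical names introduced for illustration and are not part of FlagEmbedding.

```python
import os
from dataclasses import dataclass


@dataclass
class RegistryEntry:
    # Hypothetical stand-in for the library's registry entry type.
    trust_remote_code: bool = False


# Hypothetical two-entry registry; the real AUTO_EMBEDDER_MAPPING is larger.
REGISTRY = {
    "bge-code-v1": RegistryEntry(trust_remote_code=True),
    "bge-base-en-v1.5": RegistryEntry(trust_remote_code=False),
}


def resolve_trust_remote_code(model_name_or_path, trust_remote_code=None):
    """Mirror the basename-only resolution described in this advisory."""
    model_name = os.path.basename(model_name_or_path)
    if model_name.startswith("checkpoint-"):
        # Checkpoint directories inherit the name of their parent directory.
        model_name = os.path.basename(os.path.dirname(model_name_or_path))
    entry = REGISTRY[model_name]
    if trust_remote_code is None:
        # The caller's implicit default is replaced by the registry value.
        trust_remote_code = entry.trust_remote_code
    return trust_remote_code


if __name__ == "__main__":
    for candidate in (
        "BAAI/bge-code-v1",
        "attacker-org/bge-code-v1",
        "/shared/models/bge-code-v1",
        "/tmp/run/bge-code-v1/checkpoint-500",
    ):
        print(candidate, "->", resolve_trust_remote_code(candidate))
```

In this sketch every candidate resolves to True, because only the final path component (or the parent of a checkpoint-* directory) takes part in the lookup. Passing trust_remote_code=False explicitly bypasses the registry value.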
Confirmed basename-collision entries
The following basenames currently resolve to trust_remote_code=True through the registry:
bge-code-v1
gte-Qwen2-7B-instruct
gte-Qwen2-1.5B-instruct
gte-Qwen1.5-7B-instruct
gte-multilingual-base
gte-large-en-v1.5
gte-base-en-v1.5
Note: gte-base-en-v1.5 is affected through a positional True argument in EmbedderConfig(FlagModel, PoolingMethod.CLS, True).
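Positional arguments like this are easy to miss when searching the source for trust_remote_code=True, so an audit is more reliable when it inspects the resolved attribute. The following is a minimal sketch, assuming that trust_remote_code is the third field of the config dataclass: EmbedderConfigSketch, the string placeholders, and the sample registry are hypothetical stand-ins, not the library's definitions.

```python
from dataclasses import dataclass


@dataclass
class EmbedderConfigSketch:
    # Assumed field order: the third positional argument is trust_remote_code.
    model_class: str
    pooling_method: str
    trust_remote_code: bool = False


# Hypothetical sample registry mixing positional and keyword forms.
registry = {
    "gte-base-en-v1.5": EmbedderConfigSketch("FlagModel", "cls", True),
    "gte-large-en-v1.5": EmbedderConfigSketch(
        "FlagModel", "cls", trust_remote_code=True
    ),
    "some-other-model": EmbedderConfigSketch("FlagModel", "cls"),
}


def trusted_basenames(mapping):
    """Return every basename whose entry resolves to trust_remote_code=True."""
    return sorted(name for name, cfg in mapping.items() if cfg.trust_remote_code)


if __name__ == "__main__":
    print(trusted_basenames(registry))
```

Applied to the real registry, the same attribute check would catch both spellings, whereas a text search for the keyword form alone would not.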
Impact
An attacker can cause arbitrary Python code execution in the victim’s Python process if the victim loads an attacker-controlled model repository or local directory whose basename matches one of the affected registry entries.
Realistic attack scenarios include:
- Hugging Face Hub namespace collision / social engineering
An attacker publishes a repository such as:
attacker-org/bge-code-v1
The victim is convinced to load it:
from FlagEmbedding import FlagAutoModel
model = FlagAutoModel.from_finetuned("attacker-org/bge-code-v1")
Because the basename is bge-code-v1, the dispatcher matches the trusted registry entry and silently sets trust_remote_code=True.
- Local directory collision
An attacker places a malicious model directory at a path such as:
/shared/models/bge-code-v1
The victim loads it:
model = FlagAutoModel.from_finetuned("/shared/models/bge-code-v1")
The basename again matches the registry entry and silently enables trust_remote_code=True.
- Internal model hub / mirror
In organizations using private Hugging Face mirrors or shared model registries, a user with write access can publish a colliding model name such as:
internal-team/bge-code-v1
Other users who load that model through FlagEmbedding may execute untrusted custom code without realizing that trust_remote_code (TRC) was enabled by the registry.
Why this is a vulnerability
The intended behavior appears to be: certain known upstream model repositories require custom code, so the library stores trust_remote_code=True for them.
However, the implementation does not verify the full repository identity, such as host, organization, repository name, revision, or content hash. It verifies only the basename.
Therefore, the security-sensitive decision to execute repository-provided Python code can be triggered by any path or repo ID ending in the same basename. This is an authenticity verification failure chained into arbitrary code execution.
The issue is especially risky because the public API documents trust_remote_code as optional and defaulting to None, and callers who do not explicitly pass trust_remote_code=True have no clear warning that the dispatcher may enable it on their behalf.
Proof of concept
A non-destructive PoC should demonstrate the following:
- Create or reference a model path whose basename is one of the affected names, for example:
/tmp/attacker/bge-code-v1
- Call:
FlagAutoModel.from_finetuned("/tmp/attacker/bge-code-v1")
without passing trust_remote_code.
- Observe that the dispatcher resolves trust_remote_code=True.
- Optionally, include an end-to-end local model directory with a custom auto_map target that writes a marker file during import, demonstrating that code execution occurs only because the registry silently enabled TRC.
The attached PoC demonstrates the basename-collision dispatcher behavior and verifies the vulnerable source lines without network access. A private end-to-end RCE marker PoC can be provided to maintainers on request.
Expected behavior
trust_remote_code=True should only be enabled when the caller explicitly opts in, or when the library has verified the full trusted repository identity.
Examples of acceptable behavior:
FlagAutoModel.from_finetuned(
    "BAAI/bge-code-v1",
    trust_remote_code=True,
)
or an internal allowlist that verifies a full identity tuple such as:
(host, organization, repository, revision)
Actual behavior
trust_remote_code=True is enabled when the basename of model_name_or_path matches a registry entry, even if the full repository or local path is attacker-controlled.
For example:
FlagAutoModel.from_finetuned("attacker-org/bge-code-v1")
can silently resolve to:
trust_remote_code=True
because the basename is bge-code-v1.
Recommended fix
The safest fix is to remove trust_remote_code from registry defaults and require explicit user opt-in.
Recommended options, from strongest to weakest:
- Require explicit opt-in
Do not set trust_remote_code=True from AUTO_EMBEDDER_MAPPING. If a model requires custom code, raise an error explaining that the user must pass trust_remote_code=True explicitly after reviewing the model code.
- Verify full repository identity
If automatic TRC is retained, verify the full identity of the model, not just the basename. For Hub models, this should include at least:
host
organization / namespace
repository name
revision or commit hash
- Never enable TRC for arbitrary local paths based only on basename
Local paths such as /tmp/bge-code-v1 or /shared/models/bge-code-v1 should not inherit trust from an upstream Hub model registry entry.
- Emit a warning or require confirmation
If the registry changes trust_remote_code from None to True, emit a clear warning such as:
WARNING: trust_remote_code=True was enabled automatically because basename
'bge-code-v1' matched a registry entry. This may execute custom Python code
from the supplied model path. Pass trust_remote_code=False to disable this.
- Add regression tests
Add tests ensuring that paths such as:
/tmp/bge-code-v1
attacker-org/bge-code-v1
do not automatically enable trust_remote_code=True.
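The regression tests described in the last option can be sketched as follows. This is a minimal illustration, assuming a fixed resolver that never consults the registry for trust decisions: resolve_trust_remote_code is a hypothetical stand-in, and a real test suite would call the library's actual dispatcher instead.

```python
def resolve_trust_remote_code(model_name_or_path, trust_remote_code=None):
    # Hypothetical post-fix behavior: only an explicit caller value enables
    # remote code; the path or repo ID plays no part in the decision.
    if trust_remote_code is None:
        return False
    return trust_remote_code


# Paths and repo IDs whose basename collides with an affected registry entry.
COLLIDING_INPUTS = [
    "/tmp/bge-code-v1",
    "attacker-org/bge-code-v1",
    "/shared/models/bge-code-v1/checkpoint-100",
]


def test_collisions_do_not_enable_remote_code():
    for candidate in COLLIDING_INPUTS:
        assert resolve_trust_remote_code(candidate) is False


def test_explicit_opt_in_is_honored():
    for candidate in COLLIDING_INPUTS:
        assert resolve_trust_remote_code(candidate, trust_remote_code=True) is True


if __name__ == "__main__":
    test_collisions_do_not_enable_remote_code()
    test_explicit_opt_in_is_honored()
    print("all regression checks passed")
```

The functions follow the pytest naming convention, so they can be collected by a test runner or executed directly as a script.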
Suggested patch direction
A minimal conservative change would be:
if trust_remote_code is None:
    trust_remote_code = False
Then require users to pass trust_remote_code=True explicitly for models that need custom code.
A stricter allowlist-based change would distinguish between exact trusted upstream identifiers and arbitrary basename collisions:
TRUSTED_REMOTE_CODE_MODELS = {
    ("huggingface.co", "BAAI", "bge-code-v1"),
    ("huggingface.co", "Alibaba-NLP", "gte-Qwen2-7B-instruct"),
    ...
}
However, even this should preferably require a pinned revision to avoid future supply-chain ambiguity.
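One way to combine the allowlist with a pinned revision is sketched below. It is an illustration under stated assumptions, not a proposed patch: ModelIdentity, TRUSTED_IDENTITIES, and resolve_trust_remote_code are hypothetical names, and PLACEHOLDER_REVISION is a dummy value, not a real commit hash of any repository.

```python
import os
from typing import NamedTuple


class ModelIdentity(NamedTuple):
    host: str
    organization: str
    repository: str
    revision: str


# Dummy 40-character value for illustration; a real allowlist would pin
# reviewed commit hashes.
PLACEHOLDER_REVISION = "0" * 40

TRUSTED_IDENTITIES = frozenset({
    ModelIdentity("huggingface.co", "BAAI", "bge-code-v1", PLACEHOLDER_REVISION),
})


def resolve_trust_remote_code(model_name_or_path, revision=None,
                              trust_remote_code=None, host="huggingface.co"):
    """Enable remote code only for an exact (host, org, repo, revision) match."""
    if trust_remote_code is not None:
        return trust_remote_code  # An explicit caller decision always wins.
    if os.path.isabs(model_name_or_path) or os.path.exists(model_name_or_path):
        return False  # Local paths never inherit trust from the allowlist.
    parts = model_name_or_path.split("/")
    if len(parts) != 2 or revision is None:
        return False  # Not an org/repo identifier, or no pinned revision.
    identity = ModelIdentity(host, parts[0], parts[1], revision)
    return identity in TRUSTED_IDENTITIES


if __name__ == "__main__":
    print(resolve_trust_remote_code("BAAI/bge-code-v1", PLACEHOLDER_REVISION))
    print(resolve_trust_remote_code("attacker-org/bge-code-v1", PLACEHOLDER_REVISION))
    print(resolve_trust_remote_code("BAAI/bge-code-v1"))
```

For the pin to be meaningful, the same revision would also have to be forwarded to the Hugging Face loading calls, so that the code that was reviewed is the code that is downloaded.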
Workarounds
Until a fix is available, users should explicitly pass:
trust_remote_code=False
when loading any model whose provenance has not been manually reviewed.
Users should avoid loading untrusted model repositories or local directories with basenames matching the affected registry entries.
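A small pre-flight check can help apply these workarounds consistently. The following is a minimal sketch: collides_with_trusted_entry and AFFECTED_BASENAMES are hypothetical helper names, and the set simply copies the basenames listed in this advisory, so it may not match other versions of the registry.

```python
import os

# Basenames listed in this advisory as resolving to trust_remote_code=True.
AFFECTED_BASENAMES = frozenset({
    "bge-code-v1",
    "gte-Qwen2-7B-instruct",
    "gte-Qwen2-1.5B-instruct",
    "gte-Qwen1.5-7B-instruct",
    "gte-multilingual-base",
    "gte-large-en-v1.5",
    "gte-base-en-v1.5",
})


def collides_with_trusted_entry(model_name_or_path):
    """Return True if the dispatcher would match an affected registry entry."""
    # Trailing slashes are stripped so the check errs on the cautious side.
    normalized = model_name_or_path.rstrip("/")
    name = os.path.basename(normalized)
    if name.startswith("checkpoint-"):
        name = os.path.basename(os.path.dirname(normalized))
    return name in AFFECTED_BASENAMES


if __name__ == "__main__":
    path = "/shared/models/bge-code-v1"
    if collides_with_trusted_entry(path):
        print(f"{path}: pass trust_remote_code=False unless the code was reviewed")
        # e.g. FlagAutoModel.from_finetuned(path, trust_remote_code=False)
```

The check only flags inputs that need attention; it does not establish that a model is safe to load.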