Respect TIKTOKEN_OFFLINE env var to prevent network access in offline mode #514
Open
Spectual wants to merge 1 commit into openai:main from
When TIKTOKEN_OFFLINE is set, network fetches should never be attempted. Previously, read_file() would ignore this variable and attempt HTTP/blob requests regardless, causing failures in air-gapped environments.

Fix read_file() to raise a clear ValueError when TIKTOKEN_OFFLINE is set and the path requires a network fetch. Also guard the cache-disabled path in read_file_cached() (TIKTOKEN_CACHE_DIR="") so that it too raises instead of falling through to a network request.

Fixes openai#513
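The guard described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `read_file_sketch` and `is_offline` are hypothetical names, the rule that any non-empty `TIKTOKEN_OFFLINE` value enables offline mode is an assumption, and the network fetch itself is left out.

```python
import os


def is_offline() -> bool:
    # Assumption: any non-empty value of TIKTOKEN_OFFLINE turns offline mode on.
    return bool(os.environ.get("TIKTOKEN_OFFLINE"))


def read_file_sketch(blobpath: str) -> bytes:
    """Hypothetical stand-in for read_file(); not the code from this PR."""
    if "://" not in blobpath:
        # Local paths never need the network, so offline mode leaves them alone.
        with open(blobpath, "rb") as f:
            return f.read()
    if is_offline():
        # Fail before any request is attempted, with an actionable message.
        raise ValueError(
            f"TIKTOKEN_OFFLINE is set, refusing to fetch {blobpath!r} over the network. "
            "Pre-populate the cache or pass a local file path."
        )
    # The HTTP or blobfile fetch would happen here; it is omitted from this sketch.
    raise NotImplementedError("network fetch omitted from this sketch")
```

The check sits after the local-path branch so that local files keep working in offline mode, and before any fetch so that no request is ever started.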
While your effort is great, I would not say this "fixes" anything. You are just changing the error message.
Problem

Fixes #513

In tiktoken 0.9.0+, the `TIKTOKEN_OFFLINE` environment variable is silently ignored. When the local cache is empty (e.g. fresh install in an air-gapped environment), `read_file()` proceeds to make HTTP requests even when `TIKTOKEN_OFFLINE=1` is set, causing unexpected network errors instead of a clear actionable message.

The `TIKTOKEN_CACHE_DIR` variable was also not helping in this scenario because even when set correctly, if the cache file was absent the code would fall through to a network fetch.

Fix

- `read_file()`: check `TIKTOKEN_OFFLINE` before attempting any network fetch (`http://`, `https://`, or blobfile). If set, raise a `ValueError` with a clear message explaining the situation.
- `read_file_cached()`: guard the `TIKTOKEN_CACHE_DIR=""` (caching-disabled) path as well, raising early when `TIKTOKEN_OFFLINE` is set rather than delegating to `read_file()` after bypassing the cache lookup.

Local file paths (`"://" not in blobpath`) are unaffected and continue to work normally in offline mode.

Behaviour after fix

| Scenario | Result |
| --- | --- |
| `TIKTOKEN_OFFLINE=1`, cache populated | |
| `TIKTOKEN_OFFLINE=1`, cache empty | `ValueError` |
| `TIKTOKEN_OFFLINE=1`, `TIKTOKEN_CACHE_DIR=""` | `ValueError` |
| `TIKTOKEN_OFFLINE`, any path | |

Test plan

- `TIKTOKEN_OFFLINE=1` with a pre-populated cache still loads the tokenizer successfully
- `TIKTOKEN_OFFLINE=1` with an empty cache raises `ValueError` with a clear message (no network attempt)
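
One way to exercise the cached path without touching the network is sketched below. It is not the PR's implementation: `read_file_cached_sketch` is a hypothetical stand-in, the injected `fetch` callable replaces the real network read, and the cache layout (one file per blob path, named by its SHA-1 hex digest), the fallback cache directory, and the "non-empty means offline" rule are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Callable


def read_file_cached_sketch(blobpath: str, fetch: Callable[[str], bytes]) -> bytes:
    """Hypothetical stand-in for read_file_cached(); `fetch` replaces the network read."""
    # Assumption: any non-empty TIKTOKEN_OFFLINE value enables offline mode.
    offline = bool(os.environ.get("TIKTOKEN_OFFLINE"))
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR")
    if cache_dir is None:
        # Hypothetical default location, used only by this sketch.
        cache_dir = os.path.join(tempfile.gettempdir(), "tiktoken-sketch-cache")

    if cache_dir == "":
        # Caching is disabled, so the only remaining source would be the network.
        if offline and "://" in blobpath:
            raise ValueError(
                'TIKTOKEN_OFFLINE is set and caching is disabled (TIKTOKEN_CACHE_DIR=""); '
                f"cannot load {blobpath!r} without network access."
            )
        return fetch(blobpath)

    # Assumed cache layout: one file per blob path, named by its SHA-1 hex digest.
    cache_path = os.path.join(cache_dir, hashlib.sha1(blobpath.encode()).hexdigest())
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()

    if offline and "://" in blobpath:
        # Cache miss in offline mode: raise instead of falling through to a fetch.
        raise ValueError(
            f"TIKTOKEN_OFFLINE is set and {blobpath!r} is not in the cache at {cache_dir!r}."
        )

    data = fetch(blobpath)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "wb") as f:
        f.write(data)
    return data
```

Passing a fake `fetch` that records its calls lets a test confirm both items in the test plan: a populated cache is served with no fetch, and an empty cache raises `ValueError` before `fetch` is ever called.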