Problem
The current token counting mechanism is a mix of a simple estimation (length / 4) and a call to the Google AI countTokens API. This is not a robust solution for a production environment for several reasons:
- Inaccurate Fallback: The length / 4 estimation is highly inaccurate for different types of content (code, JSON, etc.) and different languages.
- Vendor Lock-in: The countTokens API is specific to Google AI models. The system should be able to handle models from other vendors (e.g., OpenAI, Anthropic) that use different tokenizers.
- No Local Tokenization: The system relies on a network call for accurate token counting, which introduces latency and a point of failure. For models where the tokenizer is available locally (like tiktoken for OpenAI models), it should be used.
- No Model-Specific Tokenization: The current implementation does not account for different tokenization rules for different models from the same vendor (e.g., gpt-3.5-turbo vs. gpt-4).
Desired State
A production-ready token counting system should have the following characteristics:
- Pluggable Tokenizers: The system should support multiple tokenization strategies and allow for new ones to be added easily.
- Model-Specific Configuration: The system should be able to determine which tokenizer to use based on the model being used.
- Local First, Remote Fallback: For models with available local tokenizers (e.g., tiktoken), the system should use them first to avoid network latency. If a local tokenizer is not available, it should fall back to a remote API call if possible.
- Improved Estimation: The fallback estimation logic should be more sophisticated than a simple character ratio, taking into account content type and language.
- Caching: All token counting operations (local and remote) should be cached to avoid redundant computations.
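The caching requirement above could be met with a hash-keyed memoizing wrapper that sits in front of any tokenizer, local or remote. The withCache helper below is a hypothetical sketch, not part of the existing codebase:

```typescript
// Sketch (hypothetical helper): memoize token counts keyed by a hash of
// (tokenizer name, text), so repeated counts of identical content are free.
import { createHash } from "node:crypto";

type CountFn = (text: string) => Promise<number>;

function withCache(
  tokenizerName: string,
  count: CountFn,
  cache: Map<string, number> = new Map()
): CountFn {
  return async (text: string) => {
    // Hash the text so large payloads do not bloat the cache keys.
    const key =
      tokenizerName + ":" + createHash("sha256").update(text).digest("hex");
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const n = await count(text);
    cache.set(key, n);
    return n;
  };
}
```

Keying by tokenizer name as well as content matters because the same text produces different counts under different tokenizers.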
Implementation Plan
Phase 1: Create a Pluggable Tokenizer Framework in TypeScript
- Define ITokenizer Interface (src/core/tokenizers/ITokenizer.ts)
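The interface might look like the following sketch. The method name, the isLocal flag, and the trivial whitespace-splitting implementation are illustrative assumptions, not the final API:

```typescript
// Sketch of the proposed ITokenizer interface (member names are assumptions).
interface ITokenizer {
  /** Identifier, e.g. "tiktoken", "google-ai", "estimation". */
  readonly name: string;
  /** True when counting needs no network access. */
  readonly isLocal: boolean;
  /** Count tokens in the given text for the configured model. */
  countTokens(text: string): Promise<number>;
}

// Minimal implementing example, for illustration only: one token per
// whitespace-separated word.
class WhitespaceTokenizer implements ITokenizer {
  readonly name = "whitespace";
  readonly isLocal = true;
  async countTokens(text: string): Promise<number> {
    const trimmed = text.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  }
}
```

countTokens is async even for local tokenizers so that local and remote implementations share one call shape.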
- Create a TokenizerFactory (src/core/tokenizers/TokenizerFactory.ts)
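One plausible shape for the factory is a prefix registry over model names with an estimation fallback, so "gpt-" maps to tiktoken and "gemini-" to the Google AI tokenizer. This is a sketch under those assumptions:

```typescript
// Sketch: resolve a tokenizer by model-name prefix, falling back to
// estimation. Names and the prefix-matching strategy are assumptions.
interface ITokenizer {
  readonly name: string;
  countTokens(text: string): Promise<number>;
}

type TokenizerCtor = () => ITokenizer;

class TokenizerFactory {
  private registry = new Map<string, TokenizerCtor>();

  constructor(private fallback: TokenizerCtor) {}

  /** Register a tokenizer for a model-name prefix, e.g. "gpt-" or "gemini-". */
  register(modelPrefix: string, ctor: TokenizerCtor): void {
    this.registry.set(modelPrefix, ctor);
  }

  /** Longest-prefix match, so a "gpt-4" entry beats a generic "gpt-" entry. */
  forModel(model: string): ITokenizer {
    let best: TokenizerCtor | undefined;
    let bestLen = -1;
    for (const [prefix, ctor] of this.registry) {
      if (model.startsWith(prefix) && prefix.length > bestLen) {
        best = ctor;
        bestLen = prefix.length;
      }
    }
    return (best ?? this.fallback)();
  }
}
```

Longest-prefix matching addresses the model-specific requirement: gpt-3.5-turbo and gpt-4 can get distinct encodings while still sharing a vendor-level default.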
- Implement TiktokenTokenizer (src/core/tokenizers/TiktokenTokenizer.ts)
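In the real implementation the encoder would come from the tiktoken npm package (obtained per model); the sketch below injects the encoder through the constructor so the example carries no external dependency, and the Encoder shape shown is an assumption:

```typescript
// Sketch: a local tokenizer that delegates to an injected BPE encoder.
// In production the encoder would be created from the tiktoken package
// for the target model; here it is injected to keep the example offline.
interface Encoder {
  encode(text: string): number[] | Uint32Array;
}

class TiktokenTokenizer {
  readonly name = "tiktoken";
  readonly isLocal = true;

  constructor(private encoder: Encoder) {}

  async countTokens(text: string): Promise<number> {
    // Token count is simply the length of the encoded sequence.
    return this.encoder.encode(text).length;
  }
}
```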
- Implement GoogleAITokenizer (src/core/tokenizers/GoogleAITokenizer.ts)
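This tokenizer wraps the existing remote countTokens call behind the same interface. The client shape below is injected and assumed for illustration; in production it would be the vendor SDK's model object:

```typescript
// Sketch: a remote tokenizer that delegates to an injected client exposing
// a countTokens call. The client interface is an assumption standing in for
// the Google AI SDK, so the example runs offline.
interface CountTokensClient {
  countTokens(text: string): Promise<{ totalTokens: number }>;
}

class GoogleAITokenizer {
  readonly name = "google-ai";
  readonly isLocal = false;

  constructor(private client: CountTokensClient) {}

  async countTokens(text: string): Promise<number> {
    // Network call: callers should treat failures as a signal to fall back.
    const { totalTokens } = await this.client.countTokens(text);
    return totalTokens;
  }
}
```

Because isLocal is false, the local-first ordering in the Desired State section naturally deprioritizes this tokenizer.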
- Implement EstimationTokenizer (src/core/tokenizers/EstimationTokenizer.ts)
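A content-aware estimator could replace the flat length / 4 rule. The heuristics and chars-per-token ratios below are rough assumptions for illustration, not measured constants: structural characters suggest code or JSON (which tokenizes denser), and CJK scripts average far fewer characters per token than English prose.

```typescript
// Sketch of an improved fallback estimator. All ratios and thresholds are
// illustrative assumptions; a real implementation would calibrate them
// against measured tokenizer output.
class EstimationTokenizer {
  readonly name = "estimation";
  readonly isLocal = true;

  async countTokens(text: string): Promise<number> {
    // CJK characters typically map to roughly 1-2 tokens each.
    const cjk =
      (text.match(/[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]/g) ?? []).length;
    const rest = text.length - cjk;
    // Many structural characters suggest code/JSON, which tokenizes denser.
    const structural = (text.match(/[{}[\]();=<>]/g) ?? []).length;
    const charsPerToken =
      structural / Math.max(text.length, 1) > 0.05 ? 3 : 4;
    return Math.ceil(rest / charsPerToken) + Math.ceil(cjk / 1.5);
  }
}
```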
Phase 2: Refactor the TokenCounter to Use the New Framework
- Update src/core/token-counter.ts
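The refactored counter would order tokenizers local-first and fall through on failure, per the Desired State section. The class shape and return value below are assumptions sketching that behavior:

```typescript
// Sketch of the refactored TokenCounter: try local tokenizers first, fall
// back to remote, and report which tokenizer produced the count.
interface ITokenizer {
  readonly name: string;
  readonly isLocal: boolean;
  countTokens(text: string): Promise<number>;
}

class TokenCounter {
  constructor(private tokenizers: ITokenizer[]) {}

  async count(text: string): Promise<{ count: number; source: string }> {
    // Local tokenizers first, to avoid network latency and failure modes.
    const ordered = [...this.tokenizers].sort(
      (a, b) => Number(b.isLocal) - Number(a.isLocal)
    );
    for (const t of ordered) {
      try {
        return { count: await t.countTokens(text), source: t.name };
      } catch {
        // This tokenizer failed (e.g. remote API error): try the next one.
      }
    }
    throw new Error("no tokenizer available for this text");
  }
}
```

Surfacing source alongside the count lets callers (and the count_tokens tool) report whether a number is exact or estimated.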
- Update the count_tokens MCP Tool
Phase 3: Update the PowerShell Orchestrator
- Modify hooks/handlers/token-optimizer-orchestrator.ps1: