Summary
Implement accurate token counting using actual model tokenizers instead of the crude character_count / 4 heuristic.
Background
Gemini CLI Pattern (packages/core/src/code_assist/converter.ts):
- Uses official Google AI token counting API
- Provides a countTokens function
- Returns the exact token count for the model being used
- Factors in model-specific tokenization
Current Problem: We estimate 1 token ≈ 4 characters, which can be off by 50%+ for:
- Code (more tokens per character)
- Non-English text (UTF-8 multibyte characters)
- Special characters and symbols
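To make the failure mode concrete: the heuristic sees only string length, never content, so two very different strings of similar length get identical estimates. A quick illustration (the sample strings are arbitrary):

```powershell
$code  = 'if ($x -ne $null) { $y = [Math]::Max($x, 0) }'
$prose = 'The quick brown fox jumps over two lazy dogs.'
# Both are ~45 characters, so char/4 estimates ~12 tokens for each,
# yet real tokenizers typically split the code sample into many more
# tokens (sigils, brackets, and operators often tokenize separately).
[Math]::Ceiling($code.Length / 4)
[Math]::Ceiling($prose.Length / 4)
```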
Implementation Strategy
1. Token Counting API Integration
File: src/TokenCounter.ps1
```powershell
class TokenCounter {
    [string]$ApiKey
    [string]$Model
    [hashtable]$Cache = @{}  # Content hash -> token count

    [int] CountTokens([string]$Text) {
        # Check the cache first
        $hash = Get-ContentHash $Text
        if ($this.Cache.ContainsKey($hash)) {
            return $this.Cache[$hash]
        }

        # Call the Google AI token counting API; the model is addressed in
        # the URL, so the body carries only the contents to count
        $body = @{
            contents = @(
                @{ parts = @(@{ text = $Text }) }
            )
        } | ConvertTo-Json -Depth 10

        $response = Invoke-RestMethod `
            -Uri "https://generativelanguage.googleapis.com/v1beta/models/$($this.Model):countTokens?key=$($this.ApiKey)" `
            -Method POST `
            -ContentType "application/json" `
            -Body $body
        $tokenCount = $response.totalTokens

        # Cache the result
        $this.Cache[$hash] = $tokenCount
        return $tokenCount
    }

    [int] EstimateTokens([string]$Text) {
        # Fast local estimation (fallback when the API is unavailable),
        # with a rough divisor per content type
        if ($Text -match '^\s*(\{|\[)') {
            # JSON: denser tokenization (more tokens per character)
            return [int][Math]::Ceiling($Text.Length / 3.5)
        } elseif ($Text -match '^(function|class|def|public|private)') {
            # Code: densest tokenization
            return [int][Math]::Ceiling($Text.Length / 3)
        } else {
            # Plain text
            return [int][Math]::Ceiling($Text.Length / 4)
        }
    }
}

$script:TokenCounter = [TokenCounter]@{
    ApiKey = $env:GEMINI_API_KEY
    Model  = "gemini-2.5-flash"
}

function Get-TokenCount {
    param(
        [string]$Text,
        [switch]$Estimate
    )
    if ($Estimate -or -not $env:GEMINI_API_KEY) {
        return $script:TokenCounter.EstimateTokens($Text)
    }
    try {
        return $script:TokenCounter.CountTokens($Text)
    } catch {
        Write-Log "Token counting API failed, falling back to estimation: $($_.Exception.Message)" "WARN"
        return $script:TokenCounter.EstimateTokens($Text)
    }
}
```
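CountTokens above relies on a Get-ContentHash helper that is not shown in this issue; a minimal sketch (name and placement are assumptions) hashing the UTF-8 bytes with SHA-256, which is stable across sessions unlike String.GetHashCode():

```powershell
function Get-ContentHash {
    param([string]$Text)
    # SHA-256 over the UTF-8 bytes of the text, returned as a hex string
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
        return ([BitConverter]::ToString($sha.ComputeHash($bytes))) -replace '-', ''
    } finally {
        $sha.Dispose()
    }
}
```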
2. Batch Token Counting
File: src/BatchTokenCounter.ps1
```powershell
function Get-TokenCountBatch {
    param(
        [string[]]$Texts
    )
    # Note: the countTokens endpoint accepts multiple contents in a single
    # request, but returns one aggregate totalTokens for the whole request,
    # not per-item counts. Per-text counts therefore take one call each;
    # the TokenCounter cache absorbs repeated texts.
    $results = @()
    foreach ($text in $Texts) {
        $results += Get-TokenCount -Text $text
    }
    return $results
}
```
3. Usage Metadata Tracking
Following Gemini CLI's usageMetadata pattern:
```powershell
class TokenUsage {
    [int]$PromptTokens
    [int]$CompletionTokens
    [int]$TotalTokens
    [int]$CachedTokens
    [DateTime]$Timestamp
}

function Track-TokenUsage {
    param(
        [int]$PromptTokens,
        [int]$CompletionTokens,
        [int]$CachedTokens = 0
    )
    $usage = [TokenUsage]@{
        PromptTokens     = $PromptTokens
        CompletionTokens = $CompletionTokens
        TotalTokens      = $PromptTokens + $CompletionTokens
        CachedTokens     = $CachedTokens
        Timestamp        = Get-Date
    }
    # Append to the session log
    $usage | Export-Csv "$DATA_DIR/token-usage.csv" -Append -NoTypeInformation
    return $usage
}
```
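For visibility into spend, the CSV written above can be aggregated. A small sketch; Get-TokenUsageSummary is a name introduced here, and it assumes the same $DATA_DIR used by Track-TokenUsage:

```powershell
function Get-TokenUsageSummary {
    # Aggregates the per-call usage log written by Track-TokenUsage
    $rows = Import-Csv "$DATA_DIR/token-usage.csv"
    [pscustomobject]@{
        Calls            = $rows.Count
        PromptTokens     = ($rows | Measure-Object -Property PromptTokens -Sum).Sum
        CompletionTokens = ($rows | Measure-Object -Property CompletionTokens -Sum).Sum
        TotalTokens      = ($rows | Measure-Object -Property TotalTokens -Sum).Sum
        CachedTokens     = ($rows | Measure-Object -Property CachedTokens -Sum).Sum
    }
}
```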
Acceptance Criteria
Testing Strategy
- Accuracy Test: Compare API vs actual model token usage
- Cache Test: Second count of same text uses cache
- Batch Test: Batch counting returns correct counts
- Fallback Test: Estimation used when API unavailable
- Performance Test: < 500ms for single count, < 2s for batch of 100
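The estimation and fallback cases above could be expressed as Pester tests; a sketch, where the dot-sourced path assumes the repo layout named earlier in this issue:

```powershell
Describe 'TokenCounter' {
    BeforeAll {
        . "$PSScriptRoot/../src/TokenCounter.ps1"  # assumed layout
    }

    It 'estimates plain text at roughly length/4' {
        $counter = [TokenCounter]::new()
        $counter.EstimateTokens('word ' * 20) | Should -BeGreaterThan 0
    }

    It 'falls back to estimation when no API key is set' {
        $env:GEMINI_API_KEY = $null
        Get-TokenCount -Text 'hello world' | Should -BeGreaterThan 0
    }
}
```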
Security Considerations
- API key stored securely in environment variable
- No API key logging
- Rate limiting to avoid quota exhaustion
- Graceful degradation if quota exceeded
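The rate-limiting bullet could be handled client-side with a sliding one-minute window; a sketch, where Wait-ForQuota and the default limit are assumptions to tune against the actual quota, not documented values:

```powershell
# Timestamps of recent countTokens calls, oldest first
$script:CallTimes = [System.Collections.Generic.Queue[datetime]]::new()

function Wait-ForQuota {
    param([int]$MaxPerMinute = 60)  # assumed limit; tune to real quota
    # Drop timestamps older than one minute
    $cutoff = (Get-Date).AddMinutes(-1)
    while ($script:CallTimes.Count -gt 0 -and $script:CallTimes.Peek() -lt $cutoff) {
        [void]$script:CallTimes.Dequeue()
    }
    # If the window is full, sleep until the oldest call ages out
    if ($script:CallTimes.Count -ge $MaxPerMinute) {
        $wait = ($script:CallTimes.Peek().AddMinutes(1) - (Get-Date)).TotalSeconds
        if ($wait -gt 0) { Start-Sleep -Seconds ([Math]::Ceiling($wait)) }
    }
    $script:CallTimes.Enqueue((Get-Date))
}
```

Calling Wait-ForQuota immediately before each Invoke-RestMethod keeps the process under the chosen requests-per-minute ceiling.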
Priority
High - Accurate counting is foundation for optimization
References
Gemini CLI implementation:
- packages/core/src/code_assist/converter.ts (token counting API)
- packages/core/src/core/geminiChat.ts (usageMetadata extraction)