feat: implement sophisticated token counting (beyond character/4) #123

@ooples

Description

Summary

Implement accurate token counting using actual model tokenizers instead of the crude character_count / 4 heuristic.

Background

Gemini CLI Pattern (packages/core/src/code_assist/converter.ts):

  • Uses official Google AI token counting API
  • Provides countTokens function
  • Returns exact token count for model being used
  • Factors in model-specific tokenization

Current Problem: We currently estimate 1 token ≈ 4 characters, which can be off by 50% or more for:

  • Code (more tokens per character)
  • Non-English text (UTF-8 multibyte characters)
  • Special characters and symbols
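The existing heuristic amounts to the following (a minimal sketch of the current behavior, not new functionality):

```powershell
# Current crude estimate: ~4 characters per token, regardless of content
function Get-NaiveTokenEstimate {
    param([string]$Text)
    return [Math]::Ceiling($Text.Length / 4)
}

# For symbol-dense code this drifts badly: model tokenizers tend to emit a
# token per operator/punctuation mark, far more than length/4 predicts.
Get-NaiveTokenEstimate 'const x=(a,b)=>a??b;'
```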

Implementation Strategy

1. Token Counting API Integration

File: src/TokenCounter.ps1

```powershell
class TokenCounter {
    [string]$ApiKey
    [string]$Model
    [hashtable]$Cache = @{}  # Content hash -> token count

    [int] CountTokens([string]$Text) {
        # Check cache first (Get-ContentHash is a project helper, e.g. a SHA-256 of the string)
        $hash = Get-ContentHash $Text
        if ($this.Cache.ContainsKey($hash)) {
            return $this.Cache[$hash]
        }

        # Call Google AI token counting API
        $body = @{
            model = $this.Model
            contents = @(@{
                parts = @(@{
                    text = $Text
                })
            })
        } | ConvertTo-Json -Depth 10

        $response = Invoke-RestMethod `
            -Uri "https://generativelanguage.googleapis.com/v1beta/models/$($this.Model):countTokens?key=$($this.ApiKey)" `
            -Method POST `
            -ContentType "application/json" `
            -Body $body

        $tokenCount = $response.totalTokens

        # Cache result
        $this.Cache[$hash] = $tokenCount

        return $tokenCount
    }

    [int] EstimateTokens([string]$Text) {
        # Fast local estimation (fallback when the API is unavailable),
        # using per-content-type divisors instead of a flat 4 chars/token
        if ($Text -match '^\s*(\{|\[)') {
            # JSON: denser tokenization, ~3.5 chars/token
            return [Math]::Ceiling($Text.Length / 3.5)
        } elseif ($Text -match '^(function|class|def|public|private)') {
            # Code: densest, ~3 chars/token
            return [Math]::Ceiling($Text.Length / 3)
        } else {
            # Plain text: ~4 chars/token
            return [Math]::Ceiling($Text.Length / 4)
        }
    }
}

$script:TokenCounter = [TokenCounter]@{
    ApiKey = $env:GEMINI_API_KEY
    Model  = "gemini-2.5-flash"
}

function Get-TokenCount {
    param(
        [string]$Text,
        [switch]$Estimate
    )

    if ($Estimate -or -not $env:GEMINI_API_KEY) {
        return $script:TokenCounter.EstimateTokens($Text)
    }

    try {
        return $script:TokenCounter.CountTokens($Text)
    } catch {
        Write-Log "Token counting API failed, falling back to estimation: $($_.Exception.Message)" "WARN"
        return $script:TokenCounter.EstimateTokens($Text)
    }
}
```
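Intended usage (a sketch; assumes `GEMINI_API_KEY` is set and the file is dot-sourced):

```powershell
. ./src/TokenCounter.ps1

# Exact count via the countTokens API (cached on repeat calls)
$exact = Get-TokenCount -Text "Hello, world!"

# Fast local heuristic, no network call
$quick = Get-TokenCount -Text "Hello, world!" -Estimate
```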

2. Batch Token Counting

File: src/BatchTokenCounter.ps1

```powershell
function Get-TokenCountBatch {
    param(
        [string[]]$Texts,
        [int]$BatchSize = 10
    )

    $results = @()

    for ($i = 0; $i -lt $Texts.Count; $i += $BatchSize) {
        $end = [Math]::Min($i + $BatchSize - 1, $Texts.Count - 1)
        $batch = $Texts[$i..$end]

        # API supports batch counting
        # @(...) keeps contents a JSON array even for a single-item batch
        $body = @{
            model = $script:TokenCounter.Model
            contents = @($batch | ForEach-Object {
                @{
                    parts = @(@{
                        text = $_
                    })
                }
            })
        } | ConvertTo-Json -Depth 10

        $response = Invoke-RestMethod `
            -Uri "https://generativelanguage.googleapis.com/v1beta/models/$($script:TokenCounter.Model):batchCountTokens?key=$($script:TokenCounter.ApiKey)" `
            -Method POST `
            -ContentType "application/json" `
            -Body $body

        $results += $response.counts
    }

    return $results
}
```
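A typical call site might look like this (a sketch; the file paths are illustrative):

```powershell
. ./src/BatchTokenCounter.ps1

# Count every source file in one pass instead of N single API calls
$texts  = Get-ChildItem ./src -Filter *.ps1 | ForEach-Object { Get-Content $_.FullName -Raw }
$counts = Get-TokenCountBatch -Texts $texts -BatchSize 10
```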

3. Usage Metadata Tracking

Following Gemini CLI's usageMetadata pattern:

```powershell
class TokenUsage {
    [int]$PromptTokens
    [int]$CompletionTokens
    [int]$TotalTokens
    [int]$CachedTokens
    [DateTime]$Timestamp
}

function Track-TokenUsage {
    param(
        [int]$PromptTokens,
        [int]$CompletionTokens,
        [int]$CachedTokens = 0
    )

    $usage = [TokenUsage]@{
        PromptTokens     = $PromptTokens
        CompletionTokens = $CompletionTokens
        TotalTokens      = $PromptTokens + $CompletionTokens
        CachedTokens     = $CachedTokens
        Timestamp        = Get-Date
    }

    # Append to session log
    $usage | Export-Csv "$DATA_DIR/token-usage.csv" -Append -NoTypeInformation

    return $usage
}
```
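The CSV log then supports simple per-session reporting (a sketch; the example numbers are arbitrary):

```powershell
Track-TokenUsage -PromptTokens 1200 -CompletionTokens 350 -CachedTokens 400

# Aggregate the session totals from the log
$log    = Import-Csv "$DATA_DIR/token-usage.csv"
$total  = ($log | Measure-Object -Property TotalTokens -Sum).Sum
$cached = ($log | Measure-Object -Property CachedTokens -Sum).Sum
Write-Host "Session: $total tokens, $cached served from cache"
```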

Acceptance Criteria

  • TokenCounter class with API integration
  • Caching to avoid redundant API calls
  • Batch counting for multiple texts
  • Improved estimation heuristics (code vs text vs JSON)
  • Fallback to estimation if API unavailable
  • Usage metadata tracking (prompt/completion/cached tokens)
  • Token count within 5% of actual for common content
  • API calls complete in < 500ms
  • Cache hit rate > 80% for repeated content

Testing Strategy

  1. Accuracy Test: Compare API vs actual model token usage
  2. Cache Test: Second count of same text uses cache
  3. Batch Test: Batch counting returns correct counts
  4. Fallback Test: Estimation used when API unavailable
  5. Performance Test: < 500ms for single count, < 2s for batch of 100
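The cache and fallback tests could be sketched with Pester (test names and cache pre-seeding approach are assumptions):

```powershell
Describe 'TokenCounter' {
    It 'uses the cache on a second count of the same text' {
        $tc = [TokenCounter]@{ ApiKey = 'test-key'; Model = 'gemini-2.5-flash' }
        # Pre-seed the cache so CountTokens returns without touching the network
        $tc.Cache[(Get-ContentHash 'hello')] = 3
        $tc.CountTokens('hello') | Should -Be 3
    }

    It 'falls back to estimation when no API key is set' {
        $env:GEMINI_API_KEY = $null
        # 400 plain-text characters / 4 chars-per-token
        Get-TokenCount -Text ('x' * 400) | Should -Be 100
    }
}
```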

Security Considerations

  • API key stored securely in environment variable
  • No API key logging
  • Rate limiting to avoid quota exhaustion
  • Graceful degradation if quota exceeded
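Client-side rate limiting could be as simple as a minimum interval between calls (a sketch; the helper name and the 100 ms interval are assumptions, tune to the actual quota):

```powershell
$script:LastApiCall = [DateTime]::MinValue
$script:MinInterval = [TimeSpan]::FromMilliseconds(100)  # ~10 requests/sec

function Invoke-Throttled {
    param([scriptblock]$Action)
    # Sleep just long enough to honor the minimum spacing between API calls
    $wait = $script:MinInterval - ([DateTime]::UtcNow - $script:LastApiCall)
    if ($wait -gt [TimeSpan]::Zero) {
        Start-Sleep -Milliseconds ([int]$wait.TotalMilliseconds)
    }
    $script:LastApiCall = [DateTime]::UtcNow
    & $Action
}
```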

Priority

High - accurate token counting is the foundation for context optimization

References

Gemini CLI implementation:
