Summary
Implement accurate token counting using actual model tokenizers instead of the crude character_count / 4 heuristic.
Background
Gemini CLI Pattern (packages/core/src/code_assist/converter.ts):
- Uses official Google AI token counting API
- Provides a countTokens function
- Returns the exact token count for the model being used
- Factors in model-specific tokenization
Current Problem: We estimate 1 token ≈ 4 characters, which can be off by 50%+ for:
- Code (more tokens per character)
- Non-English text (UTF-8 multibyte characters)
- Special characters and symbols
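To make the failure mode concrete: the heuristic sees only string length, never content, so two very different strings of similar length get identical estimates. A quick illustration (the sample strings are arbitrary):

```powershell
$code  = 'if ($x -ne $null) { $y = [Math]::Max($x, 0) }'
$prose = 'The quick brown fox jumps over two lazy dogs.'
# Both are ~45 characters, so char/4 estimates ~12 tokens for each,
# yet real tokenizers typically split the code sample into many more
# tokens (sigils, brackets, and operators often tokenize separately).
[Math]::Ceiling($code.Length / 4)
[Math]::Ceiling($prose.Length / 4)
```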
Implementation Strategy
1. Token Counting API Integration
File: src/TokenCounter.ps1
```powershell
class TokenCounter {
    [string]$ApiKey
    [string]$Model
    [hashtable]$Cache = @{}  # Content hash -> token count

    [int] CountTokens([string]$Text) {
        # Check the cache first
        $hash = Get-ContentHash $Text
        if ($this.Cache.ContainsKey($hash)) {
            return $this.Cache[$hash]
        }

        # Call the Google AI token counting API; the model is addressed in
        # the URL, so the body carries only the contents to count
        $body = @{
            contents = @(
                @{ parts = @(@{ text = $Text }) }
            )
        } | ConvertTo-Json -Depth 10

        $response = Invoke-RestMethod `
            -Uri "https://generativelanguage.googleapis.com/v1beta/models/$($this.Model):countTokens?key=$($this.ApiKey)" `
            -Method POST `
            -ContentType "application/json" `
            -Body $body
        $tokenCount = $response.totalTokens

        # Cache the result
        $this.Cache[$hash] = $tokenCount
        return $tokenCount
    }

    [int] EstimateTokens([string]$Text) {
        # Fast local estimation (fallback when the API is unavailable),
        # with a rough divisor per content type
        if ($Text -match '^\s*(\{|\[)') {
            # JSON: denser tokenization (more tokens per character)
            return [int][Math]::Ceiling($Text.Length / 3.5)
        } elseif ($Text -match '^(function|class|def|public|private)') {
            # Code: densest tokenization
            return [int][Math]::Ceiling($Text.Length / 3)
        } else {
            # Plain text
            return [int][Math]::Ceiling($Text.Length / 4)
        }
    }
}

$script:TokenCounter = [TokenCounter]@{
    ApiKey = $env:GEMINI_API_KEY
    Model  = "gemini-2.5-flash"
}

function Get-TokenCount {
    param(
        [string]$Text,
        [switch]$Estimate
    )
    if ($Estimate -or -not $env:GEMINI_API_KEY) {
        return $script:TokenCounter.EstimateTokens($Text)
    }
    try {
        return $script:TokenCounter.CountTokens($Text)
    } catch {
        Write-Log "Token counting API failed, falling back to estimation: $($_.Exception.Message)" "WARN"
        return $script:TokenCounter.EstimateTokens($Text)
    }
}
```
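CountTokens above relies on a Get-ContentHash helper that is not shown in this issue; a minimal sketch (name and placement are assumptions) hashing the UTF-8 bytes with SHA-256, which is stable across sessions unlike String.GetHashCode():

```powershell
function Get-ContentHash {
    param([string]$Text)
    # SHA-256 over the UTF-8 bytes of the text, returned as a hex string
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
        return ([BitConverter]::ToString($sha.ComputeHash($bytes))) -replace '-', ''
    } finally {
        $sha.Dispose()
    }
}
```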
2. Batch Token Counting
File: src/BatchTokenCounter.ps1
```powershell
function Get-TokenCountBatch {
    param(
        [string[]]$Texts
    )
    # Note: the countTokens endpoint accepts multiple contents in a single
    # request, but returns one aggregate totalTokens for the whole request,
    # not per-item counts. Per-text counts therefore take one call each;
    # the TokenCounter cache absorbs repeated texts.
    $results = @()
    foreach ($text in $Texts) {
        $results += Get-TokenCount -Text $text
    }
    return $results
}
```
3. Usage Metadata Tracking
Following Gemini CLI's usageMetadata pattern:
```powershell
class TokenUsage {
    [int]$PromptTokens
    [int]$CompletionTokens
    [int]$TotalTokens
    [int]$CachedTokens
    [DateTime]$Timestamp
}

function Track-TokenUsage {
    param(
        [int]$PromptTokens,
        [int]$CompletionTokens,
        [int]$CachedTokens = 0
    )
    $usage = [TokenUsage]@{
        PromptTokens     = $PromptTokens
        CompletionTokens = $CompletionTokens
        TotalTokens      = $PromptTokens + $CompletionTokens
        CachedTokens     = $CachedTokens
        Timestamp        = Get-Date
    }
    # Append to the session log
    $usage | Export-Csv "$DATA_DIR/token-usage.csv" -Append -NoTypeInformation
    return $usage
}
```
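For visibility into spend, the CSV written above can be aggregated. A small sketch; Get-TokenUsageSummary is a name introduced here, and it assumes the same $DATA_DIR used by Track-TokenUsage:

```powershell
function Get-TokenUsageSummary {
    # Aggregates the per-call usage log written by Track-TokenUsage
    $rows = Import-Csv "$DATA_DIR/token-usage.csv"
    [pscustomobject]@{
        Calls            = $rows.Count
        PromptTokens     = ($rows | Measure-Object -Property PromptTokens -Sum).Sum
        CompletionTokens = ($rows | Measure-Object -Property CompletionTokens -Sum).Sum
        TotalTokens      = ($rows | Measure-Object -Property TotalTokens -Sum).Sum
        CachedTokens     = ($rows | Measure-Object -Property CachedTokens -Sum).Sum
    }
}
```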
Acceptance Criteria
Testing Strategy
- Accuracy Test: Compare API vs actual model token usage
- Cache Test: Second count of same text uses cache
- Batch Test: Batch counting returns correct counts
- Fallback Test: Estimation used when API unavailable
- Performance Test: < 500ms for single count, < 2s for batch of 100
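The estimation and fallback cases above could be expressed as Pester tests; a sketch, where the dot-sourced path assumes the repo layout named earlier in this issue:

```powershell
Describe 'TokenCounter' {
    BeforeAll {
        . "$PSScriptRoot/../src/TokenCounter.ps1"  # assumed layout
    }

    It 'estimates plain text at roughly length/4' {
        $counter = [TokenCounter]::new()
        $counter.EstimateTokens('word ' * 20) | Should -BeGreaterThan 0
    }

    It 'falls back to estimation when no API key is set' {
        $env:GEMINI_API_KEY = $null
        Get-TokenCount -Text 'hello world' | Should -BeGreaterThan 0
    }
}
```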
Security Considerations
- API key stored securely in environment variable
- No API key logging
- Rate limiting to avoid quota exhaustion
- Graceful degradation if quota exceeded
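The rate-limiting bullet could be handled client-side with a sliding one-minute window; a sketch, where Wait-ForQuota and the default limit are assumptions to tune against the actual quota, not documented values:

```powershell
# Timestamps of recent countTokens calls, oldest first
$script:CallTimes = [System.Collections.Generic.Queue[datetime]]::new()

function Wait-ForQuota {
    param([int]$MaxPerMinute = 60)  # assumed limit; tune to real quota
    # Drop timestamps older than one minute
    $cutoff = (Get-Date).AddMinutes(-1)
    while ($script:CallTimes.Count -gt 0 -and $script:CallTimes.Peek() -lt $cutoff) {
        [void]$script:CallTimes.Dequeue()
    }
    # If the window is full, sleep until the oldest call ages out
    if ($script:CallTimes.Count -ge $MaxPerMinute) {
        $wait = ($script:CallTimes.Peek().AddMinutes(1) - (Get-Date)).TotalSeconds
        if ($wait -gt 0) { Start-Sleep -Seconds ([Math]::Ceiling($wait)) }
    }
    $script:CallTimes.Enqueue((Get-Date))
}
```

Calling Wait-ForQuota immediately before each Invoke-RestMethod keeps the process under the chosen requests-per-minute ceiling.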
Priority
High - Accurate counting is foundation for optimization
References
Gemini CLI implementation:
- packages/core/src/code_assist/converter.ts (token counting API)
- packages/core/src/core/geminiChat.ts (usageMetadata extraction)