feat: implement compression for stored optimization results #126

@ooples

Description

Summary

Implement gzip compression for stored optimization results, session files, and cache data to reduce disk I/O and storage costs.

Background

Currently, the token optimizer stores large amounts of data uncompressed:

  • Session files: current-session.txt, operation CSV files (grow to several MB)
  • Cache data: File content cache, optimization results (can be 10-100MB)
  • Operation logs: CSV logs with full tool input/output (grow rapidly)

Gemini CLI uses gzip compression to reduce storage size by 70-90%, improving:

  • Disk I/O performance (smaller files read/write faster)
  • Storage costs (cloud storage pricing)
  • Network transfer (when syncing logs/analytics)
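
The 70-90% figure is easy to sanity-check locally. A minimal sketch in Node/TypeScript, using zlib's synchronous API on log-like repetitive text (the typical content here); the sample data is illustrative:

```typescript
import { gzipSync, gunzipSync } from 'zlib';

// Build ~30 KB of log-like, highly repetitive text.
const sample = Array.from(
  { length: 1000 },
  (_, i) => `2024-01-01T00:00:00Z,Read,${i}\n`
).join('');

const original = Buffer.from(sample, 'utf8');
const compressed = gzipSync(original, { level: 9 });
const ratio = 1 - compressed.length / original.length;

console.log(`original: ${original.length} B, compressed: ${compressed.length} B`);
console.log(`reduction: ${(ratio * 100).toFixed(1)}%`);

// The round-trip must be lossless.
const restored = gunzipSync(compressed).toString('utf8');
console.log(restored === sample); // true
```

On repetitive CSV-style logs like this, the reduction comfortably exceeds 90%; real session data with more entropy lands closer to the 70-90% range cited above.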

Gemini CLI Pattern Reference

File: packages/a2a-server/src/persistence/gcs.ts

Pattern: Gzip compression for stored data

import { createGzip, createGunzip } from 'zlib';
import { pipeline } from 'stream/promises';
import { Readable, Writable } from 'stream';
import { createReadStream, createWriteStream } from 'fs';

async function saveCompressed(filePath: string, data: string): Promise<void> {
  const gzip = createGzip({ level: 9 }); // Maximum compression
  const input = Readable.from(data);
  const output = createWriteStream(filePath + '.gz');

  await pipeline(input, gzip, output);
}

async function loadCompressed(filePath: string): Promise<string> {
  const gunzip = createGunzip();
  const input = createReadStream(filePath + '.gz');
  const chunks: Buffer[] = [];

  await pipeline(
    input,
    gunzip,
    new Writable({
      write(chunk, encoding, callback) {
        chunks.push(chunk);
        callback();
      }
    })
  );

  return Buffer.concat(chunks).toString('utf8');
}

File: packages/core/src/storage/cacheStorage.ts

Pattern: Transparent compression for cache entries

import * as zlib from 'zlib';
import { promises as fs } from 'fs';

class CompressedCacheStorage {
  async set(key: string, value: any): Promise<void> {
    const json = JSON.stringify(value);
    const compressed = await this.compressString(json);
    await fs.writeFile(this.getPath(key), compressed);
  }

  async get(key: string): Promise<any> {
    const compressed = await fs.readFile(this.getPath(key));
    const json = await this.decompressString(compressed);
    return JSON.parse(json);
  }

  private async compressString(data: string): Promise<Buffer> {
    return new Promise((resolve, reject) => {
      zlib.gzip(data, { level: 9 }, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });
  }

  private async decompressString(data: Buffer): Promise<string> {
    return new Promise((resolve, reject) => {
      zlib.gunzip(data, (err, result) => {
        if (err) reject(err);
        else resolve(result.toString('utf8'));
      });
    });
  }
}

Implementation Strategy

1. Create Compression Utilities

Location: C:\Users\cheat\.claude-global\hooks\handlers\token-optimizer-orchestrator.ps1

# Compression utility functions
function Compress-String {
    param(
        [string]$InputString,
        # .NET's GZipStream takes a CompressionLevel enum, not a 1-9 integer
        [System.IO.Compression.CompressionLevel]$CompressionLevel = [System.IO.Compression.CompressionLevel]::Optimal
    )

    try {
        # Convert string to bytes
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($InputString)

        # Create memory streams
        $inputStream = New-Object System.IO.MemoryStream
        $inputStream.Write($bytes, 0, $bytes.Length)
        $inputStream.Position = 0

        $outputStream = New-Object System.IO.MemoryStream

        # Create GZip stream with the requested compression level
        $gzipStream = New-Object System.IO.Compression.GZipStream(
            $outputStream,
            $CompressionLevel
        )

        # Compress
        $inputStream.CopyTo($gzipStream)
        $gzipStream.Close()

        # Return compressed bytes
        return $outputStream.ToArray()
    } catch {
        Write-Log "Failed to compress string: $($_.Exception.Message)" "ERROR"
        return $null
    } finally {
        # Dispose the wrapping stream first so it never flushes to an already-disposed target
        if ($gzipStream) { $gzipStream.Dispose() }
        if ($inputStream) { $inputStream.Dispose() }
        if ($outputStream) { $outputStream.Dispose() }
    }
}

function Decompress-String {
    param(
        [byte[]]$CompressedBytes
    )

    try {
        # Create memory streams
        $inputStream = New-Object System.IO.MemoryStream
        $inputStream.Write($CompressedBytes, 0, $CompressedBytes.Length)
        $inputStream.Position = 0

        $outputStream = New-Object System.IO.MemoryStream

        # Create GZip stream for decompression
        $gzipStream = New-Object System.IO.Compression.GZipStream(
            $inputStream,
            [System.IO.Compression.CompressionMode]::Decompress
        )

        # Decompress
        $gzipStream.CopyTo($outputStream)
        $gzipStream.Close()

        # Convert bytes back to string
        $bytes = $outputStream.ToArray()
        return [System.Text.Encoding]::UTF8.GetString($bytes)
    } catch {
        Write-Log "Failed to decompress string: $($_.Exception.Message)" "ERROR"
        return $null
    } finally {
        # Dispose the wrapping stream first so it never reads from an already-disposed source
        if ($gzipStream) { $gzipStream.Dispose() }
        if ($inputStream) { $inputStream.Dispose() }
        if ($outputStream) { $outputStream.Dispose() }
    }
}

function Save-CompressedFile {
    param(
        [string]$FilePath,
        [string]$Content
    )

    $compressed = Compress-String -InputString $Content
    if ($null -eq $compressed) {
        Write-Log "Failed to compress file: $FilePath" "ERROR"
        return $false
    }

    try {
        [System.IO.File]::WriteAllBytes($FilePath + ".gz", $compressed)

        # Log compression ratio (guard against empty content to avoid divide-by-zero)
        $originalSize = [System.Text.Encoding]::UTF8.GetByteCount($Content)
        $compressedSize = $compressed.Length
        if ($originalSize -gt 0) {
            $ratio = [Math]::Round((1 - ($compressedSize / $originalSize)) * 100, 2)
            Write-Log "Compressed $FilePath - Original: $originalSize bytes, Compressed: $compressedSize bytes, Ratio: $ratio%" "DEBUG"
        }

        return $true
    } catch {
        Write-Log "Failed to write compressed file: $($_.Exception.Message)" "ERROR"
        return $false
    }
}

function Load-CompressedFile {
    param(
        [string]$FilePath
    )

    if (-not (Test-Path "$FilePath.gz")) {
        Write-Log "Compressed file not found: $FilePath.gz" "ERROR"
        return $null
    }

    try {
        $compressed = [System.IO.File]::ReadAllBytes("$FilePath.gz")
        $content = Decompress-String -CompressedBytes $compressed

        if ($null -eq $content) {
            Write-Log "Failed to decompress file: $FilePath.gz" "ERROR"
            return $null
        }

        return $content
    } catch {
        Write-Log "Failed to read compressed file: $($_.Exception.Message)" "ERROR"
        return $null
    }
}

2. Compressed Session File Storage

Location: Replace Write-SessionFile function

function Write-SessionFile {
    param(
        [string]$FilePath,
        $SessionObject,
        [bool]$UseCompression = $true
    )

    $maxRetries = 5
    $retryDelayMs = 100

    for ($i = 0; $i -lt $maxRetries; $i++) {
        try {
            # Convert to JSON
            $json = $SessionObject | ConvertTo-Json -Depth 100

            if ($UseCompression) {
                # Save compressed; writing an uncompressed copy alongside would
                # negate the storage savings, so readers fall back to the plain
                # file only when no .gz exists (see Read-SessionFile)
                if (Save-CompressedFile -FilePath $FilePath -Content $json) {
                    return $true
                }
                # Compression failed: fall back to an uncompressed write
                [System.IO.File]::WriteAllText($FilePath, $json, [System.Text.Encoding]::UTF8)
                return $true
            } else {
                # Save uncompressed
                [System.IO.File]::WriteAllText($FilePath, $json, [System.Text.Encoding]::UTF8)
                return $true
            }
        } catch [System.IO.IOException] {
            Write-Log "Failed to acquire write lock on session file '$FilePath', retrying... ($($_.Exception.Message))" "WARN"
            Start-Sleep -Milliseconds $retryDelayMs
        } catch {
            Write-Log "Failed to write session file '$FilePath': $($_.Exception.Message)" "ERROR"
            return $false
        }
    }

    Write-Log "Failed to write session file '$FilePath' after multiple retries due to locking." "ERROR"
    return $false
}

function Read-SessionFile {
    param(
        [string]$FilePath,
        [bool]$TryCompressed = $true
    )

    # Try compressed first
    if ($TryCompressed -and (Test-Path "$FilePath.gz")) {
        $json = Load-CompressedFile -FilePath $FilePath
        if ($null -ne $json) {
            return $json | ConvertFrom-Json
        }
    }

    # Fallback to uncompressed
    if (Test-Path $FilePath) {
        $json = [System.IO.File]::ReadAllText($FilePath, [System.Text.Encoding]::UTF8)
        return $json | ConvertFrom-Json
    }

    return $null
}

3. Compressed Operation Log Storage

Location: CSV logging function

function Write-OperationLog {
    param(
        [string]$LogFile,
        [object[]]$Operations
    )

    # Convert to CSV
    $csv = $Operations | ConvertTo-Csv -NoTypeInformation | Out-String

    # Save compressed
    if (Save-CompressedFile -FilePath $LogFile -Content $csv) {
        Write-Log "Operation log compressed and saved: $LogFile.gz" "DEBUG"

        # Delete uncompressed version to save space (optional)
        if (Test-Path $LogFile) {
            Remove-Item -Path $LogFile -Force -ErrorAction SilentlyContinue
        }
    } else {
        # Fallback: save uncompressed
        [System.IO.File]::WriteAllText($LogFile, $csv, [System.Text.Encoding]::UTF8)
    }
}

function Read-OperationLog {
    param(
        [string]$LogFile
    )

    # Try compressed first
    if (Test-Path "$LogFile.gz") {
        $csv = Load-CompressedFile -FilePath $LogFile
        if ($null -ne $csv) {
            return $csv | ConvertFrom-Csv
        }
    }

    # Fallback to uncompressed
    if (Test-Path $LogFile) {
        return Import-Csv -Path $LogFile
    }

    return @()
}

4. Compressed Cache Storage (Integration with LRU Cache from Issue #5)

Location: LruCache class enhancement

class LruCache {
    [string]$PersistencePath
    [bool]$UseCompression = $true

    # ... existing code ...

    # Persist cache to disk (compressed)
    [bool] SaveToDisk() {
        if ([string]::IsNullOrEmpty($this.PersistencePath)) {
            return $false
        }

        # Convert cache to hashtable for serialization
        $data = @{
            Entries = @{}
            Stats = $this.GetStats()
        }

        foreach ($key in $this.Cache.Keys) {
            $entry = $this.Cache[$key]
            $data.Entries[$key] = @{
                Value = $entry.Value
                Timestamp = $entry.Timestamp.ToString("o")  # ISO 8601
            }
        }

        $json = $data | ConvertTo-Json -Depth 100

        if ($this.UseCompression) {
            return Save-CompressedFile -FilePath $this.PersistencePath -Content $json
        } else {
            [System.IO.File]::WriteAllText($this.PersistencePath, $json, [System.Text.Encoding]::UTF8)
            return $true
        }
    }

    # Load cache from disk (compressed)
    [bool] LoadFromDisk() {
        if ([string]::IsNullOrEmpty($this.PersistencePath)) {
            return $false
        }

        $json = $null

        # Try compressed first
        if ($this.UseCompression -and (Test-Path "$($this.PersistencePath).gz")) {
            $json = Load-CompressedFile -FilePath $this.PersistencePath
        }

        # Fallback to uncompressed
        if ($null -eq $json -and (Test-Path $this.PersistencePath)) {
            $json = [System.IO.File]::ReadAllText($this.PersistencePath, [System.Text.Encoding]::UTF8)
        }

        if ($null -eq $json) {
            return $false
        }

        $data = $json | ConvertFrom-Json

        # Restore entries
        foreach ($key in $data.Entries.PSObject.Properties.Name) {
            $entry = $data.Entries.$key
            $timestamp = [datetime]::Parse($entry.Timestamp)

            # Only restore non-expired entries
            if ($this.TtlSeconds -le 0 -or ((Get-Date) - $timestamp).TotalSeconds -le $this.TtlSeconds) {
                $this.Cache[$key] = [LruCacheEntry]::new($entry.Value)
                $this.Cache[$key].Timestamp = $timestamp
            }
        }

        Write-Log "Loaded $($this.Cache.Count) entries from persisted cache: $($this.PersistencePath)" "DEBUG"
        return $true
    }
}

5. Automatic Compression of Old Logs

Location: SessionStart or periodic cleanup

function Compress-OldLogs {
    param(
        [string]$LogDirectory,
        [int]$DaysOld = 1  # Compress logs older than 1 day
    )

    $cutoffDate = (Get-Date).AddDays(-$DaysOld)
    $compressed = 0
    $totalSaved = 0

    Get-ChildItem -Path $LogDirectory -Filter "*.csv" | Where-Object {
        $_.LastWriteTime -lt $cutoffDate -and -not (Test-Path "$($_.FullName).gz")
    } | ForEach-Object {
        $content = [System.IO.File]::ReadAllText($_.FullName, [System.Text.Encoding]::UTF8)
        $originalSize = $_.Length

        if (Save-CompressedFile -FilePath $_.FullName -Content $content) {
            # Delete original after successful compression
            Remove-Item -Path $_.FullName -Force

            $compressedSize = (Get-Item "$($_.FullName).gz").Length
            $saved = $originalSize - $compressedSize
            $totalSaved += $saved
            $compressed++
        }
    }

    if ($compressed -gt 0) {
        $savedMB = [Math]::Round($totalSaved / 1MB, 2)
        Write-Log "Compressed $compressed old log files, saved $savedMB MB" "INFO"
    }
}

# Call during SessionStart
if ($Phase -eq "SessionStart") {
    Compress-OldLogs -LogDirectory $LOG_DIR -DaysOld 1
}

Acceptance Criteria

  • Compress-String and Decompress-String utility functions
  • Save-CompressedFile and Load-CompressedFile functions
  • Compressed session file storage (current-session.txt.gz)
  • Compressed operation log storage (operations-*.csv.gz)
  • Compressed cache persistence (LRU cache to/from disk)
  • Automatic compression of old logs (older than 1 day)
  • Compression ratio logging and metrics
  • Backward compatibility (can read uncompressed files)
  • Storage size reduction of 70-90% for text data
  • No performance degradation (compression overhead < 10ms per file)
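
One way to meet the backward-compatibility criterion without trusting the `.gz` filename suffix is to sniff the gzip magic bytes (`0x1f 0x8b`, per RFC 1952) before decoding. A minimal TypeScript sketch; `loadMaybeCompressed` is an illustrative helper name, not part of the codebase:

```typescript
import { readFileSync } from 'fs';
import { gzipSync, gunzipSync } from 'zlib';

// Every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
function isGzip(buf: Buffer): boolean {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Hypothetical helper: load a file, transparently decompressing it
// only when it is actually gzip-encoded.
function loadMaybeCompressed(filePath: string): string {
  const raw = readFileSync(filePath);
  return (isGzip(raw) ? gunzipSync(raw) : raw).toString('utf8');
}

console.log(isGzip(gzipSync(Buffer.from('hello')))); // true
console.log(isGzip(Buffer.from('hello')));           // false
```

The same two-byte check translates directly to the PowerShell side (`$bytes[0] -eq 0x1f -and $bytes[1] -eq 0x8b`) and makes the read path robust even if a `.gz` file was written without compression by mistake.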

Testing Strategy

Unit Tests

# Test compression/decompression round-trip
$original = "This is test content that should be compressed." * 100
$compressed = Compress-String -InputString $original
$decompressed = Decompress-String -CompressedBytes $compressed
$decompressed | Should -Be $original

# Test compression ratio
$largeText = Get-Content -Path "large-file.txt" -Raw
$compressed = Compress-String -InputString $largeText
$ratio = 1 - ($compressed.Length / [System.Text.Encoding]::UTF8.GetByteCount($largeText))
$ratio | Should -BeGreaterThan 0.7  # At least 70% reduction

# Test file save/load
$content = "Test content for file compression"
Save-CompressedFile -FilePath "test.txt" -Content $content | Should -Be $true
$loaded = Load-CompressedFile -FilePath "test.txt"
$loaded | Should -Be $content
Test-Path "test.txt.gz" | Should -Be $true

Integration Tests

# Test session file compression
$session = @{
    sessionId = "test-session"
    totalOperations = 100
    optimizationSuccesses = 50
}
Write-SessionFile -FilePath "session.txt" -SessionObject $session -UseCompression $true
Test-Path "session.txt.gz" | Should -Be $true
$loaded = Read-SessionFile -FilePath "session.txt" -TryCompressed $true
$loaded.sessionId | Should -Be "test-session"

# Test operation log compression
$operations = 1..100 | ForEach-Object {
    [PSCustomObject]@{
        Timestamp = Get-Date
        Tool = "Read"
        Duration = 123
    }
}
Write-OperationLog -LogFile "ops.csv" -Operations $operations
Test-Path "ops.csv.gz" | Should -Be $true
$loaded = Read-OperationLog -LogFile "ops.csv"
$loaded.Count | Should -Be 100

Performance Tests

# Compression should be fast (<100ms for typical session file)
$session = @{ ... }  # Typical session object
Measure-Command {
    Write-SessionFile -FilePath "session.txt" -SessionObject $session -UseCompression $true
} | Should -BeLessThan (New-TimeSpan -Milliseconds 100)

# Decompression should be fast (<50ms)
Measure-Command {
    Read-SessionFile -FilePath "session.txt" -TryCompressed $true
} | Should -BeLessThan (New-TimeSpan -Milliseconds 50)

Priority

MEDIUM - Compression provides long-term benefits for storage and I/O performance but is not critical for immediate functionality. Should be implemented after core optimization features (token counting, LRU cache) are working.

Expected Impact

  • Session files: Typically 5-10KB uncompressed → 1-2KB compressed (70-80% reduction)
  • Operation logs: Can grow to 50-100MB → 5-15MB compressed (85-90% reduction)
  • Cache persistence: 10-50MB → 2-10MB compressed (70-80% reduction)
  • Disk I/O: 30-50% faster reads/writes for large files
  • Storage costs: 70-90% reduction in disk usage

For a project with 1000 operations over multiple sessions:

  • Uncompressed: ~500MB of logs and cache
  • Compressed: ~75MB of logs and cache
  • Savings: 425MB (85% reduction)
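
The projected totals are just ratio arithmetic; a quick check of the 1000-operation estimate:

```typescript
// Sanity-check the projection: 500 MB of logs/cache at an 85% reduction.
const uncompressedMB = 500;
const reductionPct = 85;

// Integer arithmetic keeps the result exact.
const compressedMB = (uncompressedMB * (100 - reductionPct)) / 100;
const savedMB = uncompressedMB - compressedMB;

console.log(`compressed: ${compressedMB} MB, saved: ${savedMB} MB`);
// → compressed: 75 MB, saved: 425 MB
```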

Labels: enhancement (New feature or request)