This document describes the comprehensive input validation system implemented in Jekyll Minifier v0.2.0+, building on the existing ReDoS protection and security features.
The input validation system provides multiple layers of security and data integrity checking while maintaining 100% backward compatibility with existing configurations.
The `Jekyll::Minifier::ValidationHelpers` module provides reusable validation functions:
- Validates boolean configuration values
- Accepts: `true`, `false`, `"true"`, `"false"`, `"1"`, `"0"`, `1`, `0`
- Graceful degradation: logs warnings for invalid values, returns `nil`
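
The boolean rules above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the method name `validate_boolean`, its signature, and the two constants are assumptions made for the example.

```ruby
# Hypothetical sketch of boolean validation; names are illustrative only.
TRUE_VALUES  = [true, "true", "1", 1].freeze
FALSE_VALUES = [false, "false", "0", 0].freeze

def validate_boolean(value, key = "unknown")
  return true  if TRUE_VALUES.include?(value)
  return false if FALSE_VALUES.include?(value)

  # Graceful degradation: warn and return nil so the caller can apply a default.
  warn "Jekyll Minifier: Invalid boolean value for '#{key}': #{value}. Using default."
  nil
end
```

Returning `nil` instead of raising lets the caller decide which default to apply, so an invalid value never aborts the build.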
- Range checking with configurable min/max values
- Type coercion from strings to integers
- Overflow protection
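
A possible shape for the integer checks is shown below. The method name, keyword arguments, and default bounds are assumptions for illustration. Ruby integers have arbitrary precision, so in this sketch "overflow protection" simply means rejecting anything above the configured maximum.

```ruby
# Hypothetical sketch of integer validation; names and bounds are illustrative only.
def validate_integer(value, key = "unknown", min: 0, max: 1_000_000)
  int =
    case value
    when Integer then value
    # Base 10 only, so "0x1A" or "abc" yield nil instead of raising.
    when String  then Integer(value.strip, 10, exception: false)
    end

  if int.nil? || int < min || int > max
    warn "Jekyll Minifier: Invalid integer value for '#{key}': #{value}. Using default."
    return nil
  end

  int
end
```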
- Length limits (default: 10,000 characters)
- Control character detection and rejection
- Safe encoding validation
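
The string checks could look something like this. The method name and the exact set of rejected control characters are assumptions; this sketch allows tab, newline, and carriage return and rejects the remaining ASCII control characters.

```ruby
# Hypothetical sketch of string validation; names are illustrative only.
MAX_STRING_LENGTH = 10_000

def validate_string(value, key = "unknown", max_length: MAX_STRING_LENGTH)
  unless value.is_a?(String) && value.valid_encoding?
    warn "Jekyll Minifier: Invalid string value for '#{key}'. Using default."
    return nil
  end

  if value.length > max_length
    warn "Jekyll Minifier: String too long for '#{key}' (#{value.length} > #{max_length})."
    return nil
  end

  # Assumption: tab, LF and CR are allowed; other ASCII control characters are not.
  if value.match?(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/)
    warn "Jekyll Minifier: Control characters detected in '#{key}'."
    return nil
  end

  value
end
```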
- Size limits (default: 1,000 elements)
- Element filtering for invalid items
- Automatic conversion from single values
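
A rough sketch of the array handling follows. The method name is hypothetical, and two behaviors are assumptions: only non-empty strings count as valid elements, and an oversized array is truncated rather than rejected.

```ruby
# Hypothetical sketch of array validation; names are illustrative only.
MAX_ARRAY_SIZE = 1_000

def validate_array(value, key = "unknown", max_size: MAX_ARRAY_SIZE)
  return [] if value.nil?

  # Automatic conversion: a single value becomes a one-element array.
  items = value.is_a?(Array) ? value : [value]

  # Element filtering (assumption: only non-empty strings are valid).
  valid = items.select { |item| item.is_a?(String) && !item.empty? }

  if valid.size > max_size
    warn "Jekyll Minifier: Too many entries for '#{key}' (#{valid.size} > #{max_size})."
    valid = valid.first(max_size)
  end

  valid
end
```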
- Size limits (default: 100 key-value pairs)
- Key and value type validation
- Nested structure support
- File size limits (default: 50MB)
- Encoding validation
- Content-specific validation:
  - CSS: Brace balance checking
  - JavaScript: Parentheses and brace balance
  - JSON: Basic structure validation
  - HTML: Tag balance checking
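
One simple way to implement a balance check is a stack of open delimiters, sketched below. The helper name `balanced?` is hypothetical, and this version is deliberately naive: it does not skip delimiters that appear inside string literals or comments, so the actual checks may be more lenient or more precise.

```ruby
# Hypothetical sketch of a delimiter balance check; not the library's actual code.
def balanced?(content, pairs = { "{" => "}" })
  closers = pairs.invert # maps each closing delimiter to its opener
  stack = []

  content.each_char do |ch|
    if pairs.key?(ch)
      stack.push(ch)
    elsif closers.key?(ch)
      # A closer must match the most recently opened delimiter.
      return false unless stack.pop == closers[ch]
    end
  end

  stack.empty?
end

CSS_PAIRS = { "{" => "}" }.freeze
JS_PAIRS  = { "{" => "}", "(" => ")" }.freeze
```

For example, `balanced?(css, CSS_PAIRS)` would flag a stylesheet with a missing closing brace before it reaches the compressor.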
- Directory traversal prevention (`../`, `~/`)
- Null byte detection
- Path injection protection
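
The path checks could be approximated as below. The predicate name `safe_path?` is an assumption, and the sketch covers only the cases listed above (traversal segments, a leading `~/`, and null bytes).

```ruby
# Hypothetical sketch of file path validation; names are illustrative only.
def safe_path?(path)
  return false unless path.is_a?(String) && !path.empty?
  return false if path.include?("\0")     # null byte injection
  return false if path.start_with?("~/")  # home directory expansion

  # Reject any ".." segment, with either forward or backward slashes.
  segments = path.split(%r{[/\\]})
  !segments.include?("..")
end
```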
The CompressionConfig class now includes:
- Real-time validation during configuration loading
- Type-specific validation per configuration key
- Graceful fallback to safe defaults
- Terser/Uglifier argument safety checking
- Known dangerous option detection
- Legacy option filtering (`harmony` removal)
- Nested configuration validation
- All existing configurations continue to work
- Invalid values fall back to safe defaults
- No breaking changes to public API
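
As an illustration of the argument safety checks, the sketch below filters a `terser_args` hash. The method and constant names are hypothetical; the 20-option limit and the `harmony` filtering come from this document, while rejecting an oversized hash outright is an assumption.

```ruby
# Hypothetical sketch of Terser argument sanitizing; names are illustrative only.
LEGACY_TERSER_KEYS = %w[harmony].freeze
MAX_TERSER_OPTIONS = 20

def sanitize_terser_args(args)
  # Invalid structures (e.g. an array) are dropped entirely.
  return nil unless args.is_a?(Hash)

  if args.size > MAX_TERSER_OPTIONS
    warn "Jekyll Minifier: Too many terser_args options (#{args.size} > #{MAX_TERSER_OPTIONS})."
    return nil
  end

  args.each_with_object({}) do |(key, value), clean|
    if LEGACY_TERSER_KEYS.include?(key.to_s)
      warn "Jekyll Minifier: Legacy option '#{key}' removed from terser_args."
      next
    end
    clean[key] = value
  end
end
```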
All compression methods now include:
- Content safety checking before compression
- File path security validation
- Size and encoding verification
- Graceful compression failure handling
- Detailed error logging with file paths
- Fallback to original content on errors
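
The fallback behavior can be pictured as a thin wrapper around any compressor call. This is a sketch under assumptions: `compress_safely` is a hypothetical helper, and the block stands in for whichever compressor is actually invoked.

```ruby
# Hypothetical sketch of graceful compression failure handling.
def compress_safely(content, path)
  # The block performs the real compression and returns the result.
  yield(content)
rescue StandardError => e
  # Log with the file path, then fall back to the untouched input.
  warn "Jekyll Minifier: compression failed for #{path}: #{e.message}. Using original content."
  content
end
```

A caller would wrap each compressor invocation, for example `compress_safely(css, path) { |c| some_compressor.call(c) }`, where `some_compressor` is a placeholder.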
- File-specific validation based on extension
- Context-aware error messages
- Secure file path handling
- Works seamlessly with existing ReDoS protection
- Layered security approach
- Pattern validation at multiple levels
- Memory exhaustion prevention
- CPU usage limits through timeouts
- File size restrictions
- Control character filtering
- Encoding validation
- Type coercion safety
- Directory traversal prevention
- Null byte injection protection
- Safe file handling
- All HTML compression options
- File type compression toggles (`compress_css`, `compress_javascript`, `compress_json`)
- CSS enhancement options
- PHP preservation settings
- `preserve_patterns` (max 100 patterns)
- `exclude` (max 100 exclusions)
- `terser_args` (max 20 options)
- `uglifier_args` (legacy, with filtering)
jekyll-minifier:
  # Boolean options - validated and converted
  compress_css: true
  compress_javascript: "true"  # Converted to boolean
  remove_comments: 1           # Converted to boolean

  # Array options - validated and filtered
  preserve_patterns:
    - "<!-- PRESERVE -->.*?<!-- /PRESERVE -->"
    - "<script[^>]*>.*?</script>"
  exclude:
    - "*.min.css"
    - "vendor/**"

  # Hash options - validated for safety
  terser_args:
    compress: true
    mangle: false
    ecma: 2015
    # Note: 'harmony' option automatically filtered

- Configuration Warnings: Invalid config values with fallbacks
- Content Warnings: Unsafe file content detection
- Security Warnings: Path injection or other security issues
- Compression Warnings: Processing errors with graceful recovery
Jekyll Minifier: Invalid boolean value for 'compress_css': invalid_value. Using default.
Jekyll Minifier: File too large for safe processing: huge_file.css (60MB > 50MB)
Jekyll Minifier: Unsafe file path detected: ../../../etc/passwd
Jekyll Minifier: CSS compression failed for malformed.css: syntax error. Using original content.
- Configuration validation occurs only during configuration loading
- Content validation uses efficient algorithms
- Minimal overhead during normal operation
- Caching of validated configuration values
- Configuration validation: <1ms typical
- Content validation: <10ms for large files
- Path validation: <0.1ms per path
- Overall impact: <1% performance overhead
- ✅ All existing configurations work unchanged
- ✅ Same default behavior for unspecified options
- ✅ No new required configuration options
- ✅ Existing API methods unchanged
- Invalid configurations log warnings but don't fail builds
- Dangerous values replaced with safe defaults
- Legacy options automatically filtered or converted
- 36 dedicated input validation tests
- 106+ integration tests with existing functionality
- Edge case testing for all validation scenarios
- Security boundary testing
- Unit Tests: Individual validation method testing
- Integration Tests: Validation with compression workflow
- Security Tests: Boundary and attack vector testing
- Compatibility Tests: Backward compatibility verification
# Before (potentially unsafe)
jekyll-minifier:
  preserve_patterns: "not_an_array"
  terser_args: [1, 2, 3]  # Invalid structure
  compress_css: "maybe"   # Invalid boolean

# After (automatically validated and corrected)
# preserve_patterns: ["not_an_array"]  # Auto-converted to array
# terser_args: nil                     # Invalid structure filtered
# compress_css: true                   # Invalid boolean uses default

# Large file handling
large_css = File.read('huge_stylesheet.css') # 60MB file
# Validation automatically detects oversized content
# Logs warning and skips compression for safety

# Malformed content handling
malformed_js = 'function test() { return <invalid> ; }'
# Compression fails gracefully, original content preserved
# Error logged for developer awareness

The input validation system enhances and complements existing security features:
- ReDoS Protection: Works alongside regex pattern validation
- CSS Performance: Maintains PR #61 optimizations with safety checks
- Terser Migration: Validates modern Terser configurations while filtering legacy options
- Error Handling: Builds upon existing error recovery mechanisms
Together, these layers guard against path traversal, null byte injection, resource exhaustion, and ReDoS while preserving the existing performance and compression behavior.