
[WIP] Refactor code structure for improved maintainability #261

Draft
Copilot wants to merge 3 commits into main from copilot/refactor-code-structure

Conversation


Copilot AI commented Feb 5, 2026

Refactoring Plan: Code Quality & Maintainability Improvements

Phase 1: Extract Utilities ✅ COMPLETE

  • Create src/utils/errorUtils.ts for error message extraction (eliminates 12+ duplicates)
  • Create src/constants/pdf.ts for magic numbers and constants
  • Update all files to use new utilities (loader, parser, extractor, tableExtractor, readPdf)
  • All 133 tests passing ✅
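
The PR does not show the body of the error helper, so the following is only a minimal sketch of what a shared error-message extractor could look like. The name `extractErrorMessage` comes from the commit notes; the `fallback` parameter and the handling of plain objects are assumptions made for illustration.

```typescript
// Hypothetical sketch; the actual src/utils/errorUtils.ts is not shown in this PR.
function extractErrorMessage(error: unknown, fallback = "Unknown error"): string {
  // Standard Error instances (and subclasses) carry a message directly.
  if (error instanceof Error) return error.message;
  // Code sometimes throws bare strings.
  if (typeof error === "string") return error;
  // Plain objects with a string `message` property (assumed case).
  if (typeof error === "object" && error !== null && "message" in error) {
    const message = (error as { message: unknown }).message;
    if (typeof message === "string") return message;
  }
  return fallback;
}

// Usage: one call replaces the repeated `instanceof Error ? ... : String(...)` pattern.
try {
  JSON.parse("{not json");
} catch (error: unknown) {
  console.error(`Failed to parse: ${extractErrorMessage(error)}`);
}
```

Centralizing this check is what lets call sites such as the loader and parser drop their own copies of the same conditional.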

Phase 2: Refactor PDF Extraction Layer ✅ COMPLETE (Part A)

  • Split src/pdf/extractor.ts (467→242 lines, 48% reduction):
    • Extract image encoding to src/pdf/images/imagePngEncoder.ts (89 lines)
      • encodePixelsToPNG() - Handles RGB/RGBA/Grayscale to PNG conversion
      • processImageData() - Processes raw PDF.js image data
    • Extract image retrieval to src/pdf/images/imageExtractor.ts (154 lines)
      • retrieveImageData() - Multi-strategy image retrieval (commonObjs/sync/async)
      • extractImagesFromPage() - Single page image extraction
      • extractImages() - Multi-page sequential extraction
    • Update imports, maintain backward compatibility via re-exports
    • All 133 tests passing ✅
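
The "multi-strategy" retrieval mentioned for `retrieveImageData()` can be read as a fallback chain: try each source in order and return the first one that yields data. The sketch below illustrates that pattern only. `Strategy` and `firstSuccessful` are hypothetical names, and nothing here reproduces the PDF.js API or the PR's actual code.

```typescript
// Hypothetical fallback-chain sketch; not the PR's retrieveImageData() implementation.
type Strategy<T> = {
  name: string;
  // A strategy may be synchronous or asynchronous and may find nothing.
  run: () => T | undefined | Promise<T | undefined>;
};

async function firstSuccessful<T>(
  strategies: Strategy<T>[],
): Promise<{ value: T; via: string } | undefined> {
  for (const strategy of strategies) {
    try {
      const value = await strategy.run();
      if (value !== undefined) return { value, via: strategy.name };
    } catch {
      // A failing strategy is skipped so the next one gets a chance.
    }
  }
  // No strategy produced a value.
  return undefined;
}
```

Returning the name of the strategy that succeeded alongside the value makes it easier to log which path was taken when debugging a problematic PDF.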

Summary of Improvements

Code Organization:

  • Reduced main extractor from 467 to 242 lines (48% reduction)
  • Created 2 focused, single-responsibility modules (243 lines total)
  • Better separation of concerns: encoding vs extraction vs orchestration

Maintainability:

  • Image logic isolated in dedicated modules
  • Easier to test and modify image-specific functionality
  • Clear module boundaries and responsibilities

Quality Assurance:

  • ✅ All 133 tests passing
  • ✅ Build successful
  • ✅ Zero regressions
  • ✅ Backward compatibility maintained

Remaining Work (Optional)

  • Extract text extraction logic (if needed)
  • Extract content assembly logic (if needed)
  • Further handler refactoring (if needed)



vercel bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| pdf-reader-mcp | Ready | Preview, Comment | Feb 5, 2026 3:23am |


- Created src/utils/errorUtils.ts with extractErrorMessage() helper
- Created src/constants/pdf.ts with all magic numbers and constants
- Updated all files to use new utilities, eliminating 12+ duplicate error patterns
- Updated loader.ts, parser.ts, extractor.ts, tableExtractor.ts, readPdf.ts
- All 133 tests passing, build successful, linter clean
- Created src/pdf/images/imagePngEncoder.ts (89 lines)
  - encodePixelsToPNG() - PNG encoding logic
  - processImageData() - Image data processing
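
The encoder is described as accepting RGB, RGBA, and grayscale pixel data. One common way to support several layouts is to normalize everything to RGBA before encoding. The sketch below shows only that normalization step, under the assumption of 8-bit channels; `toRgba` and `ChannelCount` are hypothetical names, and whether the PR's `encodePixelsToPNG()` works this way is not stated.

```typescript
// Hypothetical helper: expands grayscale (1), RGB (3), or RGBA (4) data to RGBA.
type ChannelCount = 1 | 3 | 4;

function toRgba(
  pixels: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number,
  channels: ChannelCount,
): Uint8ClampedArray {
  const pixelCount = width * height;
  if (pixels.length !== pixelCount * channels) {
    throw new Error(`Expected ${pixelCount * channels} bytes, got ${pixels.length}`);
  }
  const rgba = new Uint8ClampedArray(pixelCount * 4);
  for (let i = 0; i < pixelCount; i++) {
    const src = i * channels;
    const dst = i * 4;
    if (channels === 1) {
      // Grayscale: replicate the single value into R, G, and B.
      rgba[dst] = pixels[src];
      rgba[dst + 1] = pixels[src];
      rgba[dst + 2] = pixels[src];
      rgba[dst + 3] = 255;
    } else {
      rgba[dst] = pixels[src];
      rgba[dst + 1] = pixels[src + 1];
      rgba[dst + 2] = pixels[src + 2];
      // RGB input has no alpha channel, so treat it as fully opaque.
      rgba[dst + 3] = channels === 4 ? pixels[src + 3] : 255;
    }
  }
  return rgba;
}
```

Validating the buffer length up front turns a malformed image into a clear error instead of silently producing a corrupted PNG.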

- Created src/pdf/images/imageExtractor.ts (154 lines)
  - retrieveImageData() - Image retrieval from PDF.js
  - extractImagesFromPage() - Single page image extraction
  - extractImages() - Multi-page image extraction
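
The PR description calls the multi-page extraction sequential. A minimal sketch of that control flow follows, assuming a per-page callback; `extractSequentially` and `extractPage` are hypothetical names, not the signatures used in `imageExtractor.ts`.

```typescript
// Hypothetical sketch of sequential multi-page extraction.
async function extractSequentially<T>(
  pageNumbers: number[],
  extractPage: (pageNumber: number) => Promise<T[]>,
): Promise<T[]> {
  const results: T[] = [];
  for (const pageNumber of pageNumbers) {
    // Awaiting inside the loop processes one page at a time,
    // which keeps at most one page's image data in flight.
    const pageItems = await extractPage(pageNumber);
    results.push(...pageItems);
  }
  return results;
}
```

Compared with `Promise.all`, this trades throughput for predictable memory use and deterministic result order, which may matter for image-heavy PDFs.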

- Updated src/pdf/extractor.ts (467→242 lines, 48% reduction)
  - Removed duplicate image code
  - Imports and re-exports from image modules
  - Maintains backward compatibility

All 133 tests passing, build successful
2 participants