feat: enhance person scraper with employment type extraction and grouped experience parsing #272

Open
miguelcostero wants to merge 1 commit into joeyism:master from miguelcostero:feat/employment-type-and-grouped-experiences

@miguelcostero
Summary

This PR enhances the PersonScraper with improved experience extraction capabilities:

  • Employment Type Extraction: Added support for extracting employment type (Full-time, Part-time, Contract, Freelance, etc.) from experience entries
  • Grouped Experience Parsing: Improved parsing of grouped/nested experiences where multiple roles are listed under a single company
  • Better Fallback Logic: Details page is now tried first, with main page as fallback for more reliable data extraction

Changes

Models

  • Added employment_type field to the Experience model
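
As a rough picture of the model change, an optional field with a `None` default keeps existing callers working. The sketch below uses a plain dataclass; the real model's base class and its other field names are not shown in this PR description, so everything except `employment_type` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experience:
    # Field names other than employment_type are illustrative assumptions.
    position_title: Optional[str] = None
    institution_name: Optional[str] = None
    # New in this PR: e.g. "Full-time", "Part-time", "Contract", "Freelance".
    employment_type: Optional[str] = None


exp = Experience(
    position_title="Engineer",
    institution_name="Acme",
    employment_type="Contract",
)
```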

Person Scraper

  • Added EMPLOYMENT_TYPE_MAP for normalizing employment type values
  • New methods for parsing grouped experiences:
    • _get_experiences_from_details() - Extract from details/experience page
    • _get_experiences_from_main_page() - Fallback extraction from main profile
    • _parse_grouped_main_page_role() - Parse nested roles on main page
    • _parse_grouped_details_experience() - Parse grouped entries from details page
    • _parse_grouped_detail_role() - Parse individual role in grouped experience
  • Helper methods:
    • _is_experience_time_text() - Detect date/duration text
    • _normalize_employment_type() - Normalize employment type strings
    • _extract_employment_type_from_text() - Extract employment type from text
    • _split_company_and_employment_type() - Split company name from employment type
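
To make the helper methods concrete, here is a minimal sketch of how normalization, company/type splitting, and date detection might fit together. The map contents, the regular expression, and the assumption that the company line looks like "Company · Full-time" (separated by a middle dot) are illustrative guesses; the PR's actual implementations are not shown here and may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical contents; the PR's actual EMPLOYMENT_TYPE_MAP is not shown.
EMPLOYMENT_TYPE_MAP = {
    "full-time": "Full-time",
    "full time": "Full-time",
    "part-time": "Part-time",
    "part time": "Part-time",
    "contract": "Contract",
    "freelance": "Freelance",
    "internship": "Internship",
    "self-employed": "Self-employed",
}


def normalize_employment_type(text: str) -> Optional[str]:
    """Map a raw label to its canonical form, or None if unrecognized."""
    return EMPLOYMENT_TYPE_MAP.get(text.strip().lower())


def split_company_and_employment_type(text: str) -> Tuple[str, Optional[str]]:
    """Split 'Company · Full-time' into ('Company', 'Full-time')."""
    parts = [part.strip() for part in text.split("·")]
    if len(parts) >= 2:
        employment_type = normalize_employment_type(parts[-1])
        if employment_type is not None:
            return " · ".join(parts[:-1]), employment_type
    # No recognizable employment type: return the text unchanged.
    return text.strip(), None


# Assumption: date/duration lines contain a year, "Present", or "N yrs/mos".
_TIME_RE = re.compile(
    r"\b(19|20)\d{2}\b|\bpresent\b|\b\d+\s+(yrs?|mos?)\b",
    re.IGNORECASE,
)


def is_experience_time_text(text: str) -> bool:
    """Return True if the text looks like a date range or duration."""
    return bool(_TIME_RE.search(text))
```

Only the last segment is treated as a candidate employment type, so a company whose name itself contains a middle dot is left intact. The date heuristic is deliberately loose and would misfire on a company name that contains a four-digit year.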

Tests

  • Added HTML fixtures for grouped experiences (main page and details page)
  • Added unit tests for grouped experience parsing

…ped experience parsing

- Add employment_type field to Experience model
- Implement employment type detection from text parsing
- Add support for grouped experiences (multiple roles under same company)
- Refactor experience extraction into separate methods for main page and details page
- Add fallback parsing strategies for different LinkedIn layouts
- Introduce EMPLOYMENT_TYPE_MAP for normalization
- Add comprehensive tests for grouped experiences parsing
- Add pytest and pytest-asyncio to dependencies

@miguelcostero changed the base branch from main to master on February 4, 2026 at 20:45