A LinkedIn scraping tool that retrieves email information from 1st degree connections shared with a starting profile.
The script currently performs the following operations:
- Authentication: Uses Puppeteer with persistent session storage to maintain LinkedIn login
- Profile Navigation: Navigates to a specified LinkedIn profile and accesses their connections
- Connection Filtering: Filters to show only 1st degree shared connections
- Data Extraction: Scrapes profile information including:
  - Full Name
  - Profile URL
  - Current Employer
- Pagination: Currently handles page 1 and page 2 of connections
- Email Extraction: For each connection, visits their contact overlay to extract email addresses
- CSV Output: Creates separate CSV files for each page and email results
  - connections.csv: Page 1 connections (name, profile URL, employer)
  - connections_page2.csv: Page 2 connections
  - connections_with_emails.csv: Page 1 connections with emails
  - connections_page2_with_emails.csv: Page 2 connections with emails
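
The authentication step listed above depends on Puppeteer reusing a browser profile directory between runs. The following is a minimal sketch of how that could look, not the script's actual code. The helper names `sessionLaunchOptions` and `openBrowser` are hypothetical, and the directory name is assumed from the `linkedin-session/` entry in this README.

```javascript
const path = require('node:path');

// Assumed location of the persistent profile (taken from this README's file list).
const SESSION_DIR = path.resolve('linkedin-session');

// Build Puppeteer launch options that point Chromium at a persistent profile directory.
function sessionLaunchOptions(sessionDir = SESSION_DIR) {
  return {
    headless: false,         // a visible window allows the first login to be done by hand
    userDataDir: sessionDir, // cookies and local storage are kept here between runs
  };
}

// Hypothetical helper: launch a browser that reuses the stored session.
async function openBrowser() {
  // Required lazily so the options helper works even where Puppeteer is not installed.
  const puppeteer = require('puppeteer');
  return puppeteer.launch(sessionLaunchOptions());
}
```

With `userDataDir` set, a login completed in one run should still be valid in the next, as long as LinkedIn has not expired the session.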

Planned enhancements:
- Master CSV: Single master_connections.csv file for all profile captures
- Duplicate Prevention: Check for existing records before adding new ones
- Append Logic: Add new pages to existing master file instead of separate files
- Progress Markers: Track current page and processing status in CSV
- Resume Capability: Resume from last successful page if execution breaks
- Status Columns: Add processing status indicators to track completion
- Email CSV: Dedicated connections_emails.csv for email results
- Easier Appends: Simplified structure for adding new email discoveries
- Cross-Reference: Link emails back to master connections via profile URL
- Page-by-Page: Process one page of connections, then extract all emails for that page
- Sequential Processing: Page 1 → Page 1 emails → Page 2 → Page 2 emails, etc.
- Complete Pagination: Continue until all shared connections are processed
- Global Deduplication: When running on multiple profiles, detect and skip existing connections
- Profile Tracking: Track which profiles have been processed for each connection
- Efficient Processing: Avoid re-processing connections found in previous profile runs
- Dynamic Pagination: Handle any number of pages automatically
- Error Recovery: Graceful handling of navigation failures
- Rate Limiting: Appropriate delays to avoid being blocked
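
The duplicate-prevention, append, and resume items above can be illustrated with a few pure functions. This is a sketch under assumptions: the column names (`page`, `status`), the `done` status value, and the function names are all hypothetical, and profile URLs are assumed to be a suitable deduplication key.

```javascript
// Assumed column order for the master file; "page" and "status" are illustrative names.
const COLUMNS = ['name', 'profileUrl', 'employer', 'page', 'status'];

// Normalise a profile URL so trivial variants (query string, trailing slash, case) compare equal.
function profileKey(url) {
  return url.split('?')[0].replace(/\/+$/, '').toLowerCase();
}

// Return the existing rows plus any incoming rows whose profile URL is not yet present.
function mergeConnections(existing, incoming) {
  const seen = new Set(existing.map((row) => profileKey(row.profileUrl)));
  const merged = [...existing];
  for (const row of incoming) {
    const key = profileKey(row.profileUrl);
    if (seen.has(key)) continue; // duplicate: already recorded
    seen.add(key);
    merged.push(row);
  }
  return merged;
}

// Resume point: the first page (counting from 1) that is not fully marked "done".
function nextPage(rows) {
  const doneByPage = new Map(); // page number -> true while every row seen so far is "done"
  for (const row of rows) {
    const page = Number(row.page);
    const done = row.status === 'done';
    doneByPage.set(page, (doneByPage.get(page) ?? true) && done);
  }
  let page = 1;
  while (doneByPage.get(page) === true) page += 1;
  return page;
}

// Serialise one row, quoting fields that contain commas, quotes, or line breaks.
function toCsvLine(row) {
  return COLUMNS.map((col) => {
    const value = String(row[col] ?? '');
    return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
  }).join(',');
}
```

Keying on a normalised profile URL also covers the multi-profile case: a connection already recorded during an earlier profile run is skipped by the same check.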
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Profile URL   │ -> │   Navigation     │ -> │   Page Loop     │
│   (Input)       │    │   & Filtering    │    │   (All Pages)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        v
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Email CSV     │ <- │   Email          │ <- │   Connection    │
│   (Output)      │    │   Extraction     │    │   Scraping      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                                                        v
                                               ┌─────────────────┐
                                               │   Master CSV    │
                                               │   (Output)      │
                                               └─────────────────┘
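
The flow in the diagram (scrape a page, save it, extract emails for that page, then move to the next page) could be expressed as a loop like the one below. This is a sketch, not the script's implementation: `runPipeline` and all four callbacks are hypothetical and would wrap the real Puppeteer and CSV code, and the 2-second default delay is an arbitrary placeholder.

```javascript
// Pause helper used for rate limiting between profile visits.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Walk pages until one comes back empty, extracting emails for each page before moving on.
// All callbacks are injected by the caller, so the loop itself has no browser dependency.
async function runPipeline({
  scrapePage,       // async (pageNumber) => array of { profileUrl, ... }
  extractEmail,     // async (profileUrl) => email string, or null if none is shown
  saveConnections,  // async (pageNumber, connections) => void
  saveEmail,        // async ({ profileUrl, email }) => void
  delayMs = 2000,
  startPage = 1,
}) {
  let page = startPage;
  for (;;) {
    const connections = await scrapePage(page);
    if (connections.length === 0) break; // no more results: pagination is finished
    await saveConnections(page, connections);

    for (const connection of connections) {
      try {
        const email = await extractEmail(connection.profileUrl);
        if (email) await saveEmail({ profileUrl: connection.profileUrl, email });
      } catch (err) {
        // One failed contact overlay should not abort the whole run.
        console.error(`Email lookup failed for ${connection.profileUrl}: ${err.message}`);
      }
      await sleep(delayMs); // spacing out requests reduces the chance of being blocked
    }
    page += 1;
  }
  return page - 1; // number of the last page that contained connections
}
```

Passing a `startPage` read from the saved progress lets an interrupted run continue where it stopped instead of starting again from page 1.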

Usage:

    node src/index.js https://www.linkedin.com/in/profile-username/

The script will:
- Navigate to the specified profile
- Access their shared connections
- Process all pages of 1st degree connections
- Extract emails for each connection
- Save results to master CSV files
- Support resumable execution if interrupted
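
Since the script takes the starting profile as its only argument, it may be worth validating that argument before opening a browser. The sketch below shows one way to do that. `parseProfileUrl` is a hypothetical helper, and the accepted URL shape (`/in/<username>/` on a linkedin.com host) is an assumption based on the usage line above.

```javascript
// Accept only https://<...>linkedin.com/in/<username>/ style URLs and return a normalised form.
function parseProfileUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    throw new Error(`Not a valid URL: ${input}`);
  }

  const match = url.pathname.match(/^\/in\/([^/]+)\/?$/);
  const isLinkedIn = /(^|\.)linkedin\.com$/.test(url.hostname);
  if (url.protocol !== 'https:' || !isLinkedIn || !match) {
    throw new Error(
      `Expected a profile URL such as https://www.linkedin.com/in/profile-username/, got: ${input}`
    );
  }

  // Drop query strings and fragments so the same profile always maps to the same string.
  return `https://www.linkedin.com/in/${match[1]}/`;
}

// Possible use at the top of the entry point:
// const profileUrl = parseProfileUrl(process.argv[2]);
```

Rejecting a malformed argument early avoids launching the browser only to fail during navigation.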

Dependencies:
- puppeteer: Web scraping and browser automation
- csv-parse: CSV file parsing for progress tracking
- fs (Node.js built-in): File system operations for CSV management

Files and directories:
- master_connections.csv: All connections with progress tracking
- connections_emails.csv: Email addresses for all connections
- linkedin-session/: Persistent browser session data