
Add PersistentBrowserManager for automatic session persistence #270

Open

irvingpop wants to merge 1 commit into joeyism:master from irvingpop:master

@irvingpop

Summary

Adds native support for Playwright's launch_persistent_context() to linkedin-scraper, providing automatic session persistence and eliminating the need for manual session.json management.

Motivation

The current BrowserManager requires explicit save_session() and load_session() calls, which:

  • Are easy to forget, leading to re-authentication loops
  • Don't behave like a real browser profile (cookies expire, and manual management is fragile)
  • Require complex volume mounting patterns in Docker
  • Are more susceptible to LinkedIn's anti-bot detection

Playwright's persistent context solves these issues by using actual Chromium user data directories that persist automatically.
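For reference, the underlying Playwright call can also be used directly. The sketch below is not code from this PR. It assumes Playwright for Python is installed (`pip install playwright`, then `playwright install chromium`). `resolve_profile_dir()` and `open_with_profile()` are hypothetical helpers that expand `~`, create the profile directory, and open a page in a persistent Chromium context.

```python
from pathlib import Path


def resolve_profile_dir(user_data_dir: str) -> Path:
    """Hypothetical helper: expand '~' and make sure the profile directory exists."""
    path = Path(user_data_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path


async def open_with_profile(user_data_dir: str, url: str, headless: bool = True) -> str:
    """Hypothetical helper: open `url` in a persistent Chromium context and return the page title."""
    # Imported lazily so resolve_profile_dir() works without Playwright installed.
    from playwright.async_api import async_playwright

    profile = resolve_profile_dir(user_data_dir)
    async with async_playwright() as p:
        # Chromium itself writes cookies and storage into `profile`,
        # so this code has no explicit save step.
        context = await p.chromium.launch_persistent_context(
            str(profile), headless=headless
        )
        try:
            # Reuse the tab Chromium usually opens at launch, or create one.
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await context.close()
```

Run it with `asyncio.run(open_with_profile("~/.linkedin/profile", "https://www.linkedin.com/feed/"))`. A second run against the same directory starts with the cookies left by the first.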

Changes

New Classes & Functions

  • PersistentBrowserManager - Drop-in replacement for BrowserManager using persistent contexts

    • Automatic session persistence to user data directory
    • Compatible .page property (works with all existing scrapers)
    • clear_profile() method for profile management
  • migrate_session_to_profile() - Utility to migrate from session.json to persistent profiles
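
To show the general shape of such a manager, here is a minimal, hypothetical sketch. `PersistentBrowserSketch` is an illustrative name, and this is not the class added in this PR. It wraps `launch_persistent_context()` in an async context manager, exposes a `.page` property like the one existing scrapers receive, and provides a `clear_profile()` that deletes the profile directory. It assumes Playwright for Python is installed when the browser is actually started.

```python
import shutil
from pathlib import Path


class PersistentBrowserSketch:
    """Hypothetical stand-in for a persistent-context browser manager.

    Illustrates the pattern only; it is not the code added in this PR.
    """

    def __init__(self, user_data_dir: str, headless: bool = True) -> None:
        # Expand "~" so paths like "~/.linkedin/profile" work.
        self.user_data_dir = Path(user_data_dir).expanduser()
        self.headless = headless
        self._playwright = None
        self._context = None
        self._page = None

    @property
    def page(self):
        # Same role as browser.page in the examples, so scrapers can take it unchanged.
        if self._page is None:
            raise RuntimeError("Browser not started; use 'async with'.")
        return self._page

    async def __aenter__(self) -> "PersistentBrowserSketch":
        # Imported lazily so the class can be inspected without Playwright installed.
        from playwright.async_api import async_playwright

        self.user_data_dir.mkdir(parents=True, exist_ok=True)
        self._playwright = await async_playwright().start()
        self._context = await self._playwright.chromium.launch_persistent_context(
            str(self.user_data_dir), headless=self.headless
        )
        # Reuse the tab Chromium usually opens at launch; create one if absent.
        pages = self._context.pages
        self._page = pages[0] if pages else await self._context.new_page()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Closing the context lets Chromium finish writing the profile to disk;
        # there is no separate save step.
        if self._context is not None:
            await self._context.close()
        if self._playwright is not None:
            await self._playwright.stop()
        self._playwright = self._context = self._page = None

    def clear_profile(self) -> None:
        """Delete the profile directory, forcing a fresh login on the next launch."""
        if self._context is not None:
            raise RuntimeError("Close the browser before clearing its profile.")
        shutil.rmtree(self.user_data_dir, ignore_errors=True)
```

The `.page` property raises before `async with` is entered, rather than returning `None`. A scraper constructed too early therefore fails loudly.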

Documentation

  • Added "Session Management" section to README comparing both approaches
  • Updated authentication examples with persistent profile options
  • Added Docker usage examples
  • Migration guide for existing users
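
A migration can be sketched as follows. This assumes `session.json` is a Playwright storage-state file (the JSON written by `BrowserContext.storage_state(path=...)`, with a top-level `"cookies"` list). Whether `BrowserManager.save_session()` writes exactly that format is an assumption. `load_session_cookies()` and `migrate_cookies_to_profile()` are hypothetical names, not the PR's `migrate_session_to_profile()`, and only cookies are copied (per-origin localStorage is not).

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_session_cookies(session_path: str) -> List[Dict[str, Any]]:
    """Read the cookie list from a storage-state JSON file (assumed format)."""
    state = json.loads(Path(session_path).expanduser().read_text())
    # Tolerate a bare list of cookies as well as the {"cookies": [...]} layout.
    cookies = state if isinstance(state, list) else state.get("cookies", [])
    if not isinstance(cookies, list):
        raise ValueError(f"{session_path}: 'cookies' is not a list")
    return cookies


async def migrate_cookies_to_profile(session_path: str, user_data_dir: str) -> int:
    """Copy cookies from session.json into a persistent profile; return how many."""
    from playwright.async_api import async_playwright  # lazy import

    cookies = load_session_cookies(session_path)
    profile = Path(user_data_dir).expanduser()
    profile.mkdir(parents=True, exist_ok=True)
    async with async_playwright() as p:
        context = await p.chromium.launch_persistent_context(str(profile), headless=True)
        try:
            if cookies:
                # Storage-state cookie dicts use the same keys add_cookies() accepts.
                await context.add_cookies(cookies)
        finally:
            # Closing the context lets Chromium write the cookies into the profile.
            await context.close()
    return len(cookies)
```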

Sample Scripts

  • samples/create_persistent_session.py - Create authenticated persistent profile
  • samples/scrape_person_persistent.py - Example using persistent profile
  • samples/migrate_session.py - Migrate existing session.json files
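
The "create an authenticated profile" step can be sketched as a headed launch followed by a manual login. This is not the contents of `samples/create_persistent_session.py`. The post-login URL pattern `**/feed/**` and the five-minute timeout are assumptions, and `create_persistent_session()` is a hypothetical name.

```python
import asyncio
from pathlib import Path

LOGIN_URL = "https://www.linkedin.com/login"
# Assumption: a successful login ends up on a URL containing /feed/.
FEED_PATTERN = "**/feed/**"


async def create_persistent_session(user_data_dir: str, timeout_ms: int = 300_000) -> None:
    """Open a visible browser, wait for a manual login, then close to persist it."""
    from playwright.async_api import async_playwright  # lazy import

    profile = Path(user_data_dir).expanduser()
    profile.mkdir(parents=True, exist_ok=True)
    async with async_playwright() as p:
        # headless=False so the user can type credentials and handle any challenge.
        context = await p.chromium.launch_persistent_context(str(profile), headless=False)
        try:
            page = context.pages[0] if context.pages else await context.new_page()
            await page.goto(LOGIN_URL)
            print("Log in using the browser window; waiting for the feed to load...")
            # Raises playwright's TimeoutError if the feed URL never appears.
            await page.wait_for_url(FEED_PATTERN, timeout=timeout_ms)
        finally:
            await context.close()


def main() -> None:
    asyncio.run(create_persistent_session("~/.linkedin/profile"))
```

Calling `main()` once on a machine with a display leaves a logged-in profile that later runs can point at.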

Tests

  • 19 comprehensive tests for PersistentBrowserManager
  • All existing tests pass (no regressions)
  • Test coverage includes:
    • Cookie persistence (with proper expiration)
    • Profile directory management
    • Concurrent access handling
    • Migration from session.json
    • Scraper compatibility
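
Two of the behaviors listed above can be illustrated with small, browser-free helpers. Both are hypothetical and not taken from this PR's test suite. `with_expiry()` assumes that cookies carrying Playwright's session marker (`"expires": -1`) are normally discarded by Chromium on exit. It therefore gives them an explicit, assumed 30-day lifetime, a design choice that trades fidelity for persistence. `profile_in_use()` is a best-effort concurrency check. It relies on the `SingletonLock` symlink that Chromium creates in the profile directory on Linux and macOS, does not cover Windows, and can report a stale lock after a crash.

```python
import os
import time
from pathlib import Path
from typing import Any, Dict, List, Optional

# Assumed lifetime for cookies that had none; not a value from the PR.
DEFAULT_TTL_SECONDS = 30 * 24 * 3600


def with_expiry(cookies: List[Dict[str, Any]], ttl: int = DEFAULT_TTL_SECONDS,
                now: Optional[float] = None) -> List[Dict[str, Any]]:
    """Return copies of `cookies`, giving session cookies an explicit expiry.

    Assumption: Chromium drops cookies without an expiry when the browser exits,
    so session cookies would otherwise not survive a restart.
    """
    now = time.time() if now is None else now
    fixed = []
    for cookie in cookies:
        cookie = dict(cookie)  # never mutate the caller's dicts
        if cookie.get("expires", -1) in (-1, None):
            cookie["expires"] = now + ttl
        fixed.append(cookie)
    return fixed


def profile_in_use(user_data_dir: str) -> bool:
    """Best-effort check for another Chromium process holding the profile."""
    lock = Path(user_data_dir).expanduser() / "SingletonLock"
    # The lock is a symlink whose target usually does not resolve, so use
    # lexists() (true for dangling links) rather than exists().
    return os.path.lexists(lock)
```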

Usage Example

Before (session.json):

```python
import asyncio
from linkedin_scraper import BrowserManager, PersonScraper

async def main():
    async with BrowserManager() as browser:
        await browser.load_session("session.json")
        scraper = PersonScraper(browser.page)
        person = await scraper.scrape("https://linkedin.com/in/username")
        await browser.save_session("session.json")  # Easy to forget!

asyncio.run(main())
```

After (persistent profile):

```python
import asyncio
from linkedin_scraper import PersistentBrowserManager, PersonScraper

async def main():
    async with PersistentBrowserManager(user_data_dir="~/.linkedin/profile") as browser:
        scraper = PersonScraper(browser.page)
        person = await scraper.scrape("https://linkedin.com/in/username")
        # Session automatically saved!

asyncio.run(main())
```

Checklist

  • Tests added and passing
  • Documentation updated
  • Code formatted with black
  • Linting passes (flake8)
  • No breaking changes
  • Example scripts provided
  • Migration path documented

Implements Playwright's launch_persistent_context() to eliminate
manual session.json management and provide more reliable session
persistence across browser restarts.

Changes:
- Add PersistentBrowserManager class using persistent browser contexts
- Add migrate_session_to_profile() utility for migration
- Add comprehensive test suite (19 tests, all passing)
- Update documentation with usage examples and migration guide
- Add sample scripts demonstrating persistent profile usage
- Maintain full backward compatibility with existing BrowserManager

Benefits:
- Automatic cookie and session persistence (no save/load cycles)
- Works like real Chrome profiles (better anti-bot resistance)
- Zero breaking changes to existing code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@irvingpop

Hi @joeyism, could you please take a look when you get a chance?
