
fix(utils): use inner_text() in rate limit detection #278

Open
stickerdaniel wants to merge 1 commit into joeyism:master from stickerdaniel:fix/rate-limit-false-positive

Conversation

@stickerdaniel (Contributor) commented Feb 11, 2026

detect_rate_limit() false-fires on every page because text_content() picks up invisible React RSC serialized JSON that LinkedIn now embeds in the DOM. The phrase "something went wrong. please try again later." appears inside preloaded RSC data like:

"children":["something went wrong. please try again later."]

This matches the "try again later" check and raises RateLimitError even on perfectly normal pages.

Fix: Switch from text_content() to inner_text() to return only visible text, which is already the pattern used in all the scrapers (person.py, company.py, job.py).

Resolves #277, likely fixes #275
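The failure mode can be sketched with a stdlib-only simulation. This is not the library's code: `TextExtractor`, the phrase list, and the sample HTML are illustrative stand-ins for Playwright's `text_content()` (all DOM text, including hidden nodes) versus `inner_text()` (rendered, visible text only):

```python
# Sketch: why full-DOM text extraction false-fires on the rate-limit check
# while visible-only extraction does not. Hypothetical names throughout.
from html.parser import HTMLParser

RATE_LIMIT_PHRASES = ["try again later"]  # assumed phrase list


class TextExtractor(HTMLParser):
    """Collects text; in visible_only mode, skips display:none subtrees."""

    def __init__(self, visible_only: bool):
        super().__init__()
        self.visible_only = visible_only
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        # Track nesting inside a hidden subtree.
        if self.hidden_depth or "display:none" in style.replace(" ", ""):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not (self.visible_only and self.hidden_depth):
            self.chunks.append(data)

    def text(self) -> str:
        return " ".join(self.chunks).lower()


# Mimics a normal profile page that carries preloaded RSC JSON in a hidden node.
HTML = """
<body>
  <div>Profile: Jane Doe</div>
  <div style="display:none">
    {"children":["something went wrong. please try again later."]}
  </div>
</body>
"""


def detect_rate_limit(text: str) -> bool:
    """Hypothetical reduction of the check to pure phrase matching."""
    return any(phrase in text for phrase in RATE_LIMIT_PHRASES)


full = TextExtractor(visible_only=False)      # like text_content()
full.feed(HTML)
visible = TextExtractor(visible_only=True)    # like inner_text()
visible.feed(HTML)
```

Here `detect_rate_limit(full.text())` matches the hidden RSC JSON and fires, while `detect_rate_limit(visible.text())` sees only the rendered text and stays quiet, which is the behavior the fix relies on.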

text_content() captures invisible React RSC serialized JSON that
LinkedIn now embeds on every page, containing "try again later" as
a preloaded error template. This causes false positive rate limit
detection on every scrape.

inner_text() returns only visible text, matching the pattern used
throughout the rest of the codebase.

Resolves: joeyism#277
See also: joeyism#275
Copilot AI review requested due to automatic review settings February 11, 2026 23:39
Copilot AI left a comment

Pull request overview

This PR fixes false-positive rate limit detection triggered by hidden React RSC content embedded in LinkedIn pages by switching DOM text extraction to consider only visible text.

Changes:

  • Update detect_rate_limit() to use Locator.inner_text() instead of text_content() when scanning the page for rate-limit phrases.


Comment on lines 89 to +91

      # Check for rate limit messages
      try:
-         body_text = await page.locator('body').text_content(timeout=1000)
+         body_text = await page.locator('body').inner_text(timeout=1000)

Copilot AI Feb 11, 2026


There’s currently no automated test coverage for detect_rate_limit() (no tests reference it), and this change tweaks detection semantics in a way that’s easy to regress. Consider adding a small unit test using page.set_content() with hidden DOM text containing "try again later" to ensure it does not raise, and another with visible text to ensure it does raise.
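The suggested regression test could be sketched without a browser by stubbing the page object. Everything below is an assumption modeled on the PR diff and description (`detect_rate_limit`'s body, `RateLimitError`, the phrase check); a real test would use Playwright's `page.set_content()` with a hidden element instead of `StubPage`:

```python
# Sketch of a regression test for the inner_text() fix, using stubs so it
# runs without Playwright. Hypothetical names throughout.
import asyncio


class RateLimitError(Exception):
    pass


class StubLocator:
    def __init__(self, visible: str, full: str):
        self._visible, self._full = visible, full

    async def inner_text(self, timeout: int = 1000) -> str:
        return self._visible  # visible text only, like Playwright's inner_text()

    async def text_content(self, timeout: int = 1000) -> str:
        return self._full     # includes hidden RSC JSON, like text_content()


class StubPage:
    def __init__(self, visible: str, full: str):
        self._locator = StubLocator(visible, full)

    def locator(self, selector: str) -> StubLocator:
        return self._locator


async def detect_rate_limit(page) -> None:
    """Assumed shape of the fixed function: raise only on visible phrase."""
    body_text = await page.locator("body").inner_text(timeout=1000)
    if "try again later" in body_text.lower():
        raise RateLimitError("Rate limit page detected")


HIDDEN_JSON = '{"children":["something went wrong. please try again later."]}'


async def run_checks():
    # Normal page: the phrase exists only in hidden RSC JSON.
    normal = StubPage(visible="Profile: Jane Doe",
                      full="Profile: Jane Doe " + HIDDEN_JSON)
    # Genuine rate-limit page: the phrase is visible.
    limited = StubPage(visible="Something went wrong. Please try again later.",
                       full="Something went wrong. Please try again later.")
    raised_on_normal = raised_on_limited = False
    try:
        await detect_rate_limit(normal)
    except RateLimitError:
        raised_on_normal = True
    try:
        await detect_rate_limit(limited)
    except RateLimitError:
        raised_on_limited = True
    return raised_on_normal, raised_on_limited


raised_on_normal, raised_on_limited = asyncio.run(run_checks())
```

The two assertions the Copilot comment asks for are then: no raise on the normal page, raise on the visible rate-limit page.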

stickerdaniel added a commit to stickerdaniel/linkedin-mcp-server that referenced this pull request Feb 12, 2026
Point dependency at stickerdaniel/linkedin_scraper fork
(fix/rate-limit-false-positive) to fix detect_rate_limit()
false-firing on React RSC payloads.

Also update docs with detailed release workflow notes and
bump opencode agent models to gpt-5.3-codex.

See also: joeyism/linkedin_scraper#278


Development

Successfully merging this pull request may close these issues.

[BUG] detect_rate_limit() false positive from React RSC payload individual scraping

2 participants