TheNude: convert to Python scraper, add scene/gallery search #2718

Closed
FlashSpazzbo wants to merge 1 commit into stashapp:master from FlashSpazzbo:feat/thenude-scene-gallery-search

@FlashSpazzbo
Contributor

Converts TheNude from YAML-only to a YAML+Python scraper. Adds scene and gallery search via the magnifying glass / "Scrape with..." UIs.

Scraper type(s)

  • performerByName
  • performerByURL
  • sceneByName
  • sceneByQueryFragment
  • galleryByFragment

Changes

Performer scraping (Python)

  • URLs normalized to canonical /_NNNNN.htm form, eliminating the space-in-URL problem
  • Performer's own name filtered out of aliases
  • Month-only birthdates are rejected rather than silently defaulted to the 1st, which would overwrite good data with a fabricated day
  • Career length formatted as "YYYY - YYYY" with spaces to match StashDB
  • fake_tits returns "Fake" as-is (it was previously mapped to "Augmented", but "Fake" is the actual Stash internal value)
  • Tattoos/piercings dropped — TheNude consistently reports "None" for performers known to have body art
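
Three of the normalizations above can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper names (`canonical_url`, `filter_aliases`, `format_career`) are hypothetical, and the URL pattern assumes every performer link ends in `_<digits>.htm`.

```python
import re
from typing import List, Optional

BASE = "https://www.thenude.com"


def canonical_url(url: str) -> Optional[str]:
    """Reduce a performer URL to the canonical /_NNNNN.htm form.

    Assumption: the numeric ID always appears as "_<digits>.htm".
    Returns None when no ID can be found.
    """
    match = re.search(r"_(\d+)\.htm", url)
    return f"{BASE}/_{match.group(1)}.htm" if match else None


def filter_aliases(name: str, aliases: List[str]) -> List[str]:
    """Drop empty aliases and any alias equal to the performer's own name."""
    own = name.strip().casefold()
    cleaned = (alias.strip() for alias in aliases)
    return [alias for alias in cleaned if alias and alias.casefold() != own]


def format_career(start_year: str, end_year: str) -> str:
    """Join two years as "YYYY - YYYY", with spaces around the hyphen."""
    return f"{start_year} - {end_year}"
```

Because only the numeric ID is kept, any name segment in the original URL (including one containing spaces) is discarded.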

Scene/gallery search (new)

  • sceneByName + sceneByQueryFragment enable the magnifying glass search picker on scenes. Some studios publish both portrait and landscape versions of the same shoot, which TheNude lists under separate URLs; the picker lets the user choose between them
  • galleryByFragment enables "Scrape with..." on galleries
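
The two-step search flow above could be wired up roughly as below. This is a sketch under assumptions rather than the PR's implementation: it assumes the scraper is invoked with the action name as the first command-line argument, reads a JSON fragment on stdin, and writes JSON to stdout; `search_scenes` is a hypothetical stub standing in for the real site query.

```python
import json
import sys
from typing import Any, Callable, Dict, List

Fragment = Dict[str, Any]


def search_scenes(query: str) -> List[Fragment]:
    """Hypothetical stub: a real version would query the site and return
    one entry per matching URL (e.g. portrait and landscape variants)."""
    return []


def scene_by_name(fragment: Fragment) -> List[Fragment]:
    # The search picker supplies the typed query under "name" (assumed key).
    return search_scenes(fragment.get("name", ""))


def scene_by_query_fragment(fragment: Fragment) -> Fragment:
    # The picker hands back the result the user chose. A real version would
    # fetch fragment["url"] and scrape the full scene; here it is echoed.
    return {"title": fragment.get("title"), "url": fragment.get("url")}


HANDLERS: Dict[str, Callable[[Fragment], Any]] = {
    "sceneByName": scene_by_name,
    "sceneByQueryFragment": scene_by_query_fragment,
}


def dispatch(action: str, fragment: Fragment) -> Any:
    """Route one scraper action to its handler."""
    if action not in HANDLERS:
        raise ValueError(f"unsupported action: {action}")
    return HANDLERS[action](fragment)


def main() -> None:
    # Assumed convention: action name in argv[1], JSON in, JSON out.
    print(json.dumps(dispatch(sys.argv[1], json.load(sys.stdin))))
```

A script entry point would simply call `main()`. Keeping the handlers in a table means a further action such as galleryByFragment is one more dictionary entry.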

Preserved from upstream

  • XPath-based sceneByURL, galleryByURL, imageByURL handlers

Test cases

  • Performer by URL: https://www.thenude.com/_28377.htm (Carly Lauren)
  • Performer by name: "Carly Lauren"
  • Scene magnifying glass: search "Naked Forest Nymph Shoot" (returns 2 results — landscape + portrait covers — for picker)
  • Gallery scrape: "Nude On The Veranda"

- Move scraper into TheNude/ folder; add Python implementation
- Performer URLs normalized to canonical /_NNNNN.htm form
- Filter performer's own name out of aliases (matches Stash validation)
- Birthdate refuses month-only values rather than defaulting to 1st
- Career length formatted as 'YYYY - YYYY'
- Drop tattoos/piercings (TheNude data unreliable)
- Fix fake_tits: keep 'Fake' as-is (Stash internal value)
- Add scene-by-name + scene-by-query-fragment for magnifying glass picker
- Add gallery-by-fragment (image field stripped, not in Stash schema)
- Preserve XPath sceneByURL/galleryByURL/imageByURL handlers

@feederbox826
Member

I'm not seeing a good reason to convert to python, what can't be done in yml?

@FlashSpazzbo
Contributor Author

> I'm not seeing a good reason to convert to python, what can't be done in yml?

Birthdate handling.

Test case: Carli Banks (https://www.thenude.com/_6444.htm) lists her birthdate as "November 1985" — month and year, no day. YAML's parseDate parses this and silently defaults to day 01, writing "1985-11-01" to Stash and overwriting whatever correct date may already exist on the performer record. The Python implementation checks for a parsed day and omits the birthdate field entirely if absent, preserving existing data. I don't think YAML can express "parse only if day component is present."
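
The "omit the field when no day is present" check described above can be sketched like this. It is an illustration only: the function name `parse_birthdate` is hypothetical, and the two full-date formats are assumptions, since this thread shows only the month-and-year form.

```python
from datetime import datetime
from typing import Optional

# Assumed full-date layouts; only formats that include a day are listed,
# so a month-and-year string can never match. %B uses the current locale's
# month names (English under the default C locale).
FULL_DATE_FORMATS = ("%d %B %Y", "%B %d, %Y")


def parse_birthdate(raw: str) -> Optional[str]:
    """Return YYYY-MM-DD when the input includes a day, else None."""
    text = raw.strip()
    for fmt in FULL_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    # No day component (or an unrecognised layout): report nothing, so the
    # caller can leave the birthdate field out of the result.
    return None
```

A caller would add the birthdate key to the performer result only when the return value is not None, leaving any date already stored on the record untouched.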

Happy to revert the other changes (URL normalization, career length formatting, fake_tits mapping, dropping tattoos/piercings, and the new sceneByName/sceneByQueryFragment/galleryByFragment search operations) to YAML if the birthdate handling is the only thing that justifies the Python conversion.

@feederbox826
Member

Previous behaviour has been to force the day to 01. You can also parse with a regex and nullify the value if it doesn't match.
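
The regex idea can be illustrated as follows, in Python purely for demonstration; in a YAML scraper the same pattern would sit in a regex replace step ahead of date parsing. The pattern and the name `blank_if_month_only` are assumptions, and whether the rest of the pipeline treats an empty string as "no value" is not confirmed here.

```python
import re

# Matches strings that are just a month name and a four-digit year,
# e.g. "November 1985". Anything containing a day is left alone.
MONTH_ONLY = re.compile(r"^\s*[A-Za-z]+\s+\d{4}\s*$")


def blank_if_month_only(raw: str) -> str:
    """Replace a month-and-year value with an empty string."""
    return MONTH_ONLY.sub("", raw)
```

Since the substitution only fires on a full-string match, full dates pass through unchanged to the date parser.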

@FlashSpazzbo
Contributor Author

Closing — will resubmit as YAML-only.
