AdultTime / Algolia API: Handle affiliate sites better #2709
nrg101 wants to merge 3 commits into stashapp:master
Conversation
|
I'm not sure how I feel about this. They are technically studio-run sites but "unofficial", and the parent site has been extremely... hostile to scraping. |
|
It was because of a comment by AdultSun at stashdb.org saying that a create Edit shouldn't include the secondary site, as it's an affiliate rather than the official studio site. I couldn't really confirm one way or the other, but the secondary site footers don't seem to mention the same official ownership etc. I like the improvement here, where a secondary site scene URL can effectively find the official studio API entry and therefore get all the info and the official/primary scene URL; I think that part should stand. The part about not adding secondary (promo/affiliate) site URLs when scraping the scene at the API could be debatable. I'd say adding them is harmless, and the secondary sites are very consistent and usually provide much longer previews. Maybe this needs escalating to AdultSun and the Ministry Of Truth? |
|
These kinds of affiliate sites have been around for a long time. From my understanding, they are all built and managed by 3rd parties who get paid for every signup funneled through their referral links, which really isn't much different from Amazon's affiliate program. From memory, this particular network of affiliate sites (I don't know the owner, but they all use identical formatting) covers more than just Adult Time / Gamma, which isn't unusual at all for affiliate operators. I can't come up with any examples off the top of my head though.
At best, I think these can be considered semi-official, or maybe 2nd party sources. These ones appear to have permission to populate their sites with data through the studios' API. But for me, scraping them directly feels similar to scraping TPDB or Data18 instead of the studio directly: sure, it'll be mostly the same since those databases scraped the studio too, but you also have to expect minor shifts in the data when you grab a copy of a copy. The edit nrg saw earlier was an example of that shift. I think scraping the affiliate grabbed a smaller cover image, and the release date shifted by a day compared to scraping the studio directly. It was the same "birthday boy" link used at the top of this thread if you want to re-scrape to check, though; I might have the details wrong.
On the StashDB side of things, the consensus early on was to use these affiliate sites only as a last resort. For example, if a scene was purged from a MindGeek site and we couldn't find anything through the Wayback Machine, a few editors started using an affiliate site called NewBrazz as a placeholder studio link and primary source. Again, their data mostly matched the source, but I remember their cover images specifically were a random gallery image instead. I'm pretty sure scene aliases were often different too, but I can't remember if the release dates were accurate or not.
In short, as a source they're often better than nothing, but I would never use them over an actual studio link. I'm also pretty sure most of the editors who have submitted these Adult Time affiliate links have zero idea they're not actually official, which makes sense since the affiliate operators try very hard to make them seem official too. But adding the affiliate URLs through the scraper just makes it seem like we think these are official too, which is misleading for both local scrapers and StashDB editors. |
|
For reference, I found an example of a TeamSkeet / Reptyle studio from what looks like the same affiliate network as the examples above: So this is just confirmation that these affiliate sites are not exclusive to Adult Time / Gamma studios and are definitely owned and operated by a 3rd party. |
Scraper type(s)
Examples to test
Short description
The sites above are affiliate sites that are not officially run by the corresponding studio. However, they do consistently have the correct title, date, cover image, description, etc.
Rather than just XPath-scraping the page at the affiliate site, this change does a simple scrape of the scene at the affiliate site, then uses the title/slug and date to search the Algolia API for the same scene. This yields a proper studio scene result, including the studio scene URL.
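
The matching step described above could look roughly like the sketch below. It is not the code from this PR: the function names, the dictionary keys (`title`, `date`, `release_date`, `url`), and the one-day date tolerance are all assumptions made for illustration, and the API search itself is left out so that only the "pick the same scene from the results" logic is shown.

```python
import re
from datetime import date, timedelta


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def pick_studio_scene(affiliate_scene, api_hits, max_day_drift=1):
    """Return the first API hit matching the affiliate scene, or None.

    affiliate_scene: dict with hypothetical keys "title" and "date" (YYYY-MM-DD).
    api_hits: list of dicts with hypothetical keys "title", "release_date", "url".
    max_day_drift: assumed tolerance for small date differences between sources.
    """
    wanted_slug = slugify(affiliate_scene["title"])
    wanted_date = date.fromisoformat(affiliate_scene["date"])
    for hit in api_hits:
        # Titles must agree once punctuation and case are normalised.
        if slugify(hit["title"]) != wanted_slug:
            continue
        # Dates may differ slightly between the affiliate copy and the studio.
        drift = abs(date.fromisoformat(hit["release_date"]) - wanted_date)
        if drift <= timedelta(days=max_day_drift):
            return hit
    return None
```

Comparing slugs rather than raw titles avoids mismatches caused by punctuation or capitalisation, and the small date tolerance reflects the one-day shift mentioned in the conversation; whether the actual change tolerates such drift is not stated here.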