Skip to content

Refactor stash-box fingerprint submissions#6782

Open
InfiniteStash wants to merge 9 commits intostashapp:developfrom
InfiniteStash:fingerprint-submission
Open

Refactor stash-box fingerprint submissions#6782
InfiniteStash wants to merge 9 commits intostashapp:developfrom
InfiniteStash:fingerprint-submission

Conversation

@InfiniteStash
Copy link
Copy Markdown
Collaborator

Drafted with Claude.

I started out trying to add reporting and report counts to the tagger, but it ended up being quite annoying to deal with the current flow of pushing fingerprints to the config store, pulling them back down, then sending all the fingerprints to the backend for stash-box submission.

I've instead migrated the submission queue to sqlite, with add/remove operations, and a submit mutation which executes on the backend for a chosen endpoint without having to include all fingerprints in the mutation input. The tagger now shows the number of reports for a scene match, and allows for reporting wrong matches.

The new mutation also runs async for large number of fingerprints to avoid timeouts when the queue is too large. The next version of stash-box will support batch submissions which should make this unnecessary, but for now it's an ok workaround.

Copy link
Copy Markdown
Collaborator

@Gykes Gykes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general overview with a couple notes no actual testing was done.

Comment on lines +100 to +102
reports
user_submitted
user_reported
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these new fields will it break other Stashbox instance requests? From my understanding they don't return a null they would just error out the entire query.

Not 100% on this on, might need some education.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields were added to stash-box in version 0.6.0 IIRC, which was in early Jan 25. If the stash-box instance doesn't support the fields, it will fail, but that's a fairly old version..

That said, TPDB does not support these fields, and will break. Supporting them can be done by just stubbing them out and returning 0/false.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can notify TPDB when this gets closer to a merge so that it minimizes breakages.

Comment on lines +304 to +309
if len(submissions) > 40 {
// Submit async to avoid timeouts for large batches
go r.submitFingerprintBatch(client, submissions, sceneMap)
} else {
r.submitFingerprintBatch(client, submissions, sceneMap)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a couple notes in this area:

  • Is this 40 due to the default page being 40 or was it just a number you liked?

  • This seems to be its own route. Would that mean that this won't cancel/shutdown on server restart and just crash out if a user restarts mid submission batch? Might be a good idea to pass this through the JobManager so it can be tracked like a regular task.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40 is just a number that seems reasonable. The problem is fingerprints are submitted one by one, so if you have hundreds of them, the browser request times out before they can finish submitting, and then the batch gets cancelled without cleaning up. So past a certain number of fingerprints it's impossible to ever empty the queue.

The goroutine runs until it's done or the server restarts. I agree a task would be more appropriate, but this isn't a permanent solution. Once batch submission support is added to stash-box, we can submit all of them at once without being worried about timeouts, and then we can remove the goroutine.

Here's the relevant PR: stashapp/stash-box#1009

@feederbox826
Copy link
Copy Markdown
Member

Are we storing created_at just for ordering? don't really see the point of it if it's just for order ASC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants