/Search route and reconciliator() take exponentially long with lots of candidates for reconciliation #1

@paulhectork

issue description

what the title says. when a lot of candidates have been selected and need to be reconciled, the reconciliation process takes disproportionately long: processing time grows far faster than the number of candidates (at least quadratically, given the nested loop described below). this is a problem because reconciliator() is used in the search engine of the human-readable website (in the /Search route). the problem seems to be in the double_loop() function. for example:

  • when searching for Sévigné, 52 items are matched for reconciliation. double_loop() takes ~1 second and the waiting time for /Search is ~5 seconds (which is fine).
  • when searching for Napoléon, 112 items are matched. reconciliation takes 4 seconds and the whole search takes roughly 7-10 seconds (a bit long, but still fine).
  • when searching for bonaparte, there are 740 matches. double_loop() takes 5 min 19 s and the whole /Search request takes roughly as long.
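as a rough sanity check on how the timings above scale, here is a minimal sketch that compares each measurement with what a purely quadratic cost would predict from the previous one. the helper name `quadratic_prediction` is made up for this illustration, and the ~1 second figure is only approximate, so the results are indicative at best.

```python
# observed double_loop() timings from the list above: (items matched, seconds)
observations = [(52, 1.0), (112, 4.0), (740, 319.0)]  # 5 min 19 s = 319 s


def quadratic_prediction(n_ref, t_ref, n):
    """time expected for n items if cost scales with n**2, calibrated on (n_ref, t_ref)."""
    return t_ref * (n / n_ref) ** 2


# compare each measurement with the quadratic extrapolation of the previous one
for (n_ref, t_ref), (n, t_observed) in zip(observations, observations[1:]):
    predicted = quadratic_prediction(n_ref, t_ref, n)
    print(f"{n_ref} -> {n} items: predicted {predicted:.1f} s, observed {t_observed:.0f} s")
```

the step from 52 to 112 items is close to the quadratic prediction (about 4.6 s predicted, 4 s observed), while the step from 112 to 740 items is roughly twice as slow as predicted (about 175 s predicted, 319 s observed). so the growth looks at least quadratic, and the cost of each iteration may also increase with the number of matches.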

although this problem occurs rarely (I noticed it after working on the website for months), it is unsuitable for a client-exposed function, because the client will virtually never wait 5+ minutes for a response. it also causes a server-side problem, because the application will continue to run the search even after the client has quit the page. this could cause strain on the app if there are several pending requests to process.

technical problem

the /Search route calls reconciliator() to group different occurrences of the same manuscript together. this function works in two steps:

  • first, it filters all the catalogue entries based on the client's query (an author with author_filtering() and a date with date_filtering()). this yields a certain number of matched items that need to be reconciled. the number of items matched doesn't noticeably affect the processing time of this step.
  • then, double_loop() is called to group matched items. this is what takes so much time.
    • in more detail, double_loop() loops over all matched items, and, for each of those items, loops once more over all matched items. so if x items are matched, the total number of iterations is x^2.
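to make the x^2 figure concrete, here is a minimal sketch of the iteration count for a nested loop over the same list. this is not the actual double_loop() code: `count_iterations` is a hypothetical stand-in that only counts how many times the inner body would run.

```python
def count_iterations(n_items: int) -> int:
    """count how often the inner body runs when looping over n_items twice, nested."""
    iterations = 0
    for _ in range(n_items):        # outer loop: every matched item
        for _ in range(n_items):    # inner loop: every matched item again
            iterations += 1
    return iterations


# item counts taken from the three searches described in this issue
for n in (52, 112, 740):
    print(n, count_iterations(n))
# 52 2704
# 112 12544
# 740 547600
```

going from 52 to 740 matches multiplies the number of items by about 14, but the number of iterations by about 200.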

solutions

i've thought of two possible solutions:

  • continue with the reconciliation, but flash a message to the client saying that it will take a lot of time (which would require exchanging data asynchronously between client and server, so not that easy)
  • skip reconciliation above a certain number of matched items. in this case, all matching catalogue items are shown, without grouping together items that represent the same manuscript. we would then need to change the way search results are presented, since the HTML response currently always allows viewing reconciled manuscripts (see the View by manuscript button).
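the second option could look roughly like the sketch below. everything here is an assumption made for illustration: the name `group_results`, the `MAX_ITEMS_TO_RECONCILE` cutoff and its value, and the `reconcile` callable (a stand-in for whatever wraps double_loop() in the real code). the returned flag is what the template could use to decide whether to show the View by manuscript button.

```python
from typing import Callable, List, Sequence, Tuple

# hypothetical cutoff, to be tuned against real timings
MAX_ITEMS_TO_RECONCILE = 200


def group_results(
    items: Sequence,
    reconcile: Callable[[Sequence], List[list]],
    max_items: int = MAX_ITEMS_TO_RECONCILE,
) -> Tuple[bool, List[list]]:
    """return (reconciled, groups); skip the costly grouping above max_items."""
    if len(items) > max_items:
        # too many matches: show every catalogue item on its own, ungrouped
        return False, [[item] for item in items]
    return True, reconcile(items)


# usage with a fake reconcile() that puts everything in a single group
def fake_reconcile(items):
    return [list(items)]


print(group_results(["a", "b", "c"], fake_reconcile, max_items=2))
# (False, [['a'], ['b'], ['c']])
print(group_results(["a", "b"], fake_reconcile, max_items=2))
# (True, [['a', 'b']])
```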

to be continued...

Labels: bug, enhancement
