issue description
what the title says. when a lot of candidates have been selected and need to be reconciled, the reconciliation process becomes extremely slow: its running time grows roughly with the square of the number of matched items. this is a problem because reconciliation() is used in the search engine of the human-readable website (in the /Search route). the problem seems to be in the double_loop() function. for example:
- when searching for Sévigné, 52 items are matched for reconciliation. double_loop() takes ~1 second and the waiting time for /Search is ~5 seconds (which is fine).
- when searching for Napoléon, 112 items are matched. reconciliation takes 4 seconds and the whole search takes roughly 7-10 seconds (a bit long, but still fine).
- when searching for bonaparte, there are 740 matches. double_loop() takes 5 minutes 19 seconds to run and the whole /Search request takes roughly as long.
although this problem occurs rarely (I noticed it after working on the website for months), it is unsuitable for a client-exposed function, because the client will virtually never wait 5+ minutes for a response. it also causes a server-side problem, because the application will continue to run the search even after the client has quit the page. this could cause strain on the app if there are several pending requests to process.
technical problem
the /Search route calls reconciliator() to group different occurrences of the same manuscript together. this function works in two steps:
- first, it filters all the catalogue entries based on the client's query (an author with author_filtering() and a date with date_fitering()). this matches a certain number of items that need to be reconciled. the number of items matched doesn't impact the processing time of this step.
- then, double_loop() is called to group matched items. this is what takes so much time.
- in more detail, double_loop() loops over all matched items and, for each of those items, loops once more over all matched items. so if x items are matched, the total number of iterations is x^2.
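to make the x^2 growth concrete, here is a minimal counting sketch. it is not the real double_loop() (whose body isn't shown in this issue): count_pairwise_iterations() is a hypothetical helper that only reproduces the nested-loop shape described above and counts iterations for the three match counts reported earlier.

```python
def count_pairwise_iterations(n_items: int) -> int:
    """Count the inner-loop iterations of a naive nested loop over n_items.

    Hypothetical stand-in: it mimics only the loop structure described for
    double_loop(), not the comparison work done on each pair.
    """
    items = range(n_items)
    iterations = 0
    for _outer in items:
        for _inner in items:
            iterations += 1
    return iterations


# match counts reported in this issue
match_counts = {"Sévigné": 52, "Napoléon": 112, "bonaparte": 740}

iteration_counts = {
    query: count_pairwise_iterations(n) for query, n in match_counts.items()
}
# Sévigné: 2704, Napoléon: 12544, bonaparte: 547600
```

going from 52 to 740 matches multiplies the number of iterations by about 200, which is the same order of magnitude as the jump from ~1 second to more than 5 minutes.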
solutions
i've thought of two possible solutions:
- continue with the reconciliation, but flash a message to the client saying that it will take a lot of time (which would require asynchronous communication between client and server, so not that easy)
- don't reconcile above a certain number of matched items. in this case, all matching catalogue items are shown, without grouping together items that represent the same manuscript. we would then need to change the way search results are presented, since the HTML response currently always allows viewing reconciled manuscripts (see the View by manuscript button).
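a rough sketch of the second option, under stated assumptions: MAX_RECONCILED_ITEMS, reconcile_if_small() and fake_reconcile() are hypothetical names, and the cutoff value is a placeholder that would have to be tuned from real timings. the actual grouping function is passed in as a parameter, since its real signature isn't shown here.

```python
from typing import Callable, List, Tuple

# hypothetical cutoff, to be tuned from measurements
MAX_RECONCILED_ITEMS = 200


def reconcile_if_small(
    matched_items: List,
    reconcile: Callable[[List], List],
    max_items: int = MAX_RECONCILED_ITEMS,
) -> Tuple[List, bool]:
    """Return (results, was_reconciled).

    Above max_items the matched items are returned ungrouped, so the
    quadratic grouping step is skipped entirely.
    """
    if len(matched_items) > max_items:
        return matched_items, False
    return reconcile(matched_items), True


def fake_reconcile(items: List) -> List:
    """Stand-in for the real grouping step: groups equal items together."""
    groups = {}
    for item in items:
        groups.setdefault(item, []).append(item)
    return list(groups.values())


small_results, small_flag = reconcile_if_small(["a", "b", "a"], fake_reconcile)
large_results, large_flag = reconcile_if_small(list(range(740)), fake_reconcile)
```

the returned boolean could then be passed to the template to decide whether the View by manuscript button is shown at all.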
to be continued...