feat: add adjustable start and end for analysis window#346
feat: add adjustable start and end for analysis window#346
Conversation
|
@nathanielrindlaub took a stab at a very quick implementation of the start and end date fine-tuning. Let me know if this is along the lines of what you are thinking. |
|
closes #215 |
|
@nathanielrindlaub Updated this to use DB queries instead of looking through objects in memory. Anecdotally, it does seem a little bit more performant and the code is much nicer. We do tell the user that we're adjusting the date but the output file and other console logs will still use the original date. I wasn't sure if we should use the adjusted window or if the user would still want to see the dates they set in their original config |
feat: add ajustable window to sequence level eval
| ANALYSIS_DIR: '/analysis', | ||
| PROJECT_ID: 'sci_biosecurity', | ||
| START_DATE: '2023-4-28', | ||
| START_DATE: '2023-2-28', |
There was a problem hiding this comment.
This was just so I could check if it actually moved the dates or not. Will revert
fix: revert change to start date
| $match: { | ||
| projectId: PROJECT_ID, | ||
| dateAdded: { | ||
| $gte: new Date(START_DATE), |
There was a problem hiding this comment.
Thoughts after talking today -- should we maybe adjust this to add padding in either direction? I.e. +/- a week of the start date / end date or something?
There was a problem hiding this comment.
Yeah honestly I think ideally we wouldn't constrict the query by dateAdded at all so as to look the model usage across the entire life of the Project.
| $match: { | ||
| projectId: PROJECT_ID, | ||
| dateAdded: { | ||
| $gte: new Date(START_DATE), |
There was a problem hiding this comment.
Yeah honestly I think ideally we wouldn't constrict the query by dateAdded at all so as to look the model usage across the entire life of the Project.
| } | ||
| }, | ||
| }, | ||
| { $sort: { dateAdded: 1 } } |
There was a problem hiding this comment.
I'm not sure if this would help at all, but instead of pulling all of the images into memory, you could do 2 separate queries, one with {$sort: { dateAdded: 1 } and one with {$sort: { dateAdded: -1 }, and add a {$limit: 1 } to the aggregations so you only get the oldest matching image and the newest, respectively. Again not sure if that would be any more performative but would be a bit less taxing on memory I suppose.
There was a problem hiding this comment.
I'll try this out. I wasn't sure how mongodb internally handles these pipelines so I was concerned that it would result in it redoing a heavy query (the first pipeline stage).
| throw new Error('unable to find a valid first and last image in automation window.'); | ||
| } | ||
|
|
||
| const newStart = firstMlLabelAfterStart.dateAdded > (new Date(START_DATE)) |
There was a problem hiding this comment.
We should probably +1 day to the start and -1 day to the end to make sure we're not including any partial days.
| PROJECT_ID: 'sci_biosecurity', | ||
| START_DATE: '2023-4-28', | ||
| END_DATE: '2024-5-29', | ||
| ADJUSTABLE_WINDOW: true, |
There was a problem hiding this comment.
I would maybe call this something that's a little more descriptive like AUTO_ADJUST_TIME_WINDOW. Or just add a comment explaining what it does.
| } | ||
| } | ||
|
|
||
| async function tryAdjustAutomationWindow() { |
There was a problem hiding this comment.
This seems solid for a first pass at a solution, and I don't want to over think it, but there are circumstances when users use model A for some period of time then try out model B then decide A was a better option so the switch back to A. This solution doesn't account for that of course but I think that's ok for now.
docs: add info about configuration option
Context
Adjusts the given start and end date to avoid analyzing images that are from before or after the deployment window.