Skip to content

feat: add adjustable start and end for analysis window#346

Open
lessej wants to merge 10 commits intomainfrom
feature/adjustable-start-end-window-for-analysis
Open

feat: add adjustable start and end for analysis window#346
lessej wants to merge 10 commits intomainfrom
feature/adjustable-start-end-window-for-analysis

Conversation

@lessej
Copy link
Copy Markdown
Collaborator

@lessej lessej commented Aug 15, 2025

Context

Adjusts the given start and end date to avoid analyzing images that are from before or after the deployment window.

@lessej
Copy link
Copy Markdown
Collaborator Author

lessej commented Aug 15, 2025

@nathanielrindlaub took a stab at a very quick implementation of the start and end date fine-tuning. Let me know if this is along the lines of what you are thinking.

@nathanielrindlaub
Copy link
Copy Markdown
Member

closes #215

@lessej
Copy link
Copy Markdown
Collaborator Author

lessej commented Aug 27, 2025

@nathanielrindlaub Updated this to use DB queries instead of looking through objects in memory. Anecdotally, it does seem a little bit more performant and the code is much nicer.

We do tell the user that we're adjusting the date but the output file and other console logs will still use the original date. I wasn't sure if we should use the adjusted window or if the user would still want to see the dates they set in their original config

feat: add ajustable window to sequence level eval
Comment thread src/scripts/analysisConfig.js Outdated
ANALYSIS_DIR: '/analysis',
PROJECT_ID: 'sci_biosecurity',
START_DATE: '2023-4-28',
START_DATE: '2023-2-28',
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just so I could check if it actually moved the dates or not. Will revert

fix: revert change to start date
Comment thread src/scripts/analyzeMLObjectLevel.js Outdated
$match: {
projectId: PROJECT_ID,
dateAdded: {
$gte: new Date(START_DATE),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts after talking today -- should we maybe adjust this to add padding in either direction? I.e. +/- a week of the start date / end date or something?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah honestly I think ideally we wouldn't constrict the query by dateAdded at all so as to look the model usage across the entire life of the Project.

Comment thread src/scripts/analyzeMLObjectLevel.js Outdated
$match: {
projectId: PROJECT_ID,
dateAdded: {
$gte: new Date(START_DATE),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah honestly I think ideally we wouldn't constrict the query by dateAdded at all so as to look the model usage across the entire life of the Project.

Comment thread src/scripts/analyzeMLObjectLevel.js Outdated
}
},
},
{ $sort: { dateAdded: 1 } }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this would help at all, but instead of pulling all of the images into memory, you could do 2 separate queries, one with {$sort: { dateAdded: 1 } and one with {$sort: { dateAdded: -1 }, and add a {$limit: 1 } to the aggregations so you only get the oldest matching image and the newest, respectively. Again not sure if that would be any more performative but would be a bit less taxing on memory I suppose.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try this out. I wasn't sure how mongodb internally handles these pipelines so I was concerned that it would result in it redoing a heavy query (the first pipeline stage).

Comment thread src/scripts/analyzeMLObjectLevel.js Outdated
throw new Error('unable to find a valid first and last image in automation window.');
}

const newStart = firstMlLabelAfterStart.dateAdded > (new Date(START_DATE))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably +1 day to the start and -1 day to the end to make sure we're not including any partial days.

Comment thread src/scripts/analysisConfig.js Outdated
PROJECT_ID: 'sci_biosecurity',
START_DATE: '2023-4-28',
END_DATE: '2024-5-29',
ADJUSTABLE_WINDOW: true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe call this something that's a little more descriptive like AUTO_ADJUST_TIME_WINDOW. Or just add a comment explaining what it does.

}
}

async function tryAdjustAutomationWindow() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems solid for a first pass at a solution, and I don't want to over think it, but there are circumstances when users use model A for some period of time then try out model B then decide A was a better option so the switch back to A. This solution doesn't account for that of course but I think that's ok for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants