feat: build temporary validation lists#350

Open
lessej wants to merge 12 commits into main from feature/gen-validation-lists

Conversation

@lessej
Collaborator

@lessej lessej commented Sep 1, 2025

Context

Attempts to build a validation list for ML labels. Takes an ML label, finds objects with that label, and builds a set of the other labels on those objects. Some things to think about:

  • Is there a way to make it more efficient? It's at least O(n^2) right now. Can some of the in-memory operations be offloaded to the DB query?
  • What should we do with labels that a user has added but then removed? It seems like these are still on the object.
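
On the first question, one possible direction is to let the aggregation pipeline do the grouping. The stages below are only a rough sketch: the `MODEL` value is a placeholder, the field names are taken from the snippets in this thread, and pairing every prediction with every validated label (rather than only the first validated one) is an assumption that may not match the intended behavior.

```javascript
// Placeholder model identifier; the real MODEL constant is not shown in this thread.
const MODEL = 'example-model';

// Sketch of stages that could follow the existing image-level $match.
const groupingPipeline = [
  // One document per object instead of one per image.
  { $unwind: '$objects' },
  {
    $project: {
      // Every label on the object that came from the model.
      predicted: {
        $filter: {
          input: '$objects.labels',
          as: 'lbl',
          cond: { $eq: ['$$lbl.mlModel', MODEL] },
        },
      },
      // Every label on the object that has been validated.
      validated: {
        $filter: {
          input: '$objects.labels',
          as: 'lbl',
          cond: { $eq: ['$$lbl.validation.validated', true] },
        },
      },
    },
  },
  // Pair each predicted label with each validated label on the same object.
  { $unwind: '$predicted' },
  { $unwind: '$validated' },
  {
    // One result per predicted label ID, with a de-duplicated list of validating IDs.
    $group: {
      _id: '$predicted.labelId',
      validatingLabelIds: { $addToSet: '$validated.labelId' },
    },
  },
];
```

If this shape is close to what is needed, the stages could be appended to the pipeline passed to `Image.aggregate(...)`, so the script only iterates over one small document per predicted label instead of every image.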

@lessej lessej self-assigned this Sep 1, 2025
Comment thread src/scripts/generateValidating.js
Comment thread src/scripts/generateValidating.js Outdated
'objects.labels.labelId': {
$in: projectLabelIds
},
'objects.labels.validation.validated': true,
Member

@lessej what's the thinking with this line ^?

Comment thread src/scripts/generateValidating.js Outdated
},
},
{
$set: {
Member

Can you remind me what you're trying to do here with the $set-ing of the Image.filteredLabels field?

Comment thread src/scripts/generateValidating.js Outdated
}, {});
const projectLabelIds = Object.keys(projectLabels);

// Prepare map { labelId: [validatingLabelId] }
Member

This map should match the shape of the existing TARGET_CLASS array in `analysisConfig.js`, right? See example entry copied into the comment above.

Comment thread src/scripts/generateValidating.js Outdated
const labelIds = obj.filteredLabels.map((lbl) => lbl.labelId);
const firstValidLabelId = obj.firstValidLabel.shift().labelId;
const curr = validatingLabels[firstValidLabelId];
validatingLabels[firstValidLabelId] = new Set([...curr, ...labelIds]);
Member

I'm having a little trouble understanding this. It seems a little odd to use a reducer here. You're passing validatingLabels in as the initial value (132), but then also referencing and mutating that out-of-scope variable from within the reducer callback, and returning imageAcc after each run of the reducer function (131)? I assume imageAcc is then what reduce() returns and what gets assigned to validationLists once it has iterated over all the images? Maybe I'm not tracking, but I think this could be accomplished with a simple forEach loop:

const validatingLabels = projectLabelIds.reduce((acc, lbl) => {
  return { ...acc, [lbl]: [] };
}, {});

images.forEach((img) => {
  img.objects.forEach((obj) => {
    ...
    validatingLabels[firstValidLabelId] = new Set([...curr, ...labelIds]);
  });
});
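
For reference, here is a self-contained version of that loop that can be run against made-up data. The sample images and label IDs are hypothetical, and two details are choices of this sketch rather than anything in the PR: each entry starts as a `Set` instead of an array, and the first validated label is read with `[0]` instead of `shift()` so the objects are not mutated.

```javascript
// Hypothetical sample data; the field names follow the snippets in this thread.
const projectLabelIds = ['rodent', 'bird'];
const images = [
  {
    objects: [
      {
        firstValidLabel: [{ labelId: 'rodent' }],
        filteredLabels: [{ labelId: 'rat' }, { labelId: 'mouse' }],
      },
    ],
  },
  {
    objects: [
      {
        firstValidLabel: [{ labelId: 'rodent' }],
        filteredLabels: [{ labelId: 'rat' }],
      },
    ],
  },
];

// Start every entry as a Set so repeated label IDs collapse automatically.
const validatingLabels = Object.fromEntries(
  projectLabelIds.map((id) => [id, new Set()])
);

images.forEach((img) => {
  img.objects.forEach((obj) => {
    // Read the first element without shift(), which would mutate the object.
    const firstValidLabelId = obj.firstValidLabel[0]?.labelId;
    const bucket = validatingLabels[firstValidLabelId];
    if (!bucket) return; // skip objects whose label is not a project label
    obj.filteredLabels.forEach((lbl) => bucket.add(lbl.labelId));
  });
});

// Convert the Sets to plain arrays for output.
const validationLists = Object.fromEntries(
  Object.entries(validatingLabels).map(([id, set]) => [id, [...set]])
);
```

Because nothing is returned from the loop body, there is no accumulator to keep track of; the only state is the `validatingLabels` map itself.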

Comment thread src/scripts/generateValidating.js Outdated

const labelIds = obj.filteredLabels.map((lbl) => lbl.labelId);
const firstValidLabelId = obj.firstValidLabel.shift().labelId;
const curr = validatingLabels[firstValidLabelId];
Member

I think this might be backwards. The firstValidLabel should be added to the array of validating labels, not used as the predicted label.

Member

@nathanielrindlaub nathanielrindlaub left a comment

Just one small update and I think this is good to go!

Comment thread src/scripts/generateValidating.js Outdated
for await (const img of Image.aggregate(baseImagePipeline)) {
for (const obj of img.objects) {
const firstValidated = obj.firstValidLabel.shift();
const predicted = obj.labels.find((lbl) => lbl.mlModel === MODEL);
Member

We store every predicted label whose confidence is above the threshold set for that particular category, so we need to account for situations in which the model has returned two or more labels above the threshold (i.e., more than one predicted label), rather than just grabbing the first.
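
A minimal sketch of what that could look like, swapping `find()` for `filter()` so every prediction from the model gets the validated label added to its list. The helper name `collectValidatingLabels`, the `MODEL` value, and the sample data are all hypothetical, and the loop here is synchronous for illustration, whereas the script iterates the aggregate cursor with `for await`.

```javascript
// Placeholder model identifier; the real MODEL constant is not shown in this thread.
const MODEL = 'example-model';

// Hypothetical helper: maps each predicted label ID to the Set of label IDs
// that users have validated on objects carrying that prediction.
function collectValidatingLabels(images, model) {
  const lists = new Map();
  for (const img of images) {
    for (const obj of img.objects) {
      const firstValidated = obj.firstValidLabel[0];
      if (!firstValidated) continue; // nothing validated on this object
      // filter() keeps every prediction from the model; find() would stop at the first.
      const predicted = obj.labels.filter((lbl) => lbl.mlModel === model);
      for (const pred of predicted) {
        if (!lists.has(pred.labelId)) lists.set(pred.labelId, new Set());
        lists.get(pred.labelId).add(firstValidated.labelId);
      }
    }
  }
  return lists;
}

// Made-up object with two predictions above threshold and one user-added label.
const sample = [
  {
    objects: [
      {
        firstValidLabel: [{ labelId: 'rat' }],
        labels: [
          { labelId: 'rodent', mlModel: MODEL },
          { labelId: 'mammal', mlModel: MODEL },
          { labelId: 'rat' },
        ],
      },
    ],
  },
];

const lists = collectValidatingLabels(sample, MODEL);
```

With the sample above, both `rodent` and `mammal` end up with `rat` as a validating label, which is the case a single `find()` would miss.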

2 participants