Skip to content

IMP: Swap rarefy random_seed to CaptureHolder#352

Open
Oddant1 wants to merge 10 commits intoqiime2:devfrom
Oddant1:track-value-of-random-seed
Open

IMP: Swap rarefy random_seed to CaptureHolder#352
Oddant1 wants to merge 10 commits intoqiime2:devfrom
Oddant1:track-value-of-random-seed

Conversation

@Oddant1
Copy link
Member

@Oddant1 Oddant1 commented Jan 28, 2026

Use CaptureHolder to track value of random_seed in provenance.

@github-project-automation github-project-automation bot moved this to Backlog in 2026.4 🌱 Jan 28, 2026
@Oddant1 Oddant1 moved this from Backlog to In Review in 2026.4 🌱 Jan 28, 2026
@Oddant1 Oddant1 requested a review from ebolyen January 28, 2026 23:16
Copy link
Member

@ebolyen ebolyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a piece to make this scale across plugins effectively. I've got a few ideas below. Let me know what you think.

['O1', 'O2'],
['S1', 'S2', 'S3'])
a = rarefy(t, 2)
a = rarefy(t, 2, random_seed=FakeCaptureHolder())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the number of these needed might suggest something is off.

Should we consider that we either need to test the Plugin side of things, so that random_seed is initialized, or should we alter rarefy such that None is also fine (which is what the type technically says...). I'm leaning towards the latter, since Rachis would be obeying the signature by always providing a capture-holder, and it makes it easier to work with for the tests.

We could also imagine a test utility/decorator which injects any Rachis objects into the function, but I don't think that makes the tests any easier to write or think about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I found it annoying to change them all. No, I don't think we should alter rarefy such that None is actually a valid argument to it. We want a CaptureHolder with None not an actual None. I don't think nuisances arising from calling the functions behind actions without the framework should be resolved by modifying the actions because we kind of fundamentally don't respect calling the function in this way in a real world context.

Comment on lines +25 to +27
if random_seed.value is None:
random_int = random.randrange(RNG_MAX_SIZE)
random_seed.set_value(random_int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the above comment about the number of instances we need for a test, what if we imagined something like this:

Suggested change
if random_seed.value is None:
random_int = random.randrange(RNG_MAX_SIZE)
random_seed.set_value(random_int)
if random_seed is None or random_seed.value is None:
random_int = random.randrange(RNG_MAX_SIZE)
else:
random_int = random_seed.value
if random_seed and random_seed.value is None:
random_seed.set_value(random_int)

This also ends up being kind of bad...

So what about this:

Suggested change
if random_seed.value is None:
random_int = random.randrange(RNG_MAX_SIZE)
random_seed.set_value(random_int)
random_seed = CaptureHolder.getorset(random_seed, random.randrange(RNG_MAX_SIZE))

Where if random_seed is None, it will construct one, but if it was provided an CaptureHolder, it inspects the state to determine if it was set already by the outer _bind, and if not, will instantiate itself (untethered to a provenance chain) with the secondary argument as the value.

Alternatively the result could just be the value instead of an instance, which might be easier on everybody.

Suggested change
if random_seed.value is None:
random_int = random.randrange(RNG_MAX_SIZE)
random_seed.set_value(random_int)
random_int = CaptureHolder.getorset(random_seed, random.randrange(RNG_MAX_SIZE))

table = table.subsample(sampling_depth, axis='sample', by_id=False,
with_replacement=with_replacement,
seed=random_seed)
seed=random_seed.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
seed=random_seed.value)
seed=random_int)

only if paired with the above

# ----------------------------------------------------------------------------


class FakeCaptureHolder:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this would be needed in a lot of places in practice, so it would probably make sense to put it in rachis.plugin.testing as a Mock, but I think we might also be able to avoid it if we are a little clever.

@Oddant1
Copy link
Member Author

Oddant1 commented Feb 24, 2026

blocked by rachis-org/rachis#913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Development

Development

Successfully merging this pull request may close these issues.

2 participants