Skip to content

i#7889: Zero upper vector on gather to lower bits#7890

Open
abhinav92003 wants to merge 5 commits intomasterfrom
i7889-gather-zero-extend
Open

i#7889: Zero upper vector on gather to lower bits#7890
abhinav92003 wants to merge 5 commits intomasterfrom
i7889-gather-zero-extend

Conversation

@abhinav92003
Copy link
Copy Markdown
Contributor

@abhinav92003 abhinav92003 commented May 6, 2026

Invokes the logic to zero the upper bits of the ymm/zmm vector (depending on whether the machine supports AVX2 or AVX512), when only the lower bits of it that map to xmm or ymm are gathered into.

#7883 added the zeroing logic already but invoked it only for partial gathers where there's a mismatch between the index and data element sizes. Here we invoke it for all cases where the vector gathered into is smaller than the full length supported by the machine.

Extends the allasm_scattergather_x86.asm test to verify that the upper part of the vector gathered-into is indeed zeroed. This works fine when not under DynamoRIO which verifies our assumptions.

Verified on a machine with AVX-512 that the tests pass:

The following tests passed:
        code_api|client.drx-scattergather
        code_api|client.drx-scattergather-bbdup
        code_api|sample.memval_simple_scattergather
        code_api|tool.drcachesim.scattergather-x86
        code_api|tool.drcacheoff.allasm-scattergather-basic-counts
        code_api|tool.drcachesim.allasm-scattergather-basic-counts

Also verified that each of the new sub-tests fail without the fix.

Fixes: #7889

Invokes the logic to zero extend the upper part of the ymm/zmm vector (depending on whether the machine supports AVX2 or AVX512), when only the lower part of it that maps to xmm or ymm is gathered into.

Extends the allasm_scattergather_x86.asm test to verify that the higher part of the vector gathered-into is indeed zero. This works fine when not under DynamoRIO which verifies our assumptions.

Fixes: #7889
@abhinav92003 abhinav92003 changed the title i#7889: Zero extend full vec on gather to sub-vec i#7889: Zero upper vector on gather to lower bits May 6, 2026
@abhinav92003 abhinav92003 requested a review from derekbruening May 6, 2026 03:56
/* E.g., vpgatherqd/vgatherqps xmm0 {k1}, [xax + ymm1 * 4]
* where the dest reg is inflated to YMM size by the DR decoder, by design.
} else if (processed_bytes == XMM_REG_SIZE) {
/* This also handles partial gathers into inflated ymm (i#7887) such as:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derekbruening I can see just removing these comments showing a "partial gather" example. But it may still be good to highlight that even though DR sees this as a ymm dest, the non-xmm bits need to be zeroed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reword to say something like "The DR decoder (incorrectly) inflates some xmm destinations into ymm (maybe put asm example here), making them appear as though they are partial gathers; regardless, they need zeroing and will come here."

The comment as written is misleading IMHO: it should put partial in quotes because they are not really partial; I would rewrite more toward what I suggested.

@derekbruening
Copy link
Copy Markdown
Contributor

FYI, Github is having problems: it gets internal errors trying to save review comments. Will try later.

ASSERT(processed_bytes * 2 == opnd_size_in_bytes(sg_info->scatter_gather_size),
"Partial gather must statically fill exactly half of the destination "
"register.");
if (processed_bytes < opnd_size_in_bytes(sg_info->scatter_gather_size)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we fix the DR decoder to not inflate the destination: the old code would have failed to zero (would have exited early), but now it will zero: right? The code is now agnostic to whether the DR decoder inflates? It might come into this if() with inflation and not without, but that's the only difference?

/* E.g., vpgatherqd/vgatherqps xmm0 {k1}, [xax + ymm1 * 4]
* where the dest reg is inflated to YMM size by the DR decoder, by design.
} else if (processed_bytes == XMM_REG_SIZE) {
/* This also handles partial gathers into inflated ymm (i#7887) such as:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reword to say something like "The DR decoder (incorrectly) inflates some xmm destinations into ymm (maybe put asm example here), making them appear as though they are partial gathers; regardless, they need zeroing and will come here."

The comment as written is misleading IMHO: it should put partial in quotes because they are not really partial; I would rewrite more toward what I suggested.

jne incorrect_zeroing_mask0
vpextrd eax, xmm4, 2
cmp eax, 0xffffffff
jne incorrect_zeroing_mask0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this checking all the bits? I see two checks of eax==32 bits so that's only half of the xmm bits right?

jne incorrect_zeroing_mask0
vpextrd eax, xmm4, 2
cmp eax, 0xffffffff
jne incorrect_zeroing_mask0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for all 1's should be doable just like lines 234-236 check for all 0's with an added AND operation, and on xmm instead of ymm, right? Make a mask of all 1's, AND it with the target register, and check that the result is all zeroes. Should be just 3 instructions, instead of 8 instructions here.

} else if (processed_bytes == 32) {
/* E.g., vpgatherqd/vgatherqps ymm0 {k1}, [xax + zmm1 * 4]
} else if (processed_bytes == YMM_REG_SIZE) {
/* This also handles partial gathers into inflated zmm (i#7887) such as:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment: these are not really partial. Reword to "seemingly-partial" or completely rewrite or just point to prior comment to avoid repeating all of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

drx x86 gather expansion does not zero upper ymm/zmm bits

2 participants