i#7889: Zero upper vector on gather to lower bits#7890
i#7889: Zero upper vector on gather to lower bits#7890abhinav92003 wants to merge 5 commits intomasterfrom
Conversation
Invokes the logic to zero extend the upper part of the ymm/zmm vector (depending on whether the machine supports AVX2 or AVX512), when only the lower part of it that maps to xmm or ymm is gathered into. Extends the allasm_scattergather_x86.asm test to verify that the higher part of the vector gathered-into is indeed zero. This works fine when not under DynamoRIO which verifies our assumptions. Fixes: #7889
| /* E.g., vpgatherqd/vgatherqps xmm0 {k1}, [xax + ymm1 * 4] | ||
| * where the dest reg is inflated to YMM size by the DR decoder, by design. | ||
| } else if (processed_bytes == XMM_REG_SIZE) { | ||
| /* This also handles partial gathers into inflated ymm (i#7887) such as: |
There was a problem hiding this comment.
@derekbruening I can see just removing these comments showing a "partial gather" example. But it may still be good to highlight that even though DR sees this as a ymm dest, the non-xmm bits need to be zeroed.
There was a problem hiding this comment.
I would reword to say something like "The DR decoder (incorrectly) inflates some xmm destinations into ymm (maybe put asm example here), making them appear as though they are partial gathers; regardless, they need zeroing and will come here."
The comment as written is misleading IMHO: it should put partial in quotes because they are not really partial; I would rewrite more toward what I suggested.
|
FYI, Github is having problems: it gets internal errors trying to save review comments. Will try later. |
| ASSERT(processed_bytes * 2 == opnd_size_in_bytes(sg_info->scatter_gather_size), | ||
| "Partial gather must statically fill exactly half of the destination " | ||
| "register."); | ||
| if (processed_bytes < opnd_size_in_bytes(sg_info->scatter_gather_size)) { |
There was a problem hiding this comment.
So if we fix the DR decoder to not inflate the destination: the old code would have failed to zero (would have exited early), but now it will zero: right? The code is now agnostic to whether the DR decoder inflates? It might come into this if() with inflation and not without, but that's the only difference?
| /* E.g., vpgatherqd/vgatherqps xmm0 {k1}, [xax + ymm1 * 4] | ||
| * where the dest reg is inflated to YMM size by the DR decoder, by design. | ||
| } else if (processed_bytes == XMM_REG_SIZE) { | ||
| /* This also handles partial gathers into inflated ymm (i#7887) such as: |
There was a problem hiding this comment.
I would reword to say something like "The DR decoder (incorrectly) inflates some xmm destinations into ymm (maybe put asm example here), making them appear as though they are partial gathers; regardless, they need zeroing and will come here."
The comment as written is misleading IMHO: it should put partial in quotes because they are not really partial; I would rewrite more toward what I suggested.
| jne incorrect_zeroing_mask0 | ||
| vpextrd eax, xmm4, 2 | ||
| cmp eax, 0xffffffff | ||
| jne incorrect_zeroing_mask0 |
There was a problem hiding this comment.
Is this checking all the bits? I see two checks of eax==32 bits so that's only half of the xmm bits right?
| jne incorrect_zeroing_mask0 | ||
| vpextrd eax, xmm4, 2 | ||
| cmp eax, 0xffffffff | ||
| jne incorrect_zeroing_mask0 |
There was a problem hiding this comment.
Checking for all 1's should be doable just like lines 234-236 check for all 0's with an added AND operation, and on xmm instead of ymm, right? Make a mask of all 1's, AND it with the target register, and check that the result is all zeroes. Should be just 3 instructions, instead of 8 instructions here.
| } else if (processed_bytes == 32) { | ||
| /* E.g., vpgatherqd/vgatherqps ymm0 {k1}, [xax + zmm1 * 4] | ||
| } else if (processed_bytes == YMM_REG_SIZE) { | ||
| /* This also handles partial gathers into inflated zmm (i#7887) such as: |
There was a problem hiding this comment.
See above comment: these are not really partial. Reword to "seemingly-partial" or completely rewrite or just point to prior comment to avoid repeating all of it.
Invokes the logic to zero the upper bits of the ymm/zmm vector (depending on whether the machine supports AVX2 or AVX512), when only the lower bits of it that map to xmm or ymm are gathered into.
#7883 added the zeroing logic already but invoked it only for partial gathers where there's a mismatch between the index and data element sizes. Here we invoke it for all cases where the vector gathered into is smaller than the full length supported by the machine.
Extends the allasm_scattergather_x86.asm test to verify that the upper part of the vector gathered-into is indeed zeroed. This works fine when not under DynamoRIO which verifies our assumptions.
Verified on a machine with AVX-512 that the tests pass:
Also verified that each of the new sub-tests fail without the fix.
Fixes: #7889