jmp_match: account for self-shrink on forward jcc/jmp rel8 check#239
jmp_match: account for self-shrink on forward jcc/jmp rel8 check#239bboe wants to merge 2 commits into
Conversation
The borderline rel8 fit test compared the current-layout short displacement (target - offset - short_size). For a forward jump that the previous pass emitted as NEAR, shrinking to SHORT moves the target near_size - short_size bytes closer, so the post-shrink rel8 is that many bytes smaller. NASM was keeping forward near jcc/jmp instances that a self-shrink-aware iteration emits as short (a 4-byte shrink for jcc32, 3-byte for jmp32). Track per-jump previous-pass encoding in a static side table, swapped at pass boundaries via _passn. In the borderline branch, when the current-layout check rejects AND the previous pass was NEAR AND the jump is forward, re-check against the post-shrink displacement. The previous-pass guard prevents over-shrinking already-short jumps whose displacement grew past 127; the forward guard skips backward jumps, whose target doesn't move when the jump shrinks. Add travis/test/jccshrink as a regression test: stock NASM emits 605 B with a 5-byte near jmp at 0x1d9; patched emits 602 B with eb 7f (rel8 = 127). make travis passes (301/301). Byte-identical on optimization.asm, jmp64.asm, riprel.asm, a32offs.asm. Signed-off-by: Bryce Boe <bbzbryce@gmail.com>
|
Bump. |
|
Hi, This is certainly a valid consideration, but I am a bit concerned that it is overly complex, certainly for 3.02 which is currently in late -rc. For one thing, it causes a convergence failure if there is an "align 2" statement after "jmp B" in your test, which is rather obviously a far worse regression. To some degree, this is a consequence of a general issue with relaxations in NASM: because of the extremely tight coupling between the preprocessor and the assembler, it is possible to create arbitrarily complex addressing relationships, and unfortunately people do depend on those in real life code. Furthermore, at least ideally, this should apply to all relaxable statements, including immediates and displacements when applicable. This adds further to the complexity of the problem. First of all, let me confirm that this is a real problem, and I appreciate the work at resolving it. Unfortunately this patch by itself is not adequate. I would be happy to have a further discussions about what kind of infrastructure changes would be needed in NASM to support more aggressive optimizations. |
An "align" between a jump and its target breaks the self-shrink re-check's rigid-span assumption, so the jump oscillates NEAR<->SHORT and assembly stalls. Latch each jump to speculate at most once, then fall back to the convergent current-layout test. Add travis/test/jccshrinkalign. Signed-off-by: Bryce Boe <bbzbryce@gmail.com>
|
Thanks for taking the time to review and the pointer about the alignment regression. I made a follow up commit that should address that specific case, however, it certainly doesn't handle the other cases you've mentioned. I'll take some time over the next day or so to dig up the real case I experienced and see if my assembler has the same alignment issue. I'll also try to come up with some test cases for the other cases as well. If you happen to know of some, that would save a bit of time. Thanks! |
|
FYI, this gist shows the assembly that prompted this investigation and pull request. The assembly there was generated by my compiler. |
|
I really wish I knew a way to create a general solution for this! It is pretty clear that it is not even particularly hard for the user to create situations where there simply isn't any way for NASM's "code-blind" algorithm to converge, but the preprocessor coupling -- a very early NASM design decision and pretty fundamental to its operation, not something that we can break -- means that it isn't even necessarily possible for NASM to know what the "same instruction" necessarily means. One example that one could actually imagine in real life; note that in this case the only possible solution involves a 6-byte jump with an offset of 0x7e, which would have fit in 2 bytes, but when encoded in 2 bytes triggers the inlining and pushes the "altcode" target out of range. |
Forward jcc/jmp self-shrink case in
jmp_match()'s borderline rel8 fit check was rejecting valid shrinks because it compared against the current-layout displacement instead of the post-shrink one. Regression test:travis/test/jccshrink.