Skip to content

jmp_match: account for self-shrink on forward jcc/jmp rel8 check#239

Open
bboe wants to merge 2 commits into
netwide-assembler:masterfrom
bboe:jmp-match-forward-shrink
Open

jmp_match: account for self-shrink on forward jcc/jmp rel8 check#239
bboe wants to merge 2 commits into
netwide-assembler:masterfrom
bboe:jmp-match-forward-shrink

Conversation

@bboe

@bboe bboe commented May 18, 2026

Copy link
Copy Markdown

Forward jcc/jmp self-shrink case in jmp_match()'s borderline rel8 fit check was rejecting valid shrinks because it compared against the current-layout displacement instead of the post-shrink one. Regression test: travis/test/jccshrink.

The borderline rel8 fit test compared the current-layout short
displacement (target - offset - short_size). For a forward jump that
the previous pass emitted as NEAR, shrinking to SHORT moves the target
near_size - short_size bytes closer, so the post-shrink rel8 is that
many bytes smaller. NASM was keeping forward near jcc/jmp instances
that a self-shrink-aware iteration emits as short (a 4-byte shrink for
jcc32, 3-byte for jmp32).

Track per-jump previous-pass encoding in a static side table, swapped
at pass boundaries via _passn. In the borderline branch, when the
current-layout check rejects AND the previous pass was NEAR AND the
jump is forward, re-check against the post-shrink displacement. The
previous-pass guard prevents over-shrinking already-short jumps whose
displacement grew past 127; the forward guard skips backward jumps,
whose target doesn't move when the jump shrinks.

Add travis/test/jccshrink as a regression test: stock NASM emits 605 B
with a 5-byte near jmp at 0x1d9; patched emits 602 B with eb 7f
(rel8 = 127). make travis passes (301/301). Byte-identical on
optimization.asm, jmp64.asm, riprel.asm, a32offs.asm.

Signed-off-by: Bryce Boe <bbzbryce@gmail.com>
@bboe

bboe commented May 24, 2026

Copy link
Copy Markdown
Author

Bump.

@hpax

hpax commented Jun 8, 2026

Copy link
Copy Markdown
Member

Hi,

This is certainly a valid consideration, but I am a bit concerned that it is overly complex, certainly for 3.02 which is currently in late -rc.

For one thing, it causes a convergence failure if there is an "align 2" statement after "jmp B" in your test, which is rather obviously a far worse regression.

To some degree, this is a consequence of a general issue with relaxations in NASM: because of the extremely tight coupling between the preprocessor and the assembler, it is possible to create arbitrarily complex addressing relationships, and unfortunately people do depend on those in real life code.

Furthermore, at least ideally, this should apply to all relaxable statements, including immediates and displacements when applicable. This adds further to the complexity of the problem.

First of all, let me confirm that this is a real problem, and I appreciate the work at resolving it. Unfortunately this patch by itself is not adequate. I would be happy to have a further discussions about what kind of infrastructure changes would be needed in NASM to support more aggressive optimizations.

An "align" between a jump and its target breaks the self-shrink
re-check's rigid-span assumption, so the jump oscillates NEAR<->SHORT
and assembly stalls. Latch each jump to speculate at most once, then
fall back to the convergent current-layout test.

Add travis/test/jccshrinkalign.

Signed-off-by: Bryce Boe <bbzbryce@gmail.com>
@bboe

bboe commented Jun 8, 2026

Copy link
Copy Markdown
Author

Thanks for taking the time to review and the pointer about the alignment regression.

I made a follow up commit that should address that specific case, however, it certainly doesn't handle the other cases you've mentioned.

I'll take some time over the next day or so to dig up the real case I experienced and see if my assembler has the same alignment issue. I'll also try to come up with some test cases for the other cases as well. If you happen to know of some, that would save a bit of time. Thanks!

@bboe

bboe commented Jun 8, 2026

Copy link
Copy Markdown
Author

FYI, this gist shows the assembly that prompted this investigation and pull request. The assembly there was generated by my compiler.

@hpax

hpax commented Jun 11, 2026

Copy link
Copy Markdown
Member

I really wish I knew a way to create a general solution for this! It is pretty clear that it is not even particularly hard for the user to create situations where there simply isn't any way for NASM's "code-blind" algorithm to converge, but the preprocessor coupling -- a very early NASM design decision and pretty fundamental to its operation, not something that we can break -- means that it isn't even necessarily possible for NASM to know what the "same instruction" necessarily means.

One example that one could actually imagine in real life; note that in this case the only possible solution involves a 6-byte jump with an offset of 0x7e, which would have fit in 2 bytes, but when encoded in 2 bytes triggers the inlining and pushes the "altcode" target out of range.

       bits 64
        default rel

        section .text

%macro mycode 0
        mov [pizza],eax
        mov [pasta],edx
%endmacro

do_mycode:
        test eax,eax
        je altcode
%if ($-do_mycode) < 6
        ;; Space enough to inline mycode
        mycode
%else
        ;; mycode is too large, needs to be out of line
        section .text.mycode
__mycode:
        mycode
        ret
        section .text
        call __mycode
%endif

        times 120 nop           ; Some other code
        ret

altcode:
        mov dword [pizza],0xfeedface
        mov dword [pasta],0xdeadbeef
        ret

%ifdef need_mycode
__mycode:
        mycode
        ret
%endif

        section .bss
pizza   dd ?
pasta   dd ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants