AArch64: add acq/rel load ordering to CAS instructions #9076

Open
liamwhite wants to merge 1 commit into NationalSecurityAgency:master from liamwhite:sleigh-aarch64-cas-acqrel

@liamwhite

For this function:

ordered_cas_test:
	casal	x1, x2, [x0]
	ret

The load-ordering p-code ops (LOAcquire, LORelease) that other LSE instructions emit are missing from the decompiler output:

void ordered_cas_test(long *param_1,long param_2,long param_3)
{
  if (*param_1 == param_2) {
    *param_1 = param_3;
  }
  return;
}

For CAS, the acquire op always occurs if specified, while a specified release op occurs only on a successful store. The output after applying this PR looks like this:

void ordered_cas_test(long *param_1,long param_2,long param_3)
{
  LOAcquire();
  if (*param_1 == param_2) {
    *param_1 = param_3;
    LORelease();
  }
  return;
}
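
The ordering rule described above can be sketched as a small C model. This is only an illustration, not the SLEIGH in this PR: `lo_acquire`, `lo_release`, the two counters, and `cas_model` are hypothetical names that stand in for the LOAcquire/LORelease p-code ops so that their placement can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters standing in for the LOAcquire/LORelease p-code ops. */
static int acquire_count;
static int release_count;

static void lo_acquire(void) { acquire_count++; }
static void lo_release(void) { release_count++; }

/*
 * Models a 64-bit CAS{A}{L}: compares *mem with `compare`, stores `newval`
 * on a match, and returns the value that was loaded (the value the
 * instruction writes back to the compare register).
 */
static uint64_t cas_model(uint64_t *mem, uint64_t compare, uint64_t newval,
                          bool acquire, bool release)
{
    assert(mem != NULL);

    if (acquire) {
        lo_acquire();           /* acquire applies whether or not the store happens */
    }
    uint64_t data = *mem;
    if (data == compare) {
        *mem = newval;
        if (release) {
            lo_release();       /* release applies only on a successful store */
        }
    }
    return data;
}
```

Calling `cas_model` with both flags set on a matching value bumps both counters, while a failed compare bumps only the acquire counter, which mirrors the decompiler output shown above.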

@GhidorahRex
Collaborator

I'm not very familiar with acquire/release semantics, so I'm still getting up to speed here. But from the manual (https://developer.arm.com/documentation/ddi0602/2025-12/Base-Instructions/CAS--CASA--CASAL--CASL--Compare-and-swap-word-or-doubleword-in-memory-), it looks like the acquire should only be generated if aa_Wt is not 31?

if !IsFeatureImplemented(FEAT_LSE) then EndOfDecode(Decode_UNDEF); end;
let s : integer{} = UInt(Rs);
let t : integer{} = UInt(Rt);
let n : integer{} = UInt(Rn);
let datasize : integer{} = 8 << UInt(size);
let regsize : integer{} = if datasize == 64 then 64 else 32;
let acquire : boolean = L == '1' && t != 31;  <--- here
let release : boolean = o0 == '1';
let tagchecked : boolean = n != 31;

Rather than having cas_loa, cas_lor, and cas_var, we can consolidate them and remove cas_var:

cas_loa: "a" is b_22=1 & aa_Wt { LOAcquire(); }
cas_loa: "a" is b_22=1 & aa_Wt=31 { } # Assuming my note above with Wt != 31 is correct
cas_loa: "" is b_22=0 { } # bit 22 clear: no acquire, regardless of b_15

cas_lor: "" is b_15=0 { }
cas_lor: "l" is b_15=1 { LORelease(); }

...

:cas^cas_loa^cas_lor^"b" aa_Ws, aa_Wt, [Rn_GPR64xsp]
is b_3031=0b00 & b_2329=0b0010001 & b_21=1 & b_1014=0b11111 & aa_Wt & Rn_GPR64xsp & aa_Ws & cas_loa & cas_lor
{
	comparevalue:1 = aa_Ws:1;
	newvalue:1 = aa_Wt:1;
	build cas_loa;
	data:1 = *:1 Rn_GPR64xsp;
	if (data != comparevalue) goto <skip>;
	*:1 Rn_GPR64xsp = newvalue;
	build cas_lor;
<skip>
	aa_Ws = zext(data);
}

etc.

@liamwhite
Author

Good call on the zr case. Yes, if the result of the load is discarded (placed into zr), then the acquire op does not occur architecturally.

@GhidorahRex
Collaborator

Nothing is ever written to Wt here, so the issue isn't the value being discarded. Wt represents the new value to be stored. No acquire is generated if the zero register is the value potentially stored - because the store never depends on the value of previous memory accesses since the value is always 0? Or is it a hint to the processor that the value in memory is essentially being discarded?

@liamwhite
Author

Okay, I misread, and you are correct. That does seem a bit odd, and it doesn't really make sense to me that they would define it this way. The useful property is that if the load result can't be used, there are no observable side effects when the acquire is not performed. But the load result is Rs, and the acquire is conditioned on Rt != 31, so...

@liamwhite
Author

I looked into it and it's actually just common decode for all LSE instructions. The acquire is architecturally relaxed when Rt=31 (and for the other LSE instructions, this generally makes sense because Rt is the load result).
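
The decode rule this thread settles on can be sketched in C as follows. This is a minimal illustration, not Ghidra code: `struct cas_ordering` and `cas_decode_ordering` are hypothetical names, bit 22 (L) and bit 15 (o0) follow the b_22/b_15 patterns quoted above, and the sketch assumes Rt occupies bits 4:0 of the instruction word, as in the usual A64 load/store layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ordering flags for one CAS encoding (hypothetical type, for illustration). */
struct cas_ordering {
    bool acquire;
    bool release;
};

/*
 * Mirrors the manual's decode quoted earlier in the thread:
 *   acquire = (L == '1' && t != 31)
 *   release = (o0 == '1')
 * Assumed field positions: L is bit 22, o0 is bit 15, Rt is bits 4:0.
 */
static struct cas_ordering cas_decode_ordering(uint32_t insn)
{
    unsigned l  = (insn >> 22) & 1u;
    unsigned o0 = (insn >> 15) & 1u;
    unsigned rt = insn & 0x1fu;

    struct cas_ordering ord;
    ord.acquire = (l == 1u) && (rt != 31u);  /* relaxed when Rt is the zero register */
    ord.release = (o0 == 1u);
    return ord;
}
```

Because only the acquire depends on Rt, an encoding with L and o0 both set and Rt=31 still reports a release, which is the same asymmetry the proposed cas_loa and cas_lor subtables encode.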
