Enable AVX512VL + AVX512DQ #5694

Merged
solardiz merged 1 commit into bleeding-jumbo from unknown repository
Mar 12, 2025

ghost commented Mar 10, 2025

Let's hear from the bots.

I'll remove runstatedir after testing.


checking for AVX2... yes
checking for AVX512BW + AVX512VL + AVX512DQ... yes
checking if gcc supports -maes -mpclmul... yes

OR:

checking for AVX2... yes
checking for AVX512BW + AVX512VL + AVX512DQ... no
checking for AVX512F... no
checking if gcc supports -maes -mpclmul... yes
Target CPU ......................................... x86_64 AVX512BW, 64-bit LE
Target OS .......................................... linux-gnu
Version: 1.9.0-jumbo-1+bleeding-60f3614a06 2025-03-10 07:27:48 -0300
Build: linux-gnu 64-bit x86_64 AVX512(BW+VL+DQ) AC OMP
SIMD: AVX512BW, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
AES hardware acceleration: AES-NI
CPU tests: AVX512(BW+VL+DQ)
$JOHN is ../run/

Sorry, AVX512(BW+VL+DQ) is required for this build


John the Ripper 1.9.0-jumbo-1+bleeding-7146e4c827 2025-03-12 05:14:17 +0100 OMP [linux-gnu 64-bit x86_64 AVX2 AC]
Copyright (c) 1996-2025 by Solar Designer and others
Homepage: https://www.openwall.com/john/
Usage: john-avx2-omp [OPTIONS] [PASSWORD-FILES]
Use --help to list all available options.

ghost commented Mar 10, 2025

CI is happy. Nothing bad so far.

ghost commented Mar 10, 2025

It seems it tests CPU support twice (as seen below):

configure: Trying to force avx512bw using default method (--enable-simd=avx512bw).
checking if gcc supports -mavx512bw -mavx512vl -mavx512dq w/ linking... yes
checking for extra ASFLAGS... None needed
checking for X32 ABI... no
checking special compiler flags... Intel x86
configure: Testing tool-chain's CPU support with given options
checking for MMX... yes
checking for SSE2... yes
checking for SSSE3... yes
checking for SSE4.1... yes
checking for SSE4.2... yes
checking for AVX... yes
checking for XOP... no
checking for AVX2... yes
checking for AVX512BW + AVX512VL + AVX512DQ... yes
checking if gcc supports -maes -mpclmul... yes

It doesn't hurt.

doc/NEWS Outdated
- Add Oubliette Password Manager support (two formats and oubliette2john.py).
[DavideDG; 2025]

- Turn AVX512 into AVX512BW + AVX512VL + AVX512DQ. [Claudio André; 2025]
Member

That's confusing. I suggest:

- Use AVX512VL XOP-like bit rotates for scrypt's Salsa20.  [Solar; 2025]

- When we use AVX512BW, also enable usage of AVX512VL and AVX512DQ.  [Claudio André; 2025]

src/configure.ac Outdated
done
else
CPU_BEST_FLAGS_MAIN=-DJOHN_$(echo ${SIMD_NAME} | tr .a-z _A-Z)
fi
Member

I doubt we need this complication. Can't we just continue with JOHN_AVX512BW alone, but understand that it implies VL and DQ? I also don't know whether the += syntax works with other shells.

Author

This is possible if everyone understands that BW implies the rest.
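On the portability question raised above: `VAR+=value` is a bash/ksh/zsh extension and is not part of POSIX sh, so a script that has to run under other shells usually appends by re-assigning the variable. A minimal sketch, assuming hypothetical names (`append_flag`, `simd_define`, `DEMO_FLAGS`) that are not taken from the PR:

```shell
#!/bin/sh
# Hypothetical helper: append one word to a space-separated flag list
# using only POSIX sh features (no bash-style VAR+=...).
append_flag() {
    # $1 = current list, $2 = flag to append
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s %s\n' "$1" "$2"
    fi
}

# Derive a -DJOHN_<NAME> define with the same tr mapping as the
# quoted snippet: dots become underscores, lower case becomes upper.
simd_define() {
    printf '%s%s\n' '-DJOHN_' "$(printf '%s' "$1" | tr '.a-z' '_A-Z')"
}

DEMO_FLAGS=""
DEMO_FLAGS=$(append_flag "$DEMO_FLAGS" "$(simd_define AVX512BW)")
DEMO_FLAGS=$(append_flag "$DEMO_FLAGS" "-mavx512bw")
echo "$DEMO_FLAGS"
```

With a single `JOHN_AVX512BW` define understood to imply VL and DQ, as suggested in this thread, no appending would be needed at all.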

src/configure.ac Outdated
CPU_NAME="$host_cpu AVX512BW"
else
CPU_NAME="$host_cpu $SIMD_NAME"
fi
Member

Looks like unneeded complication as well.

AS_IF([test "x$CPU_NOTFOUND" = x0],
[
-CFLAGS="$CFLAGS_BACKUP -mavx512f -P $EXTRA_AS_FLAGS $CPPFLAGS $CFLAGS_EXTRA $CPUID_ASM"
+CFLAGS="$CFLAGS_BACKUP -mavx512bw -mavx512vl -mavx512dq -P $EXTRA_AS_FLAGS $CPPFLAGS $CFLAGS_EXTRA $CPUID_ASM"
Member

If we're not implementing the full reverse order of checks + optimization, then maybe let's not reorder F vs. BW here? If we were checking F first, then continue to check it first. This PR's changes would be smaller then.

-[CPU_BEST_FLAGS="-mavx512f"]
-[SIMD_NAME="AVX512F"]
+[CPU_BEST_FLAGS="-mavx512bw -mavx512vl -mavx512dq"]
+[SIMD_NAME="AVX512(BW+VL+DQ)"]
Member

Maybe continue to say just AVX512BW here.

#include <stdio.h>
extern void exit(int);
-int main(){__m512i t, t1;*((long long*)&t)=1;t1=t;t=_mm512_mul_epi32(t1,t);if((*(long long*)&t)==88)printf(".");exit(0);}]]
+int main(){__m128i t, t1;*((long long*)&t)=1;t1=t;t=_mm_rol_epi32(t1,1);if((*(long long*)&t)==88)printf(".");exit(0);}]]
Member

I did suggest using the same intrinsic we actually use, but I didn't mean to test it instead of testing any 512-bit BW intrinsic. I think we should either revert this change entirely or test both _mm_rol_epi32 and _mm512_mul_epi32.

While there are no current or planned CPUs that have BW without VL, or vice versa, there may be future CPUs supporting AVX10/256 where the 128-bit VL intrinsic would compile and run, yet this wouldn't imply support for 512-bit BW. Such future CPUs wouldn't set the CPUID bit corresponding to VL, but here we're not checking CPUID at all.

Member

Oh, I didn't realize you previously got the _mm512_mul_epi32 from the section for F, not for BW. Then revert to what we were checking for BW, please.

Member

> there may be future CPUs supporting AVX10/256 where the 128-bit VL intrinsic would compile and run yet this wouldn't imply support for 512-bit BW. Such future CPUs wouldn't set the CPUID bit corresponding to VL

On second thought, maybe they actually would set that CPUID bit. It's no problem, and no reason to change anything in this PR; I am just correcting what I wrote for the record. We may want to add AVX10/256 support later, in a separate PR, and maybe once such CPUs actually appear and can be tested. As a guess, maybe we'll be checking for VL alone as a separate configure test from BW+VL+DQ, and would need to treat it differently in code (in many ways, including the CPUID check and non-usage of 512-bit vectors).
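The "test both" option from this thread can be sketched as a compile-only probe that puts a 512-bit BW intrinsic and the 128-bit VL rotate into one test program. This is a hedged illustration, not the check that was merged: `probe_avx512_bw_vl` is a hypothetical name, `_mm512_add_epi8` is merely one BW intrinsic picked for the example (the thread does not say which one the project used before), and nothing is executed, so the result describes the toolchain only.

```shell
#!/bin/sh
# Compile-only probe: does the toolchain accept a 512-bit BW intrinsic
# and a 128-bit VL intrinsic together? The object file is never run,
# so this says nothing about the CPU of the machine running the script.
probe_avx512_bw_vl() {
    cc_cmd=${CC:-cc}
    dir=$(mktemp -d) || return 1
    cat > "$dir/conftest.c" <<'EOF'
#include <immintrin.h>
int main(void)
{
    __m512i a = _mm512_set1_epi8(1);
    __m128i b = _mm_set1_epi32(1);
    a = _mm512_add_epi8(a, a);   /* AVX512BW, 512-bit vectors */
    b = _mm_rol_epi32(b, 1);     /* AVX512VL, 128-bit vectors */
    return _mm_cvtsi128_si32(b)
         + _mm_cvtsi128_si32(_mm512_castsi512_si128(a));
}
EOF
    if $cc_cmd -mavx512bw -mavx512vl -c "$dir/conftest.c" \
        -o "$dir/conftest.o" >/dev/null 2>&1; then
        result=yes
    else
        result=no
    fi
    rm -rf "$dir"
    echo "$result"
}

printf 'toolchain accepts AVX512BW + AVX512VL... %s\n' "$(probe_avx512_bw_vl)"
```

Because both intrinsics sit in the same translation unit, a toolchain that accepted only the 128-bit VL form would fail the probe, which is the case the review comment is concerned about.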

-#define CPU_NAME "AVX512BW"
+#define CPU_REQ_AVX512VL 1
+#define CPU_REQ_AVX512DQ 1
+#define CPU_NAME "AVX512(BW+VL+DQ)"
Member

We can keep all 3 mentioned in CPU_NAME, for reporting in the "Sorry" line. (No further change is needed here.)

Author

ghost commented Mar 11, 2025

The addition of CPU_REQ_AVX512VL (and CPU_REQ_AVX512DQ) is also required, or at least desirable.

Member

solardiz left a comment

This looks almost good enough to merge, with only trivial cleanups maybe left. Thank you, @claudioandre-br!

CFLAGS="$CFLAGS_BACKUP -mavx512bw -mavx512vl -mavx512dq -P $EXTRA_AS_FLAGS $CPPFLAGS $CFLAGS_EXTRA $CPUID_ASM"

-AC_MSG_CHECKING([for AVX512BW])
+AC_MSG_CHECKING([for AVX512BW + AVX512VL + AVX512DQ])
Member

Strictly speaking, the test program we run only checks BW and VL, and then we assume DQ is implied. So we might want to say just AVX512BW + AVX512VL here.

Author

Now it gets confusing.

The else part (runs when --enable-simd=avx512bw) has a test program that only tests AVX512BW + AVX512VL.

The if part (runs when --native-tests=true) does not use a test program. It uses CPU_detect().

  1. In any case, both use the -mavx512dq flag.
  2. CPU_detect without setting a value for CPU_REQ_* seems wrong to me. It should be like this:

I added a #define CPU_REQ_AVX512BW 1

#define CPU_REQ_AVX512BW		1
extern int CPU_detect(void); extern char CPU_req_name[];
      unsigned int nt_buffer8x[4], output8x[4];
      int main(int argc, char **argv) { return !CPU_detect(); }

Anyway, should I remove the DQ string? Is a new commit with a fix for CPU detection required?

Member

Oh. I think it's OK to leave this as you have it for this PR, no further change needed. Thank you!

AS_IF([test "x$CPU_NOTFOUND" = x0],
[
-AC_MSG_CHECKING([for AVX512BW])
+AC_MSG_CHECKING([for AVX512BW + AVX512VL + AVX512DQ])
Member

... and here.

(I don't get why we have this in two places.)

Member

magnumripper commented Mar 14, 2025

> I don't get why we have this in two places.

The first half of m4/jtr_x86_logic.m4 checks what the build host supports (using cpuid), unless cross compiling. The second half is used only when cross compiling (e.g. for fallback binaries), so it just checks what the toolchain can do.

#define C7_AVX512F $0x00010000
#define C7_AVX512BW $0x40010000 /* AVX512BW + AVX512F */
#define C7_AVX512VL $0xC0010000 /* AVX512BW + AVX512VL + AVX512F */
#define C7_AVX512DQ $0xC0030000 /* AVX512BW + AVX512DQ + AVX512VL + AVX512F */
Member

(I didn't review the specific bitmasks against the documentation. I just hope they're correct.)
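For anyone who wants to double-check the masks, they can be rebuilt from the individual feature bits. In CPUID leaf 7 (sub-leaf 0), register EBX, the Intel documentation places AVX512F at bit 16, AVX512DQ at bit 17, AVX512BW at bit 30 and AVX512VL at bit 31. The sketch below is a standalone cross-check with hypothetical names (`mask_of`, `BIT_*`), assuming a shell with at least 64-bit arithmetic; the leading `$` in the quoted defines is assembler immediate syntax and is not part of the value.

```shell
#!/bin/sh
# Bit positions in CPUID.(EAX=7,ECX=0):EBX.
BIT_AVX512F=16
BIT_AVX512DQ=17
BIT_AVX512BW=30
BIT_AVX512VL=31

# Build a mask from a list of bit positions and print it as 0xXXXXXXXX.
mask_of() {
    m=0
    for bit in "$@"; do
        m=$(( m | (1 << bit) ))
    done
    printf '0x%08X\n' "$m"
}

# Same cumulative combinations as the C7_* defines quoted above.
mask_of $BIT_AVX512F
mask_of $BIT_AVX512F $BIT_AVX512BW
mask_of $BIT_AVX512F $BIT_AVX512BW $BIT_AVX512VL
mask_of $BIT_AVX512F $BIT_AVX512BW $BIT_AVX512VL $BIT_AVX512DQ
```

The four printed values are 0x00010000, 0x40010000, 0xC0010000 and 0xC0030000, which match the four defines in the snippet under review.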

The john binary needs AVX512VL's XOP-like bit rotates for faster Salsa20
in yescrypt.

Without `VL` enabled, compilers don't emit those instructions at all.

As it stands now, the possible binaries are:
- AVX512BW + AVX512VL + AVX512DQ
- AVX512F
- AVX2
- And so on.

There is no AVX512BW-only binary.

See: #5691.

Signed-off-by: Claudio André <dev@claudioandre.slmail.me>

solardiz merged commit 7146e4c into openwall:bleeding-jumbo Mar 12, 2025
35 checks passed
ghost deleted the AVX512 branch March 12, 2025 11:06

ghost commented Mar 12, 2025

So far, everything seems to be fine.

Version: 1.9.0-jumbo-1+bleeding-7146e4c827 2025-03-12 05:14:17 +0100
Build: cygwin 64-bit x86_64 AVX512BW AC OMP OPENCL
SIMD: AVX512BW, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
AES hardware acceleration: AES-NI
CPU tests: AVX512(BW+VL+DQ)
CPU fallback binary: john-avx2-omp
OMP fallback binary: john-avx512bw
[...]
Cygwin version: 3.5.7-1.x86_64, 2025-01-29 19:46 UTC
Will run 2 OpenMP threads
Testing: descrypt, traditional crypt(3) [DES 512/512 AVX512F]... (2xOMP) PASS
Testing: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 512/512 AVX512F]... (2xOMP) PASS
Testing: md5crypt, crypt(3) $1$ (and variants) [MD5 512/512 AVX512BW 16x3]... (2xOMP) PASS
Testing: md5crypt-long, crypt(3) $1$ (and variants) [MD5 32/64]... (2xOMP) PASS
Testing: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]... (2xOMP) PASS
Testing: scrypt (16384, 8, 1) [Salsa20/8 128/128 AVX512VL]... (2xOMP) PASS
[...]

magnumripper commented Mar 14, 2025

> It seems it tests CPU support twice (as seen below):
>
> configure: Trying to force avx512bw using default method (--enable-simd=avx512bw).
> checking if gcc supports -mavx512bw -mavx512vl -mavx512dq w/ linking... yes
> (...)
> configure: Testing tool-chain's CPU support with given options
> (...)
> checking for AVX512BW + AVX512VL + AVX512DQ... yes

That's because you forced AVX512BW on the command line... I guess the second test is redundant then. That forcing stuff was added later.

magnumripper commented

So I'm now seeing this:

$ ./john -list=build-info
Version: 1.9.0-jumbo-1-internal+bleeding-ff4c3b5cc3 2025-03-24 16:50:20 +0100
Build: linux-gnu 64-bit x86_64 AVX512BW AC OMP OPENCL
SIMD: AVX512BW, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
AES hardware acceleration: AES-NI
CPU tests: AVX512(BW+VL+DQ)

It says "AVX512BW" twice (which is oddly specific, so I was led to believe it was literally just that), but then AVX512(BW+VL+DQ) in the "CPU tests" line. I was worried I had ended up with a Frankenstein build that didn't actually have VL or DQ instructions, but the scrypt format does say VL, so I guess all is fine. This output is confusing, but I'm not sure how to make it better.

Another problem is that I should apparently cross compile using --enable-simd=avx512bw (which an easter egg in ./configure translates to BW+VL+DQ, plus a gcc-implied AVX512F), but if a user tried --enable-simd=avx512vl she would end up with only that (plus the gcc-implied AVX512F) and no BW or DQ. The same goes for --enable-simd=avx512dq.

This is not a problem for me because I recalled seeing this PR, but how would a user or even a package maintainer know that the only correct way of writing it is --enable-simd=avx512bw?

I'm not sure I have any suggestion for this problem either, other than maybe we should parse --enable-simd=avx512bw without any magic (so would mean -mavx512bw) and instead trigger the easter egg with just --enable-simd=avx512 (which doesn't map directly to anything used by gcc - there's no -mavx512 option).

And if that last idea holds, maybe that leads to an answer for the first problem. It could say:

$ ./john -list=build-info
Version: 1.9.0-jumbo-1-internal+bleeding-ff4c3b5cc3 2025-03-24 16:50:20 +0100
Build: linux-gnu 64-bit x86_64 AVX512 AC OMP OPENCL
SIMD: AVX512, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
AES hardware acceleration: AES-NI
CPU tests: AVX512(BW+VL+DQ)

This output would be much less confusing.

ghost commented Mar 26, 2025

$ ./john -list=build-info
Version: 1.9.0-jumbo-1-internal+bleeding-ff4c3b5cc3 2025-03-24 16:50:20 +0100
Build: linux-gnu 64-bit x86_64 AVX512 AC OMP OPENCL

Printing only AVX512 seems like a good option to me.

The --enable-simd=avx512 should work now (I remember testing it).

Regarding other issues, the good thing is that people experimenting should know what they are doing or avoid doing it on production systems.

ghost commented Mar 26, 2025

On second thought, forcing --enable-simd=avx512dq to produce a binary (BW+VL+DQ) is also easy (just add new conditions to the case statement). The problem is: should we do this? Or should we allow people to experiment?

magnumripper commented

> The --enable-simd=avx512 should work now (I remember testing it).

Oh, indeed it does! I must have made a typo when I tried that.

> On second thought, forcing --enable-simd=avx512dq to produce a binary (BW+VL+DQ) is also easy (just add new conditions to the case statement). The problem is: should we do this?

We could, but I think we should instead stop --enable-simd=avx512bw from doing so: We should support --enable-simd=avx512 as a recipe for bw+vl+dq (and possibly more in the future) but stop calling it "default method", because such a recipe is not the default method! The default method is that --enable-simd=foo gets translated to -mfoo, period, end of story. The --enable-simd=avx512 option is a recipe just like e.g. --enable-simd=altivec, and neither is the default method.

I didn't test this yet, but I think something like this addresses what I mean:

diff --git a/src/configure.ac b/src/configure.ac
index ba480c409..b27e03d11 100644
--- a/src/configure.ac
+++ b/src/configure.ac
@@ -438,12 +438,11 @@ case "$simd" in
     JTR_FLAG_CHECK_LINK([-mpower8vector], 2)
     SIMD_NAME="Altivec2"
     ;;
-  dnl Handle known cases of --enable-simd=foo --> -mfoo
-  avx512|avx512bw)
-    SIMD_NAME="AVX512BW"
-    AC_MSG_NOTICE([Trying to force $SIMD_NAME using default method (--enable-simd=$simd).])
+  avx512)
     JTR_FLAG_CHECK_LINK([-mavx512bw -mavx512vl -mavx512dq], 2)
+    SIMD_NAME="AVX512"
     ;;
+  dnl Handle known cases of --enable-simd=foo --> -mfoo
   mmx|sse*|ssse3|avx*|xop*)
     SIMD_NAME=`echo $simd | tr a-z A-Z`
     AC_MSG_NOTICE([Trying to force $SIMD_NAME using default method (--enable-simd=$simd).])

> Or should we allow people to experiment?

I'm all for allowing people to experiment but we don't want them to struggle.
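Outside of autoconf, the mapping proposed in the untested diff above could be sketched as a plain shell function: `avx512` is a recipe that expands to three flags, and every other known name goes through the default `-mfoo` translation. `simd_to_cflags` is a hypothetical name, the real script goes through JTR_FLAG_CHECK_LINK rather than printing flags, and the handling of unknown names in the last branch is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical stand-in for the case statement in configure.ac:
# print the compiler flags that --enable-simd=VALUE would select.
simd_to_cflags() {
    case "$1" in
        avx512)
            # Recipe: gcc has no -mavx512, so expand to the full set.
            echo "-mavx512bw -mavx512vl -mavx512dq"
            ;;
        mmx|sse*|ssse3|avx*|xop*)
            # Default method: --enable-simd=foo becomes -mfoo.
            echo "-m$1"
            ;;
        *)
            # Assumption for this sketch: reject anything else.
            echo "unsupported SIMD name: $1" >&2
            return 1
            ;;
    esac
}

for s in avx512 avx512bw avx512vl avx2; do
    printf '%-9s -> %s\n' "$s" "$(simd_to_cflags "$s")"
done
```

The `avx512` branch has to come before the `avx*` pattern, otherwise the recipe name would be caught by the default method and turned into a nonexistent `-mavx512` option.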

ghost commented Mar 27, 2025

Target CPU ......................................... x86_64 AVX512, 64-bit LE
Target OS .......................................... linux-gnu
Version: 1.9.0-jumbo-1+bleeding-aa93adfa59 2025-03-27 09:33:47 -0300
Build: linux-gnu 64-bit x86_64 AVX512 AC OMP
[...]
AES hardware acceleration: AES-NI
CPU tests: AVX512(BW+VL+DQ)

See also openwall/john-packages#778.
