SSH OpenCL format: synchronize with CPU format #5747
ghost wants to merge 1 commit into bleeding-jumbo from unknown repository
Is this issue specific to this format at all? Maybe a change is needed in the shared OpenCL host code?
I prefer to avoid invasive changes. On the other hand, why would anyone on Earth need to set LWS=1024 for a CPU?
The other way to go is:
From e46341d54a42e325f6a16b810403a8c23826c7b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <dev@claudioandre.slmail.me>
Date: Mon, 31 Mar 2025 09:01:47 -0300
Subject: [PATCH] OpenCL autotune: limit LWS up to 256 on CPU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
I've seen many segmentation faults in the SSH OpenCL format when it
reaches 1024.
Signed-off-by: Claudio André <dev@claudioandre.slmail.me>
---
src/opencl_autotune.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/opencl_autotune.c b/src/opencl_autotune.c
index 47f1b421e..5553e36ba 100644
--- a/src/opencl_autotune.c
+++ b/src/opencl_autotune.c
@@ -59,6 +59,9 @@ size_t autotune_get_task_max_work_group_size(int use_local_memory,
 	else
 		max_available = get_device_max_lws(gpu_id);
+	if (cpu(device_info[gpu_id]) && (max_available > 256))
+		max_available = 256;
+
 	if (max_available > get_kernel_max_lws(gpu_id, crypt_kernel))
 		return get_kernel_max_lws(gpu_id, crypt_kernel);
--
2.43.0
I don't see any reason why, for example, one should use LWS > 128 on a CPU. But let's listen to magnum's wise words.
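The clamp proposed in the patch above can be sketched in isolation. This is only an illustration of the idea, not the actual John the Ripper code: `clamp_max_lws`, `CPU_LWS_CAP` and the three parameters are hypothetical names standing in for `cpu(device_info[gpu_id])`, `get_device_max_lws()` and `get_kernel_max_lws()`.

```c
#include <stddef.h>

/* Hypothetical cap; 256 is the value proposed in the patch. */
#define CPU_LWS_CAP 256

/*
 * Illustrative stand-in for the clamping logic (not the real helper).
 *   is_cpu     - nonzero when the OpenCL device is a CPU
 *   device_max - largest work-group size the device reports
 *   kernel_max - largest work-group size the kernel supports
 */
size_t clamp_max_lws(int is_cpu, size_t device_max, size_t kernel_max)
{
	size_t max_available = device_max;

	/* On CPU devices, never consider an LWS above the cap. */
	if (is_cpu && max_available > CPU_LWS_CAP)
		max_available = CPU_LWS_CAP;

	/* The kernel's own limit wins whenever it is smaller. */
	if (max_available > kernel_max)
		return kernel_max;

	return max_available;
}
```

With these assumptions, a CPU device reporting 1024 would be limited to 256, while a non-CPU device reporting 1024 would be left untouched unless the kernel limit is lower.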
This is really weird placement of braces. I doubt this does what you intended.
Oh, the parentheses are indeed wrong. The idea is represented.
I believe it varies a lot by implementation: some CPU runtimes (perhaps only macOS) are even stupidly pegged to LWS=1 unless, only maybe unless, a kernel really requires higher. Hopefully they will cope then, or at least pretend to. But all Apple runtimes are lemon runtimes.
I'm not sure how LWS would/could correlate to CPU threads or cores, but they should in some way, right? Intuitively (and I could be completely wrong) I would guess something like LWS == number of cores/threads should be reasonable. I'm trying to visualise some relation to CPU formats'
Edit: I just recalled (iirc) that the first Intel CPU runtime I used came with a recommendation to use LWS=8, regardless of job, hardware and so on. I have absolutely no idea why.
Edit2: BTW, CUDA's notion of "blocks" (which is just GWS/LWS) sounds pretty much like our
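The GWS/LWS relation mentioned in the comment above can be written out as plain arithmetic. This is a minimal sketch with hypothetical helper names (`work_group_count`, `round_up_gws`); it assumes `lws` is nonzero and is not taken from the John the Ripper sources.

```c
#include <stddef.h>

/*
 * Number of work-groups ("blocks" in CUDA terms) launched for a given
 * global work size (gws) and local work size (lws).
 * Assumes lws > 0 and that gws is a multiple of lws.
 */
size_t work_group_count(size_t gws, size_t lws)
{
	return gws / lws;
}

/*
 * Round gws up to the next multiple of lws, so that the global size
 * divides evenly into work-groups. Assumes lws > 0.
 */
size_t round_up_gws(size_t gws, size_t lws)
{
	return ((gws + lws - 1) / lws) * lws;
}
```

For example, GWS=4096 with LWS=64 gives 64 work-groups, and a GWS of 1000 would be rounded up to 1024 for LWS=64.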
- relax ASN.1 checks;
- simplify support for EC keys.
See #5745.
Signed-off-by: Claudio André <dev@claudioandre.slmail.me>
It's WIP because I need #5745 merged and we need to test using a GPU. I haven't found any problems with my hardware.
[EDITED]
Notes:
#ifdef CPU_FORMAT and self test still passes (5 new vectors have been added);
The difference between the formats is 5 vectors (2 x type 2 and 2 x type 6 + 1 DES) (none implemented for OpenCL):
Only types 2 and 6 are excluded
----
!self_test_running probably can't handle self-testing properly. So, I'm not sure if we should port them to OpenCL;