
+20-100% performance with special BigInt reduction#117

Open
georg95 wants to merge 7 commits into paulmillr:main from georg95:main

Conversation

@georg95 georg95 commented Feb 16, 2026

It turns out that a special reduction, even with BigInt, can be up to twice as fast as plain % P:

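const P_MASK = (1n << 255n) - 1n; // mask for the low 255 bits
const MAX_MFAST = P * P; // largest supported input
const M_fast = (num: bigint) => {
  if (num < 0n || num > MAX_MFAST) { err('don\'t use M_fast for numbers < 0 or > P * P') }
  // Fold the high limb into the low limb, using 2^255 % P = 19
  let r = (num >> 255n) * 19n + (num & P_MASK);
  // Fold once more; afterwards r < 2 * P
  r = (r >> 255n) * 19n + (r & P_MASK);
  // A single conditional subtraction completes the reduction
  return r >= P ? r - P : r;
};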

Benchmark difference, tested on Apple M4, node.js 24.11.1:

init 12ms ->
init 10ms

keygen x 10,673 ops/sec @ 93μs/op ->
keygen x 14,461 ops/sec @ 69μs/op

sign x 5,439 ops/sec @ 183μs/op ->
sign x 7,153 ops/sec @ 139μs/op

verify x 1,226 ops/sec @ 815μs/op ->
verify x 1,953 ops/sec @ 511μs/op

keygenAsync x 10,122 ops/sec @ 98μs/op ->
keygenAsync x 12,934 ops/sec @ 77μs/op

signAsync x 4,876 ops/sec @ 205μs/op ->
signAsync x 5,974 ops/sec @ 167μs/op

verifyAsync x 1,206 ops/sec @ 828μs/op ->
verifyAsync x 1,924 ops/sec @ 519μs/op

Point.fromBytes x 21,069 ops/sec @ 47μs/op ->
Point.fromBytes x 42,383 ops/sec @ 23μs/op

@georg95 georg95 changed the title +20-100% performance with special BigIng reduction +20-100% performance with special BigInt reduction Feb 16, 2026
@paulmillr paulmillr requested a review from Copilot February 16, 2026 16:59
Copy link

Copilot AI left a comment


Pull request overview

This PR introduces a significant performance optimization for the ed25519 implementation by adding a specialized modular reduction function M_fast that exploits the special form of the ed25519 prime P = 2^255 - 19. The optimization replaces generic modular reduction (M(a * b)) with a pseudo-Mersenne folding reduction specific to this prime, used when the input is known to be a product of two field elements.

Changes:

  • Added M_fast function implementing a specialized pseudo-Mersenne reduction for P = 2^255 - 19
  • Replaced M() with M_fast() for multiplication results in hot-path operations (point equality, doubling, addition, and square root computations)
  • Added intermediate M() wrapping of sums before squaring to ensure M_fast's input constraints are met
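
Why the sums need the extra M() can be illustrated with a small sketch. It assumes field elements are already reduced to [0, P) and uses a stand-in `M` defined here as a plain non-negative `% P`; the library's own `M` may differ in detail. The sum of two reduced elements can be almost 2P, so its square can approach 4 * P * P, which is above the P * P bound that M_fast asserts:

```typescript
// Sketch only: P is the ed25519 field prime, M is a stand-in for the generic reduction.
const P = (1n << 255n) - 19n;
const M = (x: bigint): bigint => {
  const r = x % P;
  return r >= 0n ? r : r + P;
};

const a = P - 1n; // largest reduced field element
const b = P - 1n;

const rawSum = a + b;                  // up to 2P - 2, not reduced
const unsafeSquare = rawSum * rawSum;  // up to about 4 * P * P
const safeSum = M(a + b);              // back in [0, P)
const safeSquare = safeSum * safeSum;  // at most (P - 1)^2

console.log(unsafeSquare > P * P); // true: outside the M_fast input range
console.log(safeSquare <= P * P);  // true: inside the M_fast input range
```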

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| index.ts | Introduces M_fast function and optimizes point arithmetic operations (equals, double, add), toAffine conversion, and modular exponentiation helpers (pow2, pow_2_252_3, uvRatio) |
| index.js | Mirrors the TypeScript changes with identical optimizations to maintain consistency between compiled and source versions |


paulmillr and others added 3 commits February 16, 2026 18:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paulmillr
Owner

@georg95 thanks for the contribution.

Could you detail the algorithm you've used? I tried pseudo-Mersenne reduction and Barrett reduction in #118, and the speed results don't seem to match.

@georg95
Author

georg95 commented Feb 17, 2026

> @georg95 thanks for the contribution.
>
> Could you detail the algorithm you've used? I tried pseudo-Mersenne reduction and Barrett reduction in #118, and the speed results don't seem to match.

It's a special case for reducing numbers of order P * P. A generalized solution is not fast because of JS overhead, and CPUs don't like branching and loops. I even thought of removing the assert in that function because it also eats into performance, especially if you call it for numbers of order P (the result of an addition). In that case % P is already fast enough; I think internally it does just a couple of comparisons and subtractions.

The algorithm can be derived as follows.
If num < P * P, it can be split into two 255-bit limbs a and b:
num = a * 2^255 + b

num % P = (a * 19 + b) % P, by the rules of modular arithmetic, because 2^255 % P = 19.
So
a = num >> 255
b = num & P_MASK
num % P = ((num >> 255) * 19 + (num & P_MASK)) % P
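
The congruence behind this step can be checked directly. A minimal sketch, assuming P = 2^255 - 19; the names `fold` and `sample` are only for illustration and do not come from the PR:

```typescript
const P = (1n << 255n) - 19n;
const P_MASK = (1n << 255n) - 1n;

// One folding step: the high limb's weight 2^255 is replaced by 19.
const fold = (num: bigint): bigint => (num >> 255n) * 19n + (num & P_MASK);

const sample = (P - 1n) * (P - 2n); // a product of two field elements

console.log((1n << 255n) % P === 19n);        // true
console.log(fold(sample) % P === sample % P); // true: folding preserves the residue
console.log(fold(sample) < sample);           // true: the folded value is much smaller
```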

In code:
let r = (num >> 255n) * 19n + (num & P_MASK);
After the first reduction, the magnitude drops from about P * P to about 20 * P.
r = (r >> 255n) * 19n + (r & P_MASK);
After the second reduction, r is guaranteed to be < 2P, so we check and subtract P if needed:
return r >= P ? r - P : r;
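
The bounds mentioned above can be written out. A sketch of the arithmetic, assuming 0 ≤ num ≤ P², with a and b the high and low 255-bit limbs of num, and writing r_1 and r_2 for the value of r after the first and second fold:

```latex
\begin{aligned}
\mathrm{num} \le P^2 < 2^{510}
  &\;\Rightarrow\; a = \lfloor \mathrm{num} / 2^{255} \rfloor < 2^{255}, \quad b < 2^{255} \\
r_1 = 19a + b &< 20 \cdot 2^{255} \\
\lfloor r_1 / 2^{255} \rfloor \le 19
  &\;\Rightarrow\; r_2 \le 19 \cdot 19 + (2^{255} - 1) = 2^{255} + 360 = P + 379 < 2P
\end{aligned}
```

Since r_2 < 2P, a single conditional subtraction of P is enough to land in [0, P).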

I just checked: replacing return r >= P ? r - P : r; with return r % P; gives the same speed, but the second reduction still improves performance, even though its input is only of order 20 * P.
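
One way to gain confidence in the two-fold version is to compare it with plain % P on edge cases and random inputs. A minimal self-contained sketch: `mFast` repeats the function from the PR description without the range assert, and `randomBelow` is a hypothetical helper built on Math.random (fine for a smoke test, not for cryptographic use):

```typescript
const P = (1n << 255n) - 19n;
const P_MASK = (1n << 255n) - 1n;

const mFast = (num: bigint): bigint => {
  let r = (num >> 255n) * 19n + (num & P_MASK);
  r = (r >> 255n) * 19n + (r & P_MASK);
  return r >= P ? r - P : r;
};

// Hypothetical helper: non-cryptographic random bigint in [0, limit).
const randomBelow = (limit: bigint): bigint => {
  let x = 0n;
  for (let i = 0; i < 10; i++) {
    x = (x << 53n) | BigInt(Math.floor(Math.random() * 2 ** 53));
  }
  return x % limit;
};

const edgeCases: bigint[] = [
  0n, 1n, P - 1n, P, P + 1n, P_MASK, 1n << 255n, (P - 1n) * (P - 1n), P * P,
];
const randomCases: bigint[] = Array.from({ length: 1000 }, () => randomBelow(P) * randomBelow(P));

// Collect every input on which the fold disagrees with the generic reduction.
const mismatches = edgeCases.concat(randomCases).filter((n) => mFast(n) !== n % P);
console.log(mismatches.length); // 0 when the fold agrees with % P on every input
```

A passing run is evidence rather than proof; the bound argument is what rules out edge cases inside [0, P * P].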

@paulmillr
Owner

Is the technique used elsewhere, or described in some kind of paper? Or did you come up with it on your own?

@georg95
Author

georg95 commented Feb 17, 2026

> Is the technique used elsewhere, or described in some kind of paper? Or did you come up with it on your own?

Gemini Pro suggested it to me when I was trying to speed up point decompression for the Monero blockchain.
I verified the reduction; it's just the Mersenne trick.

PS: the test was failing because the assertion failed for the M_fast(P*P) case. I replaced the final reduction with r % P and removed the upper-bound assertion, because with % P it works for any non-negative number, though it is fastest for inputs of order P*P.
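
The variant described in this postscript might look as follows. This is a sketch of the described change, not the committed code, and the name `mFastAny` is hypothetical. The two folds shrink the value and the final `% P` handles whatever remains, so inputs above P * P are still reduced correctly, only less cheaply:

```typescript
const P = (1n << 255n) - 19n;
const P_MASK = (1n << 255n) - 1n;

const mFastAny = (num: bigint): bigint => {
  // Assumption: the lower-bound check stays, only the upper bound is dropped.
  if (num < 0n) throw new Error('mFastAny expects a non-negative input');
  let r = (num >> 255n) * 19n + (num & P_MASK);
  r = (r >> 255n) * 19n + (r & P_MASK);
  return r % P; // generic final step instead of one conditional subtraction
};

const big = P * P * P + 12345n; // deliberately above P * P

console.log(mFastAny(P * P) === 0n);    // true
console.log(mFastAny(big) === big % P); // true
```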

@paulmillr
Owner

paulmillr commented Feb 18, 2026

AI is great, but we will have to investigate whether the algorithm is legit, or whether it has some weird edge cases that won't get caught by tests.

From my research, a similar algorithm is described in Handbook of Applied Cryptography, 14.47-14.50. Need to dive into that and think a bit.

@georg95
Author

georg95 commented Feb 18, 2026

Here are 2 statements about this reduction:

  1. limb decomposition
r === (r >> 255n) * (2n**255n) + (r & P_MASK)
  2. modular arithmetic reduction
((r >> 255n) * (2n**255n) + (r & P_MASK)) % P === ((r >> 255n) * 19n + (r & P_MASK)) % P

Both must be true for the reduction to produce valid results.

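For negative numbers, for example, the second statement can be false as written. The limb decomposition still holds, because BigInt >> rounds toward negative infinity and & treats negatives as two's complement, but % keeps the sign of its left operand, so the two sides can differ by P (for r = -1n the left side is -1n and the right side is P - 1n).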
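
Both statements can be checked mechanically. A small sketch, with `decomposes` and `foldsEqual` as hypothetical names for statements 1 and 2; the second is evaluated exactly as written, using the JS `%` operator:

```typescript
const P = (1n << 255n) - 19n;
const P_MASK = (1n << 255n) - 1n;
const TWO_255 = 2n ** 255n;

// Statement 1: limb decomposition.
const decomposes = (r: bigint): boolean =>
  r === (r >> 255n) * TWO_255 + (r & P_MASK);

// Statement 2: replacing 2^255 by 19 leaves the result of % P unchanged.
const foldsEqual = (r: bigint): boolean =>
  ((r >> 255n) * TWO_255 + (r & P_MASK)) % P ===
  ((r >> 255n) * 19n + (r & P_MASK)) % P;

const nonNegative: bigint[] = [0n, 19n, P, P * P];

console.log(nonNegative.every((r) => decomposes(r))); // true
console.log(nonNegative.every((r) => foldsEqual(r))); // true
console.log(foldsEqual(-1n)); // false: the left side is -1n, the right side is P - 1n
```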
