ENH: Implement the exact p-value of the Cramér-von Mises two-sample test by fbourgey · Pull Request #108 · scipy/xsf

fbourgey · 2026-03-13T22:13:15Z

Reference issue

Toward #98

What does this implement/fix?

Implement the exact p-value of the Cramer-von Mises two-sample test for a given value of the test statistic from _pval_cvm_2samp_exact

https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion

mdhaber · 2026-03-14T16:02:51Z

+    for (int64_t u = 0; u < n + 1; ++u) {
+        std::vector<FreqTable> next_gs;
+        FreqTable tmp;
+        for (int64_t v = 0; v < m + 1; ++v) {
+            // Calculate g recursively with eq. 11 in [1]. Even though it
+            // doesn't look like it, this also does 12/13 (all of Algorithm 1).
+            const FreqTable &g = gs[v];
+            // Merge tmp and g: for common keys sum frequencies,
+            // keep unique keys from both sides.
+            // (equivalent to np.intersect1d + concatenate logic)
+            FreqTable merged;
+            for (const auto &[key, freq] : tmp) {
+                merged[key] += freq;
+            }
+            for (const auto &[key, freq] : g) {
+                merged[key] += freq;
+            }
+            int64_t diff = a * v - b * u;
+            int64_t res = diff * diff;
+            // tmp[0] += res  (shift all keys by res)
+            tmp.clear();
+            for (const auto &[key, freq] : merged) {
+                tmp[key + res] += freq;
+            }
+            next_gs.push_back(tmp);
+        }
+        gs = std::move(next_gs);
+    }


Can you provide additional information that helps the reader compare this to either the original code, which is pretty hard to follow, or the algorithm, which looks simple but has terms like $I[a^2u(u+1)(2u + 1)/6]$ and $I[b^2u(u+1)(2u + 1)/6]$ that don't obviously appear in either version? If it would be cleaner, feel free to express this in terms of Algorithm 1 from the paper rather than the Python code. There are a lot of aspects of the Python that don't look like a very efficient expression of the algorithm (lots of placeholding size-zero array, use of arrays instead of dicts, use of delete, etc.), so if it can't be translated directly, it is probably not worth trying to maintain a resemblance.

I agree; I've found the Python code quite difficult to follow. I’ll provide more detail and include screenshots of the relevant sections of the paper in the coming days.

steppi · 2026-03-14T20:16:21Z

+    using FreqTable = std::map<int64_t, int64_t>;
+    // the frequency table of g_{u, v}^+ defined in [1, p. 6]
+    // gs[0] = {0: 1}, gs[1..m] = empty
+    std::vector<FreqTable> gs(m + 1);


@mdhaber, you want this to be able to work in CuPy right? std::map and std::vector aren't available in CUDA.

Also, rather than putting this whole thing into a scalar kernel, and making a ufunc out it, it seems like it may be better to take the idea of the original function and decompose it into the ufuncs that are needed to make it work. I think the key component here would be a gufunc to generate the frequency table. lcm and the binomial coefficients could be handled through delegation, and the rest looks like it could be made array-agnostic.

you want this to be able to work in CuPy right?

Yes. OK, good to know.

fbourgey · 2026-03-31T19:37:35Z

+ */
+inline double pval_cvm_2samp_exact(double s, int64_t m, int64_t n) {
+    // [1, p. 3]
+    int64_t lcm = std::lcm(m, n);


Page 3 in [1]: Definition of the variable lcm. This is $L$ in [1].

fbourgey · 2026-03-31T19:38:34Z

+    int64_t a = lcm / m;
+    int64_t b = lcm / n;


Page 4 in [1]: definitions of $a$ and $b$

fbourgey · 2026-03-31T19:48:02Z

+    // Combine Eq. 9 in [2] with Eq. 2 in [1] and solve for $\zeta$
+    // Hint: `s` is $U$ in [2], and $T_2$ in [1] is $T$ in [2]
+    int64_t mn = m * n;
+    // Uses double floor division since s is double
+    int64_t zeta = std::floor((lcm * lcm * (m + n) * (6.0 * s - mn * (4.0 * mn - 1))) / (6.0 * mn * mn));


Using consistent notations and following the Python comment:
$s$ is $U$ in [2] and $T_2$ in [1] same as $T$ in [2]

From [1, page 3]

Rewriting in terms of $\zeta$, we have

$$ \zeta = \frac{(n+n)^2 L^2}{mn} T_2 $$

and from [2, eq. 9] (and using the same notations as in [1])

$$ T_2 = \frac{s}{nm(n+m)} - \frac{4mn-1}{6(m+n)} $$

Simple algebra gives

$$ \zeta = \frac{(m+n)L^2}{6 m^2 n^2} (6 s - (4mn - 1) mn) $$

We then floor as $\zeta$ is an integer-valued statistic.

fbourgey · 2026-03-31T20:06:20Z

+    std::vector<FreqTable> gs(m + 1);
+    gs[0][0] = 1;


Initialize the frequency tables of $\zeta$ for paths reaching $(u,v)$.

Each table is a $2 \times k$ array where $k =$ number of distinct values of $\zeta$ at $(u,v)$ ie

g = [[zeta_1, zeta_2, ..., zeta_k], [count_1, count_2, ..., count_k]]

row 0 = distinct values of $\zeta$ and row 1 = how many paths give each value

Initially: $g_{0,0}$ = {(0,1)}

fbourgey · 2026-03-31T20:13:35Z

+    // (equivalent to return np.float64(np.sum(freq[value >= zeta]) / combinations))
+    const FreqTable &final_table = gs[m];
+    int64_t sum_freq = 0;
+    for (const auto &[value, freq] : final_table) {
+        if (value >= zeta) {
+            sum_freq += freq;
+        }
+    }
+    return static_cast<double>(sum_freq) / static_cast<double>(combinations);
+}


Compute p-value

$$ \mathbb{P}(\zeta \geq \zeta_{obs}) = \frac{\text{number of paths with } \zeta \geq \zeta_{obs}}{\binom{m+n}{m}} $$

fbourgey · 2026-03-31T20:27:03Z

+            int64_t diff = a * v - b * u;
+            int64_t res = diff * diff;
+            // tmp[0] += res  (shift all keys by res)
+            tmp.clear();
+            for (const auto &[key, freq] : merged) {
+                tmp[key + res] += freq;
+            }


Apply shift $S_{d^2}$. From Eq. 11 of [1]

fbourgey · 2026-03-31T20:39:21Z

+            const FreqTable &g = gs[v];
+            // Merge tmp and g: for common keys sum frequencies,
+            // keep unique keys from both sides.
+            // (equivalent to np.intersect1d + concatenate logic)
+            FreqTable merged;
+            for (const auto &[key, freq] : tmp) {
+                merged[key] += freq;
+            }
+            for (const auto &[key, freq] : g) {
+                merged[key] += freq;
+            }


Translation of the original Python code.

vi, i0, i1 = np.intersect1d(tmp[0], g[0], return_indices=True) tmp = np.concatenate([ np.stack([vi, tmp[1, i0] + g[1, i1]]), np.delete(tmp, i0, 1), np.delete(g, i1, 1) ], 1)

First line: find common $\zeta$ values between $g_{u,v-1}$ and $g_{u-1,v}$

Second line: this implements $g_{u,v-1} + g_{u-1,v}$

fbourgey · 2026-03-31T21:02:44Z

+        FreqTable tmp;
+        for (int64_t v = 0; v < m + 1; ++v) {
+            // Calculate g recursively with eq. 11 in [1]. Even though it
+            // doesn't look like it, this also does 12/13 (all of Algorithm 1).


It appears that (12) is not needed, as this is produced automatically by how gs and tmp are initialized.

For $v=0$, we have $d^2 = (av-bu)^2 = b^2 u^2$. Hence,

$$ g_{u,0} = S_{b^2 u^2} (g_{u-1,0}). $$

Starting from $g_{0,0} = I[0]$, at step $u=1$, $g_{1,0} = I[b^2]$, ... and for a general,

$$ g_{u,0} = I\left[\sum_{j=1}^{u} (bj)^2\right] = b^2 \frac{u(u+1)(2u+1)}{6} $$

Here, we use that $S_d(I[c]) = I[c+d])$

fbourgey added 2 commits March 13, 2026 18:08

ENH: Add exact p-value computation for Cramer-von Mises two-sample test

fc55b22

TST: Add tests

507577a

fbourgey changed the title ~~ENH: Implement~~ ENH: Implement the exact p-value of the Cramer-von Mises two-sample test Mar 13, 2026

fbourgey mentioned this pull request Mar 13, 2026

ENH: Port null-distribution special functions from scipy.stats to xsf #98

Open

18 tasks

mdhaber reviewed Mar 14, 2026

View reviewed changes

Comment thread include/xsf/stats.h Outdated

mdhaber reviewed Mar 14, 2026

View reviewed changes

MAINT: use xsf/binom

95b0163

steppi reviewed Mar 14, 2026

View reviewed changes

fbourgey commented Mar 31, 2026

View reviewed changes

fbourgey marked this pull request as draft March 31, 2026 20:07

fbourgey commented Mar 31, 2026

View reviewed changes

fbourgey changed the title ~~ENH: Implement the exact p-value of the Cramer-von Mises two-sample test~~ ENH: Implement the exact p-value of the Cramér-von Mises two-sample test Apr 2, 2026

fbourgey mentioned this pull request Apr 3, 2026

ENH: Add exact Cramér-von Mises two-sample p-value gufunc #118

Open

fbourgey added the enhancement New feature or request label Apr 15, 2026

Uh oh!

Conversation

fbourgey commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference issue

What does this implement/fix?

Uh oh!

Uh oh!

mdhaber Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

steppi Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mdhaber Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fbourgey Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fbourgey commented Mar 13, 2026 •

edited

Loading

steppi Mar 14, 2026 •

edited

Loading

fbourgey Mar 31, 2026 •

edited

Loading

fbourgey Mar 31, 2026 •

edited

Loading

fbourgey Mar 31, 2026 •

edited

Loading

fbourgey Mar 31, 2026 •

edited

Loading