Skip to content

[compiler] migrate from testng to munit#15356

Draft
patrick-schultz wants to merge 1 commit intohail-is:mainfrom
patrick-schultz:ps/push-yumltwmsqplq
Draft

[compiler] migrate from testng to munit#15356
patrick-schultz wants to merge 1 commit intohail-is:mainfrom
patrick-schultz:ps/push-yumltwmsqplq

Conversation

@patrick-schultz
Copy link
Member

@patrick-schultz patrick-schultz commented Mar 24, 2026

Change Description

Migrate all of our scala based tests from the TestNG test library and runner, to MUnit.

This was a big change, and was primarily carried out by Claude Code. I iterated on one file at a time at first, building up a document containing all details on how to migrate a test file. When I was convinced there weren't many surprising situations remaining to be discovered, I ran a bash script looping over all remaining test files, running a fresh claude code instance with the migration plan doc on each. Then I scanned over all changes, and made a few tweaks to the migration doc and had claude help me apply the new rules consistently.

I've spent enough time looking over the migrated tests to have high confidence that they haven't functionally changed. To review these changes, I recommend reading the final migration plan, copied below, and then spot checking the changes to see how the plan was applied in practice.

Benefits

MUnit is now the test framework recommended by the official Scala docs. It is much simpler than ScalaTest, and much more idiomatic than TestNG. I've found it easy to use, and to hack on when needed. For instance, there isn't any built in support for "test matrices" - multiple test bodies, each of which are run against a list of multiple test inputs. But I added a simple way to encode these in TestCaseSupport. It's a welcome relief after TestNG's Array[Array[Any]] data providers.

The support in Mill also seems to be better. And the test output is far more readable.

Unrelated bugs caught

During one of the migrations, claude also caught a pre-existing bug:

  • SimplifySuite.testBinaryFloatingSimplification used wrong DataProvider: The @Test annotation referenced dataProvider = "binaryIntegralArithmetic" instead of "floatingIntegralArithmetic". This caused the floating-point simplification test to run with the integral arithmetic test data, while the actual floating-point data (defined in binaryFloatingArithmetic with @DataProvider(name = "floatingIntegralArithmetic")) was never exercised. Fixed by wiring the floating test cases to their own TestCases object.
  • ASM4SSuite.{nanDoubleAlwaysComparesFalse, nanFloatAlwaysComparesFalse}didn't test anything, and what they tried to assert was wrong. NaN != x is always true. The idiomatic syntax for scalacheck tests in MUnit makes it much harder to accidentally not test anything like this.
  • ReferenceGenomeSuite.Fasta: was calling Prop.check(), whose doc string says "Convenience method that checks this property and reports the result on the console. Should only be used when running tests interactively within the Scala REPL". :face_palm:. One of the generators was making an interval with a null endpoint, which crashed the test. Fixed by using a non-missing generator.
  • LocalLDPruneSuite.testRandom: was calling Properties.check(), which like the previous one doesn't actually assert anything. Contained a bug where we were comparing two arrays using ==. Fixed.

Running tests

Running tests looks much as it did before, but running individual test cases from the command line is much simpler now:

  • ./mill hail[].test - run all tests
  • ./mill hail[].test.testOnly '*.FooSuite' - only tests in FooSuite
  • ./mill hail[].test.testOnly '*.FooSuite' -- '*.test foo' - only "test foo"
    • note that the second glob matches the entire path, hence the initial *
  • ./mill hail[].test '*.FooSuite.test foo' - equivalent to above, however mill will run all test classes before discovering that only one contains a matching test, so the above is a bit faster

Another option to run a single test is to annotate the test in the source, from test("test foo") {...} to test("test foo".only) {...}, then run that test suite using ./mill hail[].test.testOnly '*.FooSuite'.

IntelliJ integration is also working flawlessly for me.

Security Assessment

  • This change cannot impact the Hail Batch instance as deployed by Broad Institute in GCP

The rest of this description is the migration plan doc I provided to claude for each file.

Migrating Hail Test Files from TestNG to MUnit

Overview

There is a single test module (object test in hail/package.mill, uses mill.scalalib.TestModule.Munit). All test sources live in hail/test/src/. Suites are migrated in-place one at a time by switching from HailSuite to MUnitHailSuite (or from TestNGSuite to munit.FunSuite for suites that don't need ctx/fs).

Plan for Migrating One Test File

Phase 1: Scaffold

  1. Change the class to extend MUnitHailSuite or munit.FunSuite (see Class Declaration).
  2. Fix imports (see Imports).
  3. Convert @BeforeMethodbeforeEach (see @BeforeMethod beforeEach).

Phase 2: Convert Tests

Apply the conversion patterns below in order — each builds on the previous:

  1. Convert @Test def methods → test() blocks (see Regular Tests).
  2. Convert SkipExceptionassume (see SkipException assume).
  3. Convert static @DataProvider + @Test(dataProvider=...)object Foo extends TestCases (see Static DataProviders).
  4. Convert dynamic @DataProvider (needs ctx/fs) → TestCases object or local helper def with by-name parameters; move setup side-effects to beforeAll() (see Dynamic DataProviders).
  5. Convert ScalaTest forAll.foreach (see forAll).
  6. Convert ScalaCheck property tests → property() with munit-scalacheck (see ScalaCheck Property Tests).
  7. Rewrite scalatest matchers (should, shouldBe, etc.) to plain MUnit assertions (see Scalatest Matchers).
  8. Rewrite assert(x == y)assertEquals(x, y) and split conjunctive asserts (see Assertions).

Phase 3: Verify

  1. Compile: ./mill 'hail[].test.compile'
  2. Fix any compilation errors (usually missing utilities that need copying).
  3. Run the migrated suite: ./mill 'hail[].test.testOnly' 'is.hail.package.SuiteName'

Key Files

  • hail/test/src/is/hail/HailSuite.scala — the old TestNG/ScalaTest base class (unmigrated suites still extend this)
  • hail/test/src/is/hail/MUnitHailSuite.scala — the new MUnit base class (migrated suites that need ctx/fs extend this)
  • hail/test/src/is/hail/TestCaseSupport.scala — standalone trait providing the TestCases inner trait; can be mixed into any munit.FunSuite
  • hail/test/src/is/hail/ExecStrategy.scala — the ExecStrategy enumeration
  • hail/test/src/is/hail/expr/ir/TestUtils.scala — IR helper object with IRArray, IRStream, IRDict, etc.
  • hail/test/src/is/hail/scalacheck/ — the full is.hail.scalacheck package (genLocus, genNullable, partition, etc.)

Base Class: MUnitHailSuite

Replaces HailSuite (which extends TestNGSuite with TestUtils). Key differences:

  • Lifecycle: Uses MUnit's beforeAll()/afterAll() instead of TestNG's @BeforeClass/@AfterClass
  • SparkBackend: Lazy init in companion object with JVM shutdown hook (replaces @BeforeSuite/@AfterSuite)
  • Assertions: Returns Unit instead of scalatest's Assertion. succeed()
  • No scalatest dependency: MUnit's FunSuite provides assert(), assertEquals(), intercept[E]()
  • TestCases trait: Provided via the TestCaseSupport mixin (see below)

Suites That Don't Need ctx/fs

Some suites extend TestNGSuite directly (not HailSuite) because they don't need ctx, fs, or the Spark backend. For these, extend munit.FunSuite directly instead of MUnitHailSuite:

// Before
class FooSuite extends TestNGSuite {
// After
class FooSuite extends munit.FunSuite {

If the suite also needs parameterized tests (TestCases), mix in TestCaseSupport:

// Before
class FooSuite extends TestNGSuite {
// After
class FooSuite extends munit.FunSuite with TestCaseSupport {

TestCaseSupport is a standalone trait (hail/test/src/is/hail/TestCaseSupport.scala) that provides the TestCases inner trait. MUnitHailSuite already mixes it in, so suites extending MUnitHailSuite get TestCases automatically.

Conversion Patterns

1. Class Declaration

// Before (needs ctx/fs)
class FooSuite extends HailSuite {
// After
class FooSuite extends MUnitHailSuite {

// Before (no ctx/fs needed)
class FooSuite extends TestNGSuite {
// After (without parameterized tests)
class FooSuite extends munit.FunSuite {
// After (with parameterized tests)
class FooSuite extends munit.FunSuite with TestCaseSupport {

2. Imports

Remove:

  • org.testng.annotations.{DataProvider, Test}
  • org.scalatest.Inspectors.forAll
  • org.scalatest.enablers.InspectorAsserting
  • org.scalatestplus.scalacheck.CheckerAsserting.assertingNatureOfAssertion
  • org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks
  • org.scalatestplus.testng.TestNGSuite

Change:

  • import is.hail.{ExecStrategy, HailSuite}import is.hail.{ExecStrategy, MUnitHailSuite}

3. @BeforeMethodbeforeEach

TestNG's @BeforeMethod runs before each test method. Convert to MUnit's beforeEach:

// Before
@BeforeMethod
def resetState(): Unit =
  someGlobal = 0

// After
override def beforeEach(context: BeforeEach): Unit = {
  super.beforeEach(context)
  someGlobal = 0
}

If the old method takes a TestNG ITestContext parameter which is actually used, ask the user for guidance on how to handle it. If it isn't used, drop it — it has no MUnit equivalent.

4. Regular @Test Methods → test() Blocks

// Before
@Test def testFoo(): Unit =
  body

// After
test("Foo") { body }

Strip the test prefix from method names for the test label (e.g., testFoo"Foo"). Prefer to keep short tests on a single line.

Disabled Tests (enabled = false)

// Before
@Test(enabled = false) def testFoo(): Unit =
  body

// After
test("Foo".ignore) { body }

MUnit's .ignore tag preserves the "disabled" semantics — the test is reported as ignored rather than running it.

5. SkipExceptionassume

TestNG's SkipException is replaced by MUnit's assume(condition, clue), which skips when the condition is false. This works both at suite level (in beforeAll(), skipping all tests) and inside individual test() blocks (skipping just that test).

Suite-level skip (beforeAll)

// Before
import org.testng.SkipException
import org.testng.annotations.BeforeClass

@BeforeClass
def beforeclass(): Unit =
  if (System.getenv("HAIL_CLOUD") != "gcp" || root == null)
    throw new SkipException("skip")

// After
override def beforeAll(): Unit = {
  super.beforeAll()
  assume(
    System.getenv("HAIL_CLOUD") == "gcp" && root != null,
    "not in GCP",
  )
}

Single-test skip (inside test())

// Before
@Test def testFileListEntryRootWithSlash(): Unit = {
  if (root.endsWith("/")) throw new SkipException("skipped")
  val s = fs.fileListEntry(s"$root/")
  assert(s.getPath == root)
}

// After
test("FileListEntryRootWithSlash") {
  assume(!root.endsWith("/"), "skipped")
  val s = fs.fileListEntry(s"$root/")
  assert(s.getPath == root)
}

Note the condition is invertedSkipException is thrown when the condition to skip is true, while assume skips when the condition is false. Also remove the org.testng.SkipException import.

6. @DataProvider — Static Data (no ctx/fs needed)

Use the TestCases object pattern:

// Before
@DataProvider(name = "myData")
def myData(): Array[Array[Any]] = Array(Array(1, "a"), Array(2, "b"))

@Test(dataProvider = "myData")
def testMyThing(x: Int, y: String): Unit = { ... }

// After
object checkMyThing extends TestCases {
  def apply(
    x: Int,
    y: String,
  )(implicit loc: munit.Location
  ): Unit = test("MyThing") {
    // ... body using x, y ...
  }
}

checkMyThing(1, "a")
checkMyThing(2, "b")

TestCases is a trait defined in MUnitHailSuite:

trait TestCases {
  var i: Int = 0
  def test(name: String)(body: => Any)(implicit loc: munit.Location): Unit = {
    i += 1
    outer.test(s"$name case $i")(body)
  }
}

Its test() method auto-increments a counter, producing names like "MyThing case 1", "MyThing case 2", etc.

Key points:

  • Use object (not class) so there's exactly one instance per test suite
  • apply takes fully typed parameters — no Array[Any] or .asInstanceOf casting
  • implicit loc: munit.Location is required so MUnit reports failures at the correct call-site line
  • All registrations happen at class construction time

Shared DataProviders (one data set, multiple tests)

When a single @DataProvider is referenced by multiple @Test methods, extract the data as a shared val at class scope and use .foreach to register each TestCases object's cases:

// Before
@DataProvider(name = "basic")
def basicData(): Array[Array[Any]] = Array(Array(FastSeq(3, 7)), Array(null), ...)

@Test(dataProvider = "basic")
def testIsEmpty(a: IndexedSeq[Integer]): Unit = ...

@Test(dataProvider = "basic")
def testAppend(a: IndexedSeq[Integer]): Unit = ...

// After
val basicData: Array[IndexedSeq[Integer]] =
  Array(FastSeq(3, 7), null, FastSeq(3, null, 7, null), FastSeq())

object checkIsEmpty extends TestCases { ... }
basicData.foreach(checkIsEmpty(_))

object checkAppend extends TestCases { ... }
basicData.foreach(checkAppend(_))

This avoids duplicating the test cases for each test and keeps the data in one place, mirroring the original @DataProvider sharing.

7. @DataProvider — Dynamic Data (needs ctx/fs)

For DataProviders whose cases reference values only available at test execution time (e.g., ctx, fs, or anything derived from them), the tests must still be registered at class construction time, but argument evaluation must be deferred until the test runs (after beforeAll() sets up ctx).

Two techniques work — choose based on how much shared state the cases need:

7a. TestCases object + block with lazy val (preferred when cases share setup)

When many cases share expensive-to-build values (e.g., a read that many IRs derive from), wrap them in lazy val inside a block. The lazy val is captured by the by-name closure and only forced when the first test that touches it actually runs.

object checkTableIRParser extends TestCases {
  def apply(
    ir: => TableIR
  )(implicit loc: munit.Location
  ): Unit = test("table IR parser") {
    val ir0 = ir
    val s = Pretty.sexprStyle(ir0, elideLiterals = false)
    val x2 = IRParser.parse_table_ir(ctx, s)
    assert(x2 == ir0)
  }
}

{
  // lazy — depends on ctx, evaluated when the first test body runs
  lazy val read = TableIR.read(ctx.fs, getTestResource("..."))
  lazy val mtRead = MatrixIR.read(ctx.fs, getTestResource("..."))
  // plain val — pure, no ctx needed
  val b = True()

  checkTableIRParser(TableDistinct(read))
  checkTableIRParser(TableKeyBy(read, Array("m", "d")))
  checkTableIRParser(TableFilter(read, b))
  // ... one call per case
}

The object lives at class scope; the block { ... } scopes the shared lazy vals. Each call's argument is by-name (ir: => TableIR), so read is not forced at registration time.

When to use lazy val vs plain val inside the block:

  • lazy val — anything that transitively touches ctx, fs, or other test-execution-time state
  • val — pure IR constructors like True(), I32(5), ApplyAggOp(Collect())(I32(0))

7b. Local helper def (for one-off groups or when TestCases overhead isn't worth it)

{
  val b = True()

  def testParseIR(name: String, ir: => IR, refMap: BindingEnv[Type] = BindingEnv.empty): Unit = {
    test(s"$name construction") { ir: Unit }
    test(s"$name parsing") {
      val ir0 = ir  // evaluate once — re-evaluating would create new freshNames
      val s = Pretty.sexprStyle(ir0, elideLiterals = false)
      val x2 = IRParser.parse_value_ir(ctx, s, refMap)
      assertEquals(x2, ir0)
    }
  }

  testParseIR("I32", I32(5))
  testParseIR("NA", NA(TInt32))
  // ... one call per case
}

Setup side-effects

If the old @DataProvider method contains side-effects (e.g., CompileAndEvaluate to index a bgen file), move them to override def beforeAll() so they run once before any test body. Don't leave them inside the block or the TestCases object.

8. forAll.foreach

ScalaTest's forAll (from Inspectors) is just a fancy foreach:

// Before
forAll(collection) { case (a, b) => ... }
// After
collection.foreach { case (a, b) => ... }

Watch for two forAll syntax variants:

  • forAll(expr) { ... } — straightforward, replace with expr.foreach { ... }
  • forAll { Array(...) } { ... } — curly-brace variant, replace with Array(...).foreach { ... }

9. ScalaCheck Property Tests (munit-scalacheck)

The test module has munit-scalacheck as a dependency. For tests that use ScalaCheck's forAll for property-based testing (as opposed to ScalaTest's Inspectors.forAll), use property() instead of test():

// Before (TestNG + scalatestplus-scalacheck)
class FooSuite extends TestNGSuite with ScalaCheckDrivenPropertyChecks {
  @Test def testBar(): Unit =
    forAll(Gen.choose(0, 100)) { x => assert(x >= 0) }
}

// After
class FooSuite extends munit.ScalaCheckSuite {
  property("Bar") =
    forAll(Gen.choose(0, 100)) { x =>
      x >= 0
    }
}

Key points:

  • Extend munit.ScalaCheckSuite instead of munit.FunSuite
  • Use property("name") = forAll(...) { ... } — note the =, no braces wrapping the body, and no .check()
  • The body can either return a Boolean/Prop or use MUnit assertions (assert, assertEquals, etc.) — munit.ScalaCheckSuite provides an implicit Unit => Prop conversion, so ending a forAll body with an assertion (which returns Unit) works directly
  • Non-property tests can still use test() in the same suite
  • When a suite needs both MUnitHailSuite (for ctx/fs) and scalacheck properties, extend both: class FooSuite extends MUnitHailSuite with munit.ScalaCheckSuite

property() syntax

Always use property() for ScalaCheck-based tests — never use test() with .check(). This ensures the test framework controls generator seeding and surfaces the failing seed on test failure.

There are two syntax variants depending on whether the forAll is at the top of the body:

= syntax — when forAll is the entire test body (no setup before it):

property("deep copy") = forAll(genPTypeVal[PCanonicalStruct](ctx)) { case (t, a: Row) =>
  val copy = ctx.scopedExecution { ... }
  assertEquals(copy, a)
}

Block syntax — when there's setup code before the forAll:

property("codec") {
  val region = Region(pool = pool)
  val g: Gen[...] = ...
  forAll(g) { case (t, a) =>
    ...
  }
}

Multiple forAll calls — combine them into a single Prop using ++:

property("multiple properties") {
  forAll(genFoo) { foo => assert(foo.isValid) } ++
    forAll(genBar) { bar => assertEquals(bar.size, 3) }
}

The Unit => Prop implicit from munit.ScalaCheckSuite is in scope for all of these, so the forAll body can end with an assertion — no need to add true.

Remove these imports:

  • org.scalatestplus.scalacheck.CheckerAsserting.assertingNatureOfAssertion
  • org.scalatestplus.scalacheck.ScalaCheckDrivenPropertyChecks

10. Scalatest Matchers

Some suites use scalatest matchers (should, shouldBe, should not be, custom Matcher/MatcherFactory1, etc.) and related imports (org.scalactic, org.scalatest.matchers). These must be rewritten to use plain MUnit assertions (assert, assertEquals, intercept, etc.) or simple helper methods. The specific rewrite depends on what the matcher does — there are too many patterns to enumerate, but they should be straightforward to figure out case by case.

Remove all scalatest matcher imports, e.g.:

  • org.scalactic.{Equivalence, Prettifier}
  • org.scalatest.matchers._
  • org.scalatest.matchers.should.Matchers._
  • org.scalatest.matchers.must.Matchers._
  • org.scalatest.matchers.dsl._

11. Assertions: assertEquals and Splitting Conjunctions

Prefer assertEquals over assert(x == y). MUnit's assertEquals gives a structured diff on failure instead of just "assertion failed":

// Before
assert(result == expected)
// After
assertEquals(result, expected)

Split conjunctive assert calls into separate assertions so that failures pinpoint exactly which condition failed:

// Before
assert(rg.inX("X") && rg.inY("Y") && rg.isMitochondrial("MT"))

// After
assert(rg.inX("X"))
assert(rg.inY("Y"))
assert(rg.isMitochondrial("MT"))

Similarly, break up assert(x.forall { ... }) over small collections into .foreach with individual asserts when the collection is inline and small:

// Before
assert(x.forall { case (c, gt, repr) =>
  Call.isPhased(c) && Call.ploidy(c) == 2 && Call.alleleRepr(c) == repr
})

// After
x.foreach { case (c, gt, repr) =>
  assert(Call.isPhased(c))
  assertEquals(Call.ploidy(c), 2)
  assertEquals(Call.alleleRepr(c), repr)
}

Leave assert as-is when:

  • The condition is a single boolean expression (e.g., assert(rg.compare("3", "18") < 0))
  • The expression uses a custom comparator like D_== that isn't a simple equality check

Build Commands

# Compile the MUnit test module
./mill 'hail[].test.compile'

# Run all MUnit tests
./mill 'hail[].test'

# Run a specific test class
./mill 'hail[].test.testOnly' 'is.hail.expr.ir.IRSuite'

@patrick-schultz patrick-schultz force-pushed the ps/push-yumltwmsqplq branch 6 times, most recently from 8ec3dd0 to 570cc53 Compare March 25, 2026 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant