Skip to content

Clean up JVM backend#25618

Open
SolalPirelli wants to merge 21 commits intoscala:mainfrom
dotty-staging:solal/jvm-cleanup
Open

Clean up JVM backend#25618
SolalPirelli wants to merge 21 commits intoscala:mainfrom
dotty-staging:solal/jvm-cleanup

Conversation

@SolalPirelli
Copy link
Copy Markdown
Contributor

@SolalPirelli SolalPirelli commented Mar 25, 2026

Part of #25218

Review commit by commit

There's still more to do, PostProcessorFrontendAccess should disappear at the very least, but this is already nice

How much have you relied on LLM-based tools in this contribution?

not

How was the solution tested?

existing tests


def isInterface(using Context): Boolean = sym.is(PureInterface) || sym.is(Trait)

def isStaticConstructor(using Context): Boolean = (sym.isStaticMember && sym.isClassConstructor) || (sym.name eq nme.STATIC_CONSTRUCTOR)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one was nonsensical, isClassConstructor checks the exact name so a symbol can't be both static and <init>

@SolalPirelli SolalPirelli requested a review from lrytz March 25, 2026 14:13
@SolalPirelli SolalPirelli marked this pull request as ready for review March 25, 2026 14:16

object DottyBackendInterface {
given symExtensions: AnyRef with
extension (sym: Symbol)

def isInterface(using Context): Boolean = sym.is(PureInterface) || sym.is(Trait)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PureInterface implies Trait? Also for Java interfaces?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either that or we severely lack test coverage of "pure interfaces"

class ClosureOptimizer(ppa: PostProcessorFrontendAccess, backendUtils: BackendUtils,
byteCodeRepository: BCodeRepository, callGraph: CallGraph,
ts: CoreBTypes, bTypesFromClassfile: BTypesFromClassfile) {
ts: CoreBTypes, bTypesFromClassfile: BTypesFromClassfile)(using ctx: Context) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context is a beast. In the compiler frontend we are of course passing it around everywhere, but the optimizer was built to be independent of the frontend. The necessary dependencies would be explicit with PostProcessorFrontendAccess.

IIRC

  • one goal was to keep the door open to pull out the code into a compiler-independent bytecode manipulation tool.
  • another goal was to prevent deadlocks / race conditions by accidentally accessing the symbol table when running local optimizations and/or classfile writing in parallel (frontendLock)

I didn't realize while reviewing the optimizer, but Context is already used in opt/.

I didn't look at it in detail now, but do you have a clear picture of the current situation, how the backend/optimizer depends on the frontend data structures? Do we still intend to keep the backend's dependency on the frontend explicit / synchronized?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I heard was that the separation was intended to make it easier to port the Scala 2 backend to Scala 3 by minimizing dependencies on the frontend, so the same-ish backend could work for both with a small abstraction layer on top of the frontend, and now that the port is done we could remove these abstractions.

There are so many references to trees everywhere that "compiler-independent bytecode manipulation tool" doesn't seem super realistic to me 🤔

Is there documentation somewhere on which things are / aren't allowed to be done with a single Context across threads?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are trees referenced in the optimizer?

The reason for PostProcessorFrontendAccess the was parallelization, I don't think we were thinking about dotty back then. scala/scala#6012.

I believe the symbol table is inherently not thread-safe, info completers for example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no I meant the backend in general references trees. I now realize the fact that the backend was ported without the optimizer but with infrastructure intended for the optimizer like PostProcessor and PostProcessorFrontendAccess made it harder to understand what the architecture was...

I only added the context for reporting stuff; I can keep the "delete BackendReporting" part while still keeping the Context wrapped inside PPFA.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed with Seb offline:

  • the entire context should be treated as needing sync
    • which is why the backend can't really be multithreaded before classfile writing, the overhead of syncing would be too much
  • "per run lazy" stuff is an artifact of Scala 2 in which phases aren't re-created per run

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend runs local optimizations in parallel with -Ybackend-parallelism. PostProcessor already has using Context, so there's a risk that the context ends up being referenced by accident in sendToDisk (which runs in parallel). The switch to dotty.tools.dotc.report in this PR is actually an example for that.

At least LocalOpt doesn't have a Context.

Adding Context to ClosureOptimizer is probably fine since global optimizations are single-threaded.

But adding Context to AsyncWritingClassHandler is a bit of a red flag I think.

PerRunLazy is for lazily initialized state that's cleared on new compiler runs. In Scala 2, a Global instance can stay alive for a long time with new Run instances being used for repeated compilations. This is used in the REPL, for example. The symbol table is reset to an initial state (adaptToNewRunMap), compiler settings can change between runs. PerRunLazy makes sure state that depends on settings or the symbol table is re-initialized on new runs. I don't know if that is not needed in Scala 3.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that PerRunLazy currently doesn't do anything "per run", all it does is the "lazy" part + synchronized the initialization using the frontendLock. I somehow missed that when reading the code.

However removing it does cause crashes for parallel compilation and optimization, so there's something in the backend that depends on lazy initializations being serialized rather than allowed to run in parallel... looking into what.

@SolalPirelli
Copy link
Copy Markdown
Contributor Author

@lrytz I removed all references to Context from opt/. This required undoing the lazy loading of InlineInfo, since for symbol-based ones we need a Context, so now it's back to eager loading with some references to opt stuff outside of opt.

I still want to remove perRunLazy at some point but this PR is big enough as it is...

Copy link
Copy Markdown
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanups!

Could summarize in a comment somewhere (PostProcessorFrontendAccess i guess) that the goal is to prevent concurrent access to the frontend data structures?

The fact that perRunLazy is not actually re-initializing per run is probably a bug? IIUC Scala 3 has the same setup; for a single dotc.Compiler instance there can be multiple Runs (def newRun). The symbol table may be different from one run to the next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants