Replaced LocalUniqueCache with .forEach methods by Ambeco · Pull Request #14724 · yairm210/Unciv

Ambeco · 2026-04-05T05:57:45Z

Replaced LocalUniqueCache with .forEach methods.

These avoid allocating a coroutine, sequence, and iterator. They avoid the expensive yieldAll code, improving performance, and reducing memory allocations, which especially helps on low-memory devices.
with .neighbors. GameInfo.nextTurn now 11.09% faster on my Pixel3a.

Timing changes:

CityPopulationManager.autoAssignPopulation 22.68% faster
CityTurnManager.startTurn 22.08% faster
GameInfo.updateCivilizationState 20.73% faster
reassignPopulation 20.39% faster
Civilization.updateStatsForNextTurn 16.65% faster
CityStats.update 14.06% faster
GameInfo.nextTurn 11.09% faster
Civilization.setTransients 11.04% faster
updateTileStats 10.21% faster
moveToTile 9.98% faster
UnitTurnManager.startTurn 3.41% faster.
automateUnitMoves 14.85% SLOWER (?)

https://docs.google.com/spreadsheets/d/1r-TC1N3wKhZMRJGU-ph5Fcjfu1u31MHfjFx6xM693Ss/edit?usp=sharing

Signed-off-by: MPD mooing_psycho_duck@hotmail.com

Ambeco · 2026-04-05T05:58:09Z

Notably, this is built on top of #14714

SomeTroglodyte · 2026-04-05T20:32:05Z

Sounds intriguing. Why would yieldAll be notably expensive...?

Ambeco · 2026-04-05T22:27:19Z

Honestly, I was expecting like a 0.5% improvement, and was also shocked.
My suspicious are:

allocating a sequencing coroutine allocated a massive amount of memory for the call stack, which is expensive on memory constrained devices.
the sequence + yield all logic tends to be nested, meaning we actually allocate multiple sequencing callstacks.
yieldAll also has to allocate lists of things, which means the iterators have the overhead of iterating over internal lists and switching to the coroutine if empty, and again, these are often nested. The app was spending ~2-4% of the total count time in iterator.hasNext and iterator.next due to this. (I expected a total of 2-4% improvement from removing all sequencing coroutines)
Ive replaced these with for-loops, sometimes with an explicit filter parameter, because this eliminates all allocations. The code is not only no longer allocating coroutines, it's now not even allocating iterators or filters.

But some of my testing indicated the problem wasn't the iterators, or even the sequences. The problem seems to be the coroutines. Further testing is needed to confirm.

But the improvement is still stunning. I'll probably remeasure. I think I uninstalled Facebook and Discord from my test phone between the control and test. It's possible that had an unexpected affect on the performance?

Ambeco · 2026-04-06T18:44:45Z

Further testing shows that measurements are problematic.

Control called GameInfo.nextTurn 36 times after loading the file, and treatment only 26 times, so the games apparently differed wildly :(
automateUnitMoves called 461 times in Control, 836 times in treatment.

And this time, GameInfo.nextTurn only 4% faster, which wasn't statistically significant.
However, GC data is now working, and we have 50% fewer GCs per turn, which is nice.

SomeTroglodyte · 2026-04-07T15:08:21Z

... yieldAll also has to allocate lists of things ...

Thanks for the detailed answer. But that's close to my point - in a simple environment, yieldAll can simply wrap the existing iterator to pass elements through. Running inside a coroutine shouldn't really influence that. Oh, don't forget CoroutineContext.yield is an entirely different beast than SequenceBuilder.yield - no relation at all. They "meet" in a Flow, but luckily there it's renamed to emit...

But what I could imagine - is costly closures. As you know, using data from an outer scope inside e.g. a lambda creates a closure, but when that outer scope is not even known to be in the same CoroutineContext, then I imagine things get expensive. Maybe. Back when I was active I had taught myself to look properly, see the closures (the debugger sometimes helps too), and check whether one can reduce them to simpler, fewer or cheaper ones.

Ambeco · 2026-04-08T14:20:08Z

You say the yields are entirely unrelated, but consider sequence { while(true) yield(0) }. The ONLY way to implement SequenceScope.yield is via CoroutineScope.yield. The two share the same name because they are, indeed, the same thing. Additionally, it's SequenceScope, not SequenceBuilder, because it's the coroutine scope thats yielding, not just some method. As much as I love pedantic debates though, I think this is not key to the PR.

I also considered the closures, but my measurements weren't actually showing many closures. There's a few here and there, but not NEARLY so many as I had expected to find. Most of the memory churn (by bytes) was Sequences and Sequence Iterators, Array Iterators, HashMap Iterators, Object[], Long[], and HashMap.Entry. There's definitely a handful of closures, but not very many. I have a separate WIP that cleans up the most offensive closures.

SomeTroglodyte · 2026-04-08T18:35:26Z

I think this is not key to the PR

Of course, agreed, but... "pedantic debates" so far this wasn't, at least I thought comparing our views on mechanics is a kind of fun. Not as much as downhill biking through deep mud, but still...

it's SequenceScope, not SequenceBuilder,

Yup, mixed that up in part because the source for public abstract class SequenceScope is in a file named SequenceBuilder.kt

The ONLY way

Not convinced, but I'll file that away (gears grinding)

closures... not NEARLY so many as I had expected...

Excellent.

...WIP...

Even excellenter.

Ambeco · 2026-04-08T23:38:02Z

(allocation sample from head, without this change)
Looks like when I sort by bytes allocated in this sample, then the worst offending closure is the 11th item.
@ byte[] 4962K
@ FlatteningSequence (kotlin.sequences) 4289K
@ ArrayList$Itr (java.util) 3573K
@ FlatteningSequence$iterator$1 (kotlin.sequences) 3594K
@ String (java.lang) 3214K
@ FilteringSequence (kotlin.sequences) 2401K
@ Object[] (java.lang) 1831K
@ FilteringSequence$iterator$1 (kotlin.sequences) 1579K
@ Stats (com.unciv.models.stats) 1381K
@ HashMap$Node (java.util) 1176K
@ Civilization$getMatchingUniques$1 (com.unciv.logic.civilization) 1076K

FlatteningSequence and FlatteningSequence$iterator (aka yieldAll) allocate more than 7 times as many bytes as the worst closure :(
And FilteringSequence and FilteringSequence$iterator$1 aren't far behind FlatteningSequence.

To be clear, I love pedantic debates 😄 But the relation between SequenceScope.yield and CoroutineContext.yield isn't really much of a factor in if this PR should be accepted or not.

Ambeco · 2026-04-08T23:39:43Z

Oh right I forgot that updating the commit description doesn't update here. I redid the timings:

GCs/turn dropped from 11.13 to 6.31, a 56% reduction.

Timing changes:

automateUnitMoves 23% faster
CityTurnManager.endTurn 21% faster
CityStats.update 18% faster
reassignPopulation 17% faster
ConstructionAutomation.choosenext 15% faster
updateTileStats 15% faster
CityPopManager.autoAssignPop 14% faster
GameInfo.updateCivState 14% faster
CityTurnManager.startTurn 12% faster
moveToTile 8% faster
MapUnit.updateVisibleTiles 10% slower (???)

Not stat sig:

GameInfo.nextTurn 4% faster

Notes:

Control called GameInfo.nextTurn 36 times after loading the file, and treatment only 26 times, so the games apparently differed wildly :( Removing some unseeded randomness #14737 should help reduce the differences, somewhat.
automateUnitMoves called 461 times in Control, 836 times in treatment.
MapUnit.updateVisibleTiles stderr increased from 16->29, so became much noisier. Did some GCs become moved inside this?

github-actions · 2026-04-22T08:11:54Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

yairm210 · 2026-04-22T08:44:42Z

Okay, this looks like a lot of changes, but the premise and the results are both good, and I trust your work enough at this point that I don't feel the need to go through everything with a comb :)

yairm210 · 2026-04-22T08:46:37Z

I will wait to merge this in the next version so we can get #14714 out to not include too much change in one version

github-actions · 2026-04-28T16:18:58Z

Conflicts have been resolved.

Ambeco · 2026-04-28T16:41:59Z

I've been contemplating the doc Yairm wrote to prefer for loops for .forEach.
It occurs to me that if my theory is right, then we don't actually need new methods, we just need to replace the kotlin.sequence.sequence coroutine method with any other sequence type. MutableList.asSequence should work fine.

Maybe we should hold off on merging this until I test that theory. Maybe we don't have to deprecate the existing methods.

These avoid allocating a coroutine, sequence, and iterator. They avoid the expensive yieldAll code, improving performance, and reducing memory allocations, which especially helps on low-memory devices. with .neighbors. GCs/turn dropped from 11.13 to 6.31, a 56% reduction. Timing changes: - automateUnitMoves 23% faster - CityTurnManager.endTurn 21% faster - CityStats.update 18% faster - reassignPopulation 17% faster - ConstructionAutomation.choosenext 15% faster - updateTileStats 15% faster - CityPopManager.autoAssignPop 14% faster - GameInfo.updateCivState 14% faster - CityTurnManager.startTurn 12% faster - moveToTile 8% faster - MapUnit.updateVisibleTiles 10% slower (???) Not stat sig: - GameInfo.nextTurn 4% faster Notes: - Control called GameInfo.nextTurn 36 times after loading the file, and treatment only 26 times, so the games apparently differed wildly :( - automateUnitMoves called 461 times in Control, 836 times in treatment. - MapUnit.updateVisibleTiles stderr increased from 16->29, so became much noisier. Did some GCs become moved inside this? https://docs.google.com/spreadsheets/d/1r-TC1N3wKhZMRJGU-ph5Fcjfu1u31MHfjFx6xM693Ss/edit?usp=sharing Signed-off-by: MPD <mooing_psycho_duck@hotmail.com>

Ambeco · 2026-04-29T02:27:07Z

Curious. I made a variant where I replaced the sequence methods with a listSequence which evaluates the lambda immediately into a list, so there's no coroutines, but it had unexpected results
https://docs.google.com/spreadsheets/d/1NpVPqQXOsSxrEKUp3K9F-xP6s25-B3XGX4kzuYIsHQw/edit?gid=1346917422#gid=1346917422

119% more GCs, and most methods were slower, often by ~20%. Though UnitTurnManager.startTurn was ~15% faster. GameInfo.nextTurn was 8.8% slower, though not stat-sig.

So apparently the savings isn't the coroutines. Maybe its simply the reduced allocations?

But that means we're clear to merge this, so that's nice.

Ambeco force-pushed the lowAllocStats branch 2 times, most recently from 568eb51 to 4456e76 Compare April 6, 2026 18:42

Ambeco force-pushed the lowAllocStats branch from 4456e76 to f1bb1fe Compare April 7, 2026 14:53

Ambeco force-pushed the lowAllocStats branch from f1bb1fe to 57f4267 Compare April 19, 2026 01:15

github-actions Bot added the Conflicts label Apr 22, 2026

yairm210 approved these changes Apr 22, 2026

View reviewed changes

Ambeco force-pushed the lowAllocStats branch from 57f4267 to 31519b8 Compare April 28, 2026 16:18

github-actions Bot removed the Conflicts label Apr 28, 2026

Ambeco force-pushed the lowAllocStats branch from 31519b8 to b15a4ed Compare April 28, 2026 16:56

Conversation

Ambeco commented Apr 5, 2026

Uh oh!

Ambeco commented Apr 5, 2026

Uh oh!

SomeTroglodyte commented Apr 5, 2026

Uh oh!

Ambeco commented Apr 5, 2026

Uh oh!

Ambeco commented Apr 6, 2026

Uh oh!

SomeTroglodyte commented Apr 7, 2026

Uh oh!

Ambeco commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SomeTroglodyte commented Apr 8, 2026

Uh oh!

Ambeco commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ambeco commented Apr 8, 2026

Uh oh!

github-actions Bot commented Apr 22, 2026

Uh oh!

yairm210 commented Apr 22, 2026

Uh oh!

yairm210 commented Apr 22, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

Ambeco commented Apr 28, 2026

Uh oh!

Ambeco commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Ambeco commented Apr 8, 2026 •

edited

Loading

Ambeco commented Apr 8, 2026 •

edited

Loading

Ambeco commented Apr 29, 2026 •

edited

Loading