Replaced LocalUniqueCache with .forEach methods#14724
Ambeco wants to merge 1 commit into yairm210:master
|
Notably, this is built on top of #14714 |
|
Sounds intriguing. Why would yieldAll be notably expensive...? |
|
Honestly, I was expecting like a 0.5% improvement, and was also shocked.
But some of my testing indicated the problem wasn't the iterators, or even the sequences. The problem seems to be the coroutines. Further testing is needed to confirm. But the improvement is still stunning. I'll probably remeasure. I think I uninstalled Facebook and Discord from my test phone between the control and test. It's possible that had an unexpected effect on the performance? |
568eb51 to 4456e76
|
Further testing shows that measurements are problematic.
And this time, GameInfo.nextTurn was only 4% faster, which wasn't statistically significant. |
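For anyone following along, here is a rough Kotlin sketch of one way to judge whether a timing difference stands out from the noise: compare the difference in means against the combined standard error. This is not necessarily how the spreadsheet does it. `Summary`, `summarize`, `looksSignificant`, and the threshold of 2.0 are all assumptions made up for this example.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Hypothetical helper types for illustration only.
data class Summary(val mean: Double, val stderr: Double)

// Mean and standard error of the mean, using the sample variance (n - 1).
fun summarize(samples: List<Double>): Summary {
    require(samples.size >= 2) { "need at least two samples" }
    val mean = samples.average()
    val variance = samples.sumOf { (it - mean) * (it - mean) } / (samples.size - 1)
    return Summary(mean, sqrt(variance / samples.size))
}

// Rough rule of thumb (an assumption, not a formal test): the gap between the
// means must exceed `threshold` combined standard errors.
fun looksSignificant(control: Summary, treatment: Summary, threshold: Double = 2.0): Boolean {
    val combined = sqrt(control.stderr * control.stderr + treatment.stderr * treatment.stderr)
    return abs(control.mean - treatment.mean) > threshold * combined
}

fun main() {
    val control = summarize(listOf(100.0, 102.0, 98.0, 101.0, 99.0))
    val steady = summarize(listOf(96.0, 97.0, 95.0, 96.0, 96.0))
    val noisy = summarize(listOf(100.0, 90.0, 110.0, 95.0, 101.0))
    println(looksSignificant(control, steady)) // true: 4 ms gap, small noise
    println(looksSignificant(control, noisy))  // false: gap is lost in the noise
}
```

A check like this only says the means differ more than the noise would suggest. It cannot correct for control and treatment runs that played out differently.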
Thanks for the detailed answer. But that's close to my point - in a simple environment, yieldAll can simply wrap the existing iterator to pass elements through. Running inside a coroutine shouldn't really influence that. Oh, don't forget CoroutineContext.yield is an entirely different beast than SequenceBuilder.yield - no relation at all. They "meet" in a Flow, but luckily there it's renamed to emit... But what I could imagine - is costly closures. As you know, using data from an outer scope inside e.g. a lambda creates a closure, but when that outer scope is not even known to be in the same CoroutineContext, then I imagine things get expensive. Maybe. Back when I was active I had taught myself to look properly, see the closures (the debugger sometimes helps too), and check whether one can reduce them to simpler, fewer or cheaper ones. |
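To make the closure point concrete, here is a minimal Kotlin sketch. The function names are hypothetical and not taken from the codebase. On the JVM, a lambda that captures a value from its outer scope typically needs a function object built for each evaluation, while one that captures nothing can usually be reused.

```kotlin
// Hypothetical helpers for illustration only.

// Captures nothing from an outer scope, so one function object can be reused.
fun plusOne(): (Int) -> Int = { it + 1 }

// Captures `offset`, so each call builds a function object that carries it.
fun plusOffset(offset: Int): (Int) -> Int = { it + offset }

// Passing the value as a plain parameter keeps the hot loop closure-free.
fun sumWithOffset(values: IntArray, offset: Int): Int {
    var total = 0
    for (v in values) total += v + offset
    return total
}

fun main() {
    val values = intArrayOf(1, 2, 3)
    println(values.sumOf { plusOne()(it) })      // 9
    println(values.sumOf { plusOffset(10)(it) }) // 36, one closure per element
    println(sumWithOffset(values, 10))           // 36, no closure at all
}
```

Whether such closures matter in practice depends on how hot the call site is, which is why profiling allocations first is the safer approach.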
|
You say the yields are entirely unrelated, but consider I also considered the closures, but my measurements weren't actually showing many closures. There's a few here and there, but not NEARLY so many as I had expected to find. Most of the memory churn (by bytes) was Sequences and Sequence Iterators, Array Iterators, HashMap Iterators, Object[], Long[], and HashMap.Entry. There's definitely a handful of closures, but not very many. I have a separate WIP that cleans up the most offensive closures. |
Of course, agreed, but... "pedantic debates" so far this wasn't, at least I thought comparing our views on mechanics is a kind of fun. Not as much as downhill biking through deep mud, but still...
Yup, mixed that up in part because the source for
Not convinced, but I'll file that away (gears grinding)
Excellent.
Even excellenter. |
|
Oh right, I forgot that updating the commit description doesn't update here. I redid the timings: GCs/turn dropped from 11.13 to 6.31, a 43% reduction. Timing changes:
Not stat sig:
Notes:
|
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
Okay, this looks like a lot of changes, but the premise and the results are both good, and I trust your work enough at this point that I don't feel the need to go through everything with a comb :) |
|
I will wait until the next version to merge this, so we can get #14714 out first and not include too much change in one version |
|
Conflicts have been resolved. |
|
I've been contemplating the doc Yairm wrote about preferring for loops over .forEach. Maybe we should hold off on merging this until I test that theory. Maybe we don't have to deprecate the existing methods. |
These avoid allocating a coroutine, sequence, and iterator. They avoid the expensive yieldAll code, improving performance, and reducing memory allocations, which especially helps on low-memory devices.
with .neighbors. GCs/turn dropped from 11.13 to 6.31, a 43% reduction.
Timing changes:
- automateUnitMoves 23% faster
- CityTurnManager.endTurn 21% faster
- CityStats.update 18% faster
- reassignPopulation 17% faster
- ConstructionAutomation.choosenext 15% faster
- updateTileStats 15% faster
- CityPopManager.autoAssignPop 14% faster
- GameInfo.updateCivState 14% faster
- CityTurnManager.startTurn 12% faster
- moveToTile 8% faster
- MapUnit.updateVisibleTiles 10% slower (???)
Not stat sig:
- GameInfo.nextTurn 4% faster
Notes:
- Control called GameInfo.nextTurn 36 times after loading the file, and treatment only 26 times, so the games apparently differed wildly :(
- automateUnitMoves called 461 times in Control, 836 times in treatment.
- MapUnit.updateVisibleTiles stderr increased from 16->29, so became much noisier. Did some GCs move inside this?
https://docs.google.com/spreadsheets/d/1r-TC1N3wKhZMRJGU-ph5Fcjfu1u31MHfjFx6xM693Ss/edit?usp=sharing
Signed-off-by: MPD <mooing_psycho_duck@hotmail.com>
|
Curious. I made a variant where I replaced the 119% more GCs, and most methods were slower, often by ~20%. Though UnitTurnManager.startTurn was ~15% faster. GameInfo.nextTurn was 8.8% slower, though not stat-sig. So apparently the savings aren't from the coroutines. Maybe it's simply the reduced allocations? But that means we're clear to merge this, so that's nice. |

Replaced LocalUniqueCache with .forEach methods.
These avoid allocating a coroutine, sequence, and iterator. They avoid the expensive yieldAll code, improving performance, and reducing memory allocations, which especially helps on low-memory devices.
with .neighbors. GameInfo.nextTurn now 11.09% faster on my Pixel3a.
Timing changes:
https://docs.google.com/spreadsheets/d/1r-TC1N3wKhZMRJGU-ph5Fcjfu1u31MHfjFx6xM693Ss/edit?usp=sharing
Signed-off-by: MPD <mooing_psycho_duck@hotmail.com>
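
As a rough illustration of the allocation difference described above, here is a minimal Kotlin sketch contrasting a `sequence { yieldAll(...) }` getter with an inline callback-style method. `Unique`, `UniqueSource`, `matching`, and `forEachMatching` are hypothetical names made up for this example, not the actual Unciv API, and the real change may be shaped differently.

```kotlin
// Hypothetical stand-ins for illustration only; not the Unciv classes.
class Unique(val type: String, val value: Int)

class UniqueSource(val own: List<Unique>, val inherited: List<Unique>) {

    // Sequence-builder style: every call creates a Sequence, and iterating it
    // creates the builder's coroutine-backed iterator plus iterators for each yieldAll.
    fun matching(type: String): Sequence<Unique> = sequence {
        yieldAll(own.asSequence().filter { it.type == type })
        yieldAll(inherited.asSequence().filter { it.type == type })
    }

    // Callback style: the function and its lambda are inlined at the call site,
    // and the indexed loops never ask the lists for an Iterator.
    inline fun forEachMatching(type: String, action: (Unique) -> Unit) {
        for (i in own.indices) {
            val unique = own[i]
            if (unique.type == type) action(unique)
        }
        for (i in inherited.indices) {
            val unique = inherited[i]
            if (unique.type == type) action(unique)
        }
    }
}

fun sumViaSequence(source: UniqueSource, type: String): Int =
    source.matching(type).sumOf { it.value }

fun sumViaForEach(source: UniqueSource, type: String): Int {
    var total = 0
    source.forEachMatching(type) { total += it.value }
    return total
}

fun main() {
    val source = UniqueSource(
        own = listOf(Unique("Gold", 2), Unique("Food", 1)),
        inherited = listOf(Unique("Gold", 5)),
    )
    println(sumViaSequence(source, "Gold")) // 7
    println(sumViaForEach(source, "Gold"))  // 7
}
```

Both versions visit the same elements in the same order. The callback version trades the composability of a `Sequence` for fewer short-lived objects per call.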