Conversation

@charlienegri
Collaborator

@charlienegri charlienegri commented Dec 18, 2025

Change Summary

Calculation of contour model maps for conco3mda8 in the map engine.

Related issue number

fixes #1770

Checklist

  • Start with a draft-PR
  • The PR title is a good summary of the changes
  • PR is set to AeroTools and a tentative milestone
  • Documentation reflects the changes where applicable
  • Tests for the changes exist where applicable
  • Tests pass locally
  • Tests pass on CI
  • At least 1 reviewer is selected
  • Make PR ready to review

…ars list, i.e. has been computed in the map engine, fix obs_name in the onlymap case for cams283 where first_with_mod_name[0] gives the key name which is EEA while we want EEA-UTD
@codecov

codecov bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.50%. Comparing base (357b9b4) to head (603ef16).
⚠️ Report is 37 commits behind head on main-dev.

Additional details and impacted files
@@             Coverage Diff              @@
##           main-dev    #1774      +/-   ##
============================================
+ Coverage     78.35%   78.50%   +0.15%     
============================================
  Files           176      176              
  Lines         23414    23445      +31     
============================================
+ Hits          18346    18406      +60     
+ Misses         5068     5039      -29     
Flag Coverage Δ
unittests 78.50% <100.00%> (+0.15%) ⬆️

This was referenced Dec 21, 2025
@charlienegri charlienegri self-assigned this Jan 5, 2026
@charlienegri charlienegri added enhancement ✨ New feature or request CAMS2_83 Issues related to the CAMS2_83 contract labels Jan 5, 2026
@charlienegri
Collaborator Author

Screenshot from 2026-01-08 09-13-16

@charlienegri charlienegri added this to the m2026-02 milestone Jan 8, 2026
@charlienegri charlienegri changed the title Fix 1770 approach 2 Fix 1770 selectively for countour maps Jan 8, 2026
@charlienegri
Collaborator Author

charlienegri commented Jan 8, 2026

@dulte @heikoklein this approach makes it possible to compute the maps for a single full season, which is what Svetlana needs if I understood correctly. For 11 seasons, however, it takes close to 40h, so I would not use the flag in that case.

If you see a smarter way of implementing it, let me know.

@charlienegri charlienegri requested a review from dulte January 8, 2026 11:55
@heikoklein heikoklein changed the title Fix 1770 selectively for countour maps Fix 1770, O3mda8 model maps, selectively for countour maps Jan 15, 2026
Member

@heikoklein heikoklein left a comment

I don't have any better ideas, but I don't understand why the mda8 calculation of 3D data should take 40 hours for a year. Maybe you have a simple test case where we can check performance?

One possible pitfall is that the dataset is computed several times. An hourly dataset will be too much for memory, while a daily dataset should work fine, so maybe try to compute before moving back to iris?

Maybe try some dask parallelization? (This would also be easier to test with a simpler test case.)

Comment on lines 51 to 54
data2.rolling(time=8, center=False, min_periods=6)
.mean("time")
.resample(time="24h", origin="start_day", label="left", offset="1h")
.reduce(lambda x, axis: np.apply_along_axis(min_periods_max, 0, x, min_periods=18))
Member

I find the code in stats/mda8/mda8.py easier to read:
mda8 = _daily_max(_rolling_average_8hr(data))

Maybe you can reuse that part (and/or put your code there)?

Collaborator Author

It cannot be reused directly, at least not the _daily_max part, because the object being handled has a different structure, so the axis the lambda function is applied to is different (if I got it right...)

Member

Ah, I see, mean("time") <-> mean() and np.apply_along_axis(min_periods_max, 0, x, min_periods=18) <-> np.apply_along_axis(min_periods_max, 1, x, min_periods=18)

What I like in

mda8 = _daily_max(_rolling_average_8hr(data))
is that the two map-reduce operations are split and documented by their function names. Putting your code in the same module, e.g. as mda8_3d and rolling_average_8h_3d, would make the code clearer and would show that you implement exactly the same mda8 as we do for collocated data.
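
A minimal NumPy sketch of the split suggested above, assuming model data as a plain `(time, lat, lon)` array of hourly values. The names `rolling_average_8h_3d` and `mda8_3d` follow the names floated in this comment; `daily_max_3d` and all function bodies are hypothetical and are not the code in this PR or in stats/mda8/mda8.py. The thresholds (8-hour window with at least 6 valid hours, daily maximum with at least 18 valid values) come from the lines under review. The `offset="1h"` day alignment is left out, so days here simply start at the first time step.

```python
import numpy as np


def rolling_average_8h_3d(hourly, window=8, min_periods=6):
    """Trailing 8-hour mean along axis 0 (time) of a (time, lat, lon) array.

    A window yields NaN unless it holds at least `min_periods` valid values.
    """
    hourly = np.asarray(hourly, dtype=float)
    out = np.full(hourly.shape, np.nan)
    for t in range(hourly.shape[0]):
        chunk = hourly[max(0, t - window + 1) : t + 1]
        valid = np.isfinite(chunk)
        count = valid.sum(axis=0)
        total = np.where(valid, chunk, 0.0).sum(axis=0)
        enough = count >= min_periods
        out[t][enough] = total[enough] / count[enough]
    return out


def daily_max_3d(rolling, hours_per_day=24, min_periods=18):
    """Daily maximum along time; NaN where a day has too few valid values."""
    n_days = rolling.shape[0] // hours_per_day
    days = rolling[: n_days * hours_per_day].reshape(
        n_days, hours_per_day, *rolling.shape[1:]
    )
    valid = np.isfinite(days)
    count = valid.sum(axis=1)
    peak = np.where(valid, days, -np.inf).max(axis=1)
    return np.where(count >= min_periods, peak, np.nan)


def mda8_3d(hourly):
    """Two named steps, mirroring _daily_max(_rolling_average_8hr(data))."""
    return daily_max_3d(rolling_average_8h_3d(hourly))
```

Keeping the two steps as separate named functions is what makes the correspondence with the collocated-data version visible; only the axis handling differs.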

Collaborator Author

@charlienegri charlienegri Jan 16, 2026

Sure, we can do this; I will refactor the code.

@charlienegri
Collaborator Author

charlienegri commented Jan 15, 2026

I don't have any better ideas, but I don't understand why the mda8 calculation of 3D data should take 40 hours for a year. Maybe you have a simple test case where we can check performance?

One possible pitfall is that the dataset is computed several times. An hourly dataset will be too much for memory, while a daily dataset should work fine, so maybe try to compute before moving back to iris?

Maybe try some dask parallelization? (This would also be easier to test with a simpler test case.)

40h is the total runtime for all models and 11 seasons, which is almost 3 years.
The conco3mda calculation for a single model and 11 seasons (33 months) takes a little under 2h, if I remember correctly from that test.
I think I have leveraged lazy loading as far as possible, but maybe I am missing some easy optimization.
I can test performance further with a simple case.
Anyway, 1 season is doable, and a single model for 1 year might also be doable. The code is still ugly, though.

Screenshot from 2026-01-15 20-27-29

@heikoklein
Member

From this I see that the maps for a normal component take ~20 min, while mda8/conco3 takes ~115 min. Considering that 24x the amount of data needs to be read, this looks pretty good. Or are all variables read hourly?

@charlienegri
Collaborator Author

All data is read hourly, so there is no extra reading time at all; it's purely computation, alas.

@heikoklein
Member

If all data is read hourly, and all data is resampled to daily plots, then 6 times the runtime just for a rolling-average resampling sounds like a lot (pure gut feeling, no proof yet). Or is conco3 also plotted as other things (o3dailymax/o3dailymean)? Do you have an example input file?

@charlienegri
Collaborator Author

charlienegri commented Jan 16, 2026

Based on my tests, the rolling average alone is quite fast; it's the rest that is slow.
One way to run a test is to use the CLI with type season and whatever time window you want, for example:

cams2_83 forecast season 2025-11-01 2025-11-30 --model-path /lustre/storeB/project/fou/kl/CAMS2_83/model --obs-path /lustre/storeB/project/fou/kl/CAMS2_83/obs --data-path /lustre/storeB/users/heikok/something/data --coldata-path /lustre/storeB/users/heikok/something/coldata --cache /lustre/storeB/users/heikok/something/_cache_test --name 'TEST' --id test-conco3mda-contours --description 'test'  -p 2 --onlymap --conco3mda8contours

(note that this will not produce an experiment output valid for aeroval)
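
As a starting point for the simple test case asked for earlier in this thread, the sketch below times a per-cell `np.apply_along_axis` reduction against a vectorized equivalent on synthetic data. Everything here is an assumption for illustration: the `min_periods_max` body is a stand-in (the real helper is not shown in this conversation), the 100 x 150 grid and the 30 % gap rate are arbitrary, and nothing in the thread confirms that this step is the actual bottleneck.

```python
import time

import numpy as np


def min_periods_max(x, min_periods=18):
    """Hypothetical stand-in: max of the valid values, NaN if too few."""
    valid = x[np.isfinite(x)]
    return valid.max() if valid.size >= min_periods else np.nan


def daily_max_loop(day_block):
    # np.apply_along_axis makes one Python-level call per grid cell.
    return np.apply_along_axis(min_periods_max, 0, day_block, min_periods=18)


def daily_max_vectorized(day_block, min_periods=18):
    # Same reduction expressed with whole-array operations.
    valid = np.isfinite(day_block)
    peak = np.where(valid, day_block, -np.inf).max(axis=0)
    return np.where(valid.sum(axis=0) >= min_periods, peak, np.nan)


def time_call(func, data):
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
block = rng.random((24, 100, 150))  # one day of hourly values on an arbitrary grid
block[rng.random(block.shape) < 0.3] = np.nan  # knock out roughly 30 % of the hours

loop_result, loop_seconds = time_call(daily_max_loop, block)
fast_result, fast_seconds = time_call(daily_max_vectorized, block)
print(f"apply_along_axis: {loop_seconds:.3f} s, vectorized: {fast_seconds:.3f} s")
```

Because `np.apply_along_axis` calls the Python function once per grid cell, its cost grows with the grid size, which makes it a reasonable first candidate to profile on real input.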

@charlienegri
Collaborator Author

charlienegri commented Jan 16, 2026

I am testing the latest code, which overlaps as much as possible with the existing mda8 calculation, and it's even slower. I think the reason is that the time dimension shift and the filtering baked into _calc_mda8 are too expensive, even if they also make sense for the map engine calculation.
Screenshot from 2026-01-16 15-01-50
For 1 season it is ~9 min per model.
I am not sure how much slower it is; it may be very little. I'll do some more testing and then refactor again if the difference is significant.

Member

@heikoklein heikoklein left a comment

Nice improvements in readability.

VariableDefinitionError,
VarNotAvailableError,
)
from pyaerocom.stats.mda8.mda8 import _calc_mda8
Member

In theory, functions starting with _ should be considered private and should not be imported (except in tests). Consider renaming _calc_mda8 to calc_mda8 to make it officially public.

charlienegri and others added 2 commits January 17, 2026 15:50
@charlienegri
Collaborator Author

Timing with the latest code for 11 seasons and 1 model is ~1.5h
Screenshot from 2026-01-18 18-00-28

@heikoklein
Member

This is better than the 115 min you had before, but the concno2 numbers have also dropped from 9 min to 6 min. Unless you have made considerable code changes (the last changes were more about readability), I would rather say the performance improvement comes from a faster node?

@charlienegri
Collaborator Author

charlienegri commented Jan 19, 2026

Yes indeed, so far we have just moved things around; performance has not improved (it was actually made worse before 164e8f3).

@charlienegri
Collaborator Author

charlienegri commented Jan 21, 2026

https://aeroval-test.met.no/charlien/pages/maps/?project=cams2-83&experiment=forecast-SON2025&parameter=conco3mda8

Note for cams2_83: menu.json is now written in a way that is compatible with a non-only-map experiment. It will need to be rsynced from the only-map experiments too, on top of the contour folder.

@charlienegri charlienegri marked this pull request as ready for review January 21, 2026 11:39
@charlienegri
Collaborator Author

@heikoklein if there are no objections I will merge this; we can consider performance improvements if the need arises at some point.

@charlienegri charlienegri merged commit f492a4e into main-dev Jan 22, 2026
8 checks passed
Labels

CAMS2_83 Issues related to the CAMS2_83 contract enhancement ✨ New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

O3mda8 model maps

2 participants