-
Notifications
You must be signed in to change notification settings - Fork 15
Fix 1770, O3mda8 model maps, selectively for countour maps #1774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ars list, i.e. has been computed in the map engine, fix obs_name in the onlymap case for cams283 where first_with_mod_name[0] gives the key name which is EEA while we want EEA-UTD
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main-dev #1774 +/- ##
============================================
+ Coverage 78.35% 78.50% +0.15%
============================================
Files 176 176
Lines 23414 23445 +31
============================================
+ Hits 18346 18406 +60
+ Misses 5068 5039 -29
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@dulte @heikoklein this approach makes it possible to compute the maps for a single full season, which is what Svetlana needs if I understood correctly, but for 11 seasons it takes close to 40h so I would not use the flag in that case... if you see a smarter way of implementing it let me know |
heikoklein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any better ideas, but I don't understand why the mda8 calculation of 3d data should take 40hours for a year? Maybe you have a simple test-case where we can check performance?
Possible pitfalls might be that the dataset is computed several times? An hourly dataset will be too much for memory, while a daily dataset should work fine, so maybe try to compute before moving back to iris?
Maybe try some dask parallellization? (also easier to test with a simpler test-case)
| data2.rolling(time=8, center=False, min_periods=6) | ||
| .mean("time") | ||
| .resample(time="24h", origin="start_day", label="left", offset="1h") | ||
| .reduce(lambda x, axis: np.apply_along_axis(min_periods_max, 0, x, min_periods=18)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the code in stats/mda8/mda8.py easier readable:
mda8 = _daily_max(_rolling_average_8hr(data))
Maybe you can reuse that part? (and/or put it it there)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot be reused directly, at least the _daily_max part because of different structure of the object handled, and so the axis to apply the lambda function to is different (if I got it right...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, mean("time") <-> mean() and np.apply_along_axis(min_periods_max, 0, x, min_periods=18) <-> np.apply_along_axis(min_periods_max, 1, x, min_periods=18)
What I like in
pyaerocom/pyaerocom/stats/mda8/mda8.py
Line 91 in 4c3bda6
| mda8 = _daily_max(_rolling_average_8hr(data)) |
mda8_3d and rolling_average_8h_3d would make the code clearer and would show, that you tried to implement exactly the same mda8 as we do for collocated data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, this we can do , I will refactor the code
40h is the total runtime for all models and 11 season which is almost 3 years... |
If all data is read hourly, and all data is resampled to daily plots, then 6 times the time just for a rolling average resampling sounds a lot (pure gutt-feeling, no proof yet). Or is conco3 also plottet as other things (o3dailymax/o3dailymean)? Do you have an example input-file? |
the rolling average alone is quite fast, based on my tests, it's the rest that is slow... (note that this will not produce an experiment output valid for aeroval) |
… the case of griddeddata, skip it
heikoklein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvements in readability.
| VariableDefinitionError, | ||
| VarNotAvailableError, | ||
| ) | ||
| from pyaerocom.stats.mda8.mda8 import _calc_mda8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, functions starting with _ should not be considered private and not be imported (except in tests). Consider turning _calc_mda8 to calc_mda8 to make it officially public.
Co-authored-by: Heiko Klein <Heiko.Klein@met.no>
|
yes indeed, so far we have just moved things around, performance has not been improved (was actually made worse before 164e8f3 ) |
This reverts commit 4e9b3c2.
…ase only_model_map=True and compute_conco3mda8_contours=True
…for conco3 by copying it, fix bug in the case uri.meta['obsvar'] == 'conco3mda8': continue instead of break, add test
…se is valid, i.e. is such that all the tabs work in aeroval
|
note for cams2_83: menu.json is now written compatibly with a non-only-map experiment.. it will need to be rsynced from the only-map experiments too on the top of the contour folder |
|
@heikoklein if no objections I will merge this, we can consider performance improvement if there is a need at some point.. |





Change Summary
calculation of countour model maps for conco3mda8 in the map engine
Related issue number
fixes #1770
Checklist