Skip to content

Fix short data CV#70

Open
MichalChromcak wants to merge 12 commits intoheidelbergcement:masterfrom
MichalChromcak:fix_short_data_cv
Open

Fix short data CV#70
MichalChromcak wants to merge 12 commits intoheidelbergcement:masterfrom
MichalChromcak:fix_short_data_cv

Conversation

@MichalChromcak
Copy link
Collaborator

@MichalChromcak MichalChromcak commented Apr 1, 2022

In certain cases, the current behaviour of the scorer stores the cv_data wrongly (see the split numbers). Further plotting and evaluation functionality is partially affected by that.

This MR aims to fix this.

Example:

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame(
    {"target":list(range(7))}, 
    index=pd.date_range(start="2021-03-31", end="2021-09-30", freq="M")
)
target
2021-03-310
2021-04-301
2021-05-312
2021-06-303
2021-07-314
2021-08-315
2021-09-306
ms = ModelSelector(
    horizon=1,
    frequency="M",
)
ms.create_gridsearch(
    sklearn_models=False,
    n_splits=4,
    between_split_lag=None,
    sklearn_models_optimize_for_horizon=False,
    autosarimax_models=False,
    prophet_models=False,
    tbats_models=False,
    exp_smooth_models=False,
    average_ensembles=False,
    stacking_ensembles=False,
)
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_3", lags=3))
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_1", lags=1))

ms.select_model(
    df=df,
    target_col_name="target",
)

print(ms.results[0].cv_data)
splity_trueb80cee186b053880a84ec8d7c4692365e474b1ddba8a0a6f849b49abf903a4e3
2021-07-3104.03.03.0
2021-08-3115.05.04.0
2021-09-3026.06.05.0
2021-06-3003.0NaN3.0

@MichalChromcak MichalChromcak self-assigned this Apr 1, 2022
@MichalChromcak
Copy link
Collaborator Author

@pavelkrizek FYI

@codecov-commenter
Copy link

codecov-commenter commented Apr 2, 2022

Codecov Report

Merging #70 (24bbfb3) into master (11166bd) will decrease coverage by 0.06%.
The diff coverage is 90.24%.

@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
- Coverage   93.79%   93.73%   -0.07%     
==========================================
  Files          56       56              
  Lines        2853     2888      +35     
==========================================
+ Hits         2676     2707      +31     
- Misses        177      181       +4     
Impacted Files Coverage Δ
src/hcrystalball/wrappers/_sklearn.py 94.80% <ø> (ø)
src/hcrystalball/metrics/_scorer.py 90.66% <78.94%> (-4.50%) ⬇️
tests/unit/metrics/test_scorer.py 93.93% <100.00%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eeb32b9...24bbfb3. Read the comment docs.

@pavelkrizek
Copy link
Contributor

@MichalChromcak Thanks for catching and fixing the bug! Everything is good from my side, just the function results_to_cv_data is quite complex and it's hard to see what exactly is happening there, so a more descriptive docstring would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants