Why are tasks and subsequences randomly picked for a Chronos-2 training dataset? #469
Firstly, a batch is constructed by randomly selecting tasks (chronos-forecasting/src/chronos/chronos2/dataset.py, lines 646 to 649 in f951d9a). Then, the subsequences that go into the batch are also selected randomly (chronos-forecasting/src/chronos/chronos2/dataset.py, lines 557 to 558 in f951d9a). Could someone elaborate on the reasoning behind using randomization in these places?

Alternative approach

As an alternative to the randomization, I would hypothesize that selecting an equal number of samples from each task would result in more reliable outcomes. Similarly, the subsequences could be spread out across the available timespan of each task, so that the training steps cover as much of the training data as possible. This would also prevent accidental overfitting due to bad luck in selecting training datapoints. Has such an alternative been considered? Is it a deliberate choice to go with the current implementation that uses randomization, or is it left as an exercise for users of the chronos-forecasting package to implement themselves? Thanks in advance for any help on the matter!

PS: I am more than happy to contribute this alternative method for selecting training samples, if it seems useful :)
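To make the two sampling strategies concrete, here is a minimal sketch contrasting random window selection with the proposed evenly-spread alternative. The function names and signatures are hypothetical, not taken from the chronos-forecasting code:

```python
import numpy as np

def random_window_starts(series_len: int, window_len: int, n: int, rng) -> np.ndarray:
    """Randomized strategy: draw n window start indices uniformly at random.
    Different training runs (or epochs) may cover different parts of the series."""
    return rng.integers(0, series_len - window_len + 1, size=n)

def spread_window_starts(series_len: int, window_len: int, n: int) -> np.ndarray:
    """Proposed alternative: spread n window starts evenly across the series,
    so the windows deterministically cover the full available timespan."""
    return np.linspace(0, series_len - window_len, num=n, dtype=int)

rng = np.random.default_rng(0)
print(sorted(random_window_starts(1000, 100, 5, rng)))  # varies with the seed
print(spread_window_starts(1000, 100, 5).tolist())      # [0, 225, 450, 675, 900]
```

The even spread guarantees full coverage per pass, while the random draw makes each batch an unbiased sample of the data and avoids the model seeing windows at fixed phase offsets every epoch.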
Replies: 2 comments
@GuidoHeijden The time series slicing logic is based on our prior experience working with forecasting models and developing open source libraries like GluonTS and AutoGluon. That said, there are always trade-offs and this may not be the best possible setup for all situations. However, in our large scale benchmarking it works well across tasks.
Does your alternative work on your own dataset?