What did you realize when you tried to submit your predictions? What changes were needed to the output of the predictor to submit your results?
- When running the first prediction with AutoGluon's default model and attempting to submit the `submission.csv` file to Kaggle, we observed two main issues:
  - Column format: Kaggle requires that the submission CSV contain exactly two columns with the headers `datetime` and `count`. By default, our prediction output was an array of values without column labels, so it was necessary to load the `sampleSubmission.csv` file and assign the predictor's predictions into `submission['count']`.
  - Negative values: Reviewing the competition documentation revealed that Kaggle rejects submissions containing negative values in the `count` column. AutoGluon's predictor sometimes produces very small negative values (close to zero) due to model residuals. Therefore, before saving the final `submission.csv`, we applied:

    ```python
    predictions = predictor.predict(test_data)
    predictions[predictions < 0] = 0
    submission = pd.read_csv('sampleSubmission.csv', parse_dates=['datetime'])
    submission['count'] = predictions
    submission.to_csv('submission.csv', index=False)
    ```
- Kaggle API key warning: When running `!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"`, a warning appeared stating: `Warning: Your Kaggle API key is readable by other users. It is recommended to set its permissions to 600.` This did not block the submission, but it is good practice to adjust the file permissions with:

  ```shell
  chmod 600 ~/.config/kaggle/kaggle.json
  ```
- Outcome: With these adjustments (labeling the `count` column, forcing values ≥ 0, and preserving the correct `datetime` format), the first submission was accepted by Kaggle without format errors.
What was the top ranked model that performed?
- For the initial phase we used:

  ```python
  predictor = TabularPredictor(
      label='count',
      problem_type='regression',
      eval_metric='root_mean_squared_error'
  ).fit(
      train_data=train.drop(['casual', 'registered'], axis=1),
      time_limit=600,
      presets='best_quality'
  )
  ```
- After training (≈10 minutes), AutoGluon produced an internal leaderboard where the model with the best out-of-fold (OOF) RMSE was `WeightedEnsemble_L2`, which achieved an OOF RMSE of 1.80145.
What did the exploratory analysis find and how did you add additional features?
- After loading `train.csv` and plotting histograms for all variables (`train.plot.hist()`), we observed that:
  - The demand (`count`) exhibits very pronounced hourly seasonality: there are clear peaks during rush hours (morning and evening) and troughs during low-activity periods (overnight).
  - The weather-related variables (`temp`, `humidity`, `windspeed`) have moderate ranges, but we identified:
    - A positive correlation between `temp` and `count` (higher temperature generally means higher usage).
    - A slight negative correlation between `humidity` and `count` (higher humidity means lower demand).
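These correlations can be checked quickly with pandas; a minimal sketch, assuming `train` has the standard competition columns (the toy values below are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for the competition's train.csv (assumed columns).
train = pd.DataFrame({
    'temp':      [9.8, 13.1, 20.5, 26.2, 30.1, 33.0],
    'humidity':  [81, 75, 60, 52, 44, 40],
    'windspeed': [0.0, 6.0, 12.0, 16.0, 19.0, 23.0],
    'count':     [16, 40, 120, 210, 300, 350],
})

# Pearson correlation of each weather variable against demand.
corr = train[['temp', 'humidity', 'windspeed']].corrwith(train['count'])
print(corr)
```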
- To capture hourly seasonality, we created a new `hour` feature extracted from the `datetime` column:

  ```python
  train['hour'] = train['datetime'].dt.hour
  test['hour'] = test['datetime'].dt.hour
  ```
- We also converted the `season` and `weather` columns to the `category` data type, since although they are encoded numerically, they represent discrete categories that AutoGluon's algorithms handle better as categorical variables:

  ```python
  train['season'] = train['season'].astype('category')
  train['weather'] = train['weather'].astype('category')
  test['season'] = test['season'].astype('category')
  test['weather'] = test['weather'].astype('category')
  ```
- These changes allowed AutoGluon's internal models (LightGBM, CatBoost, etc.) to capture nonlinear patterns related to season and weather, in addition to hourly seasonality.
How much better did your model perform after adding additional features, and why do you think that is?
- Using the same training configuration as in the initial phase but including the new `hour` column (and converting `season` and `weather` to categorical), we retrained:

  ```python
  predictor_new_features = TabularPredictor(
      label='count',
      problem_type='regression',
      eval_metric='root_mean_squared_error'
  ).fit(
      train_data=train.drop(['casual', 'registered'], axis=1),
      time_limit=600,
      presets='best_quality'
  )
  ```
- The new model achieved an OOF RMSE of 0.62399, compared to 1.80145 for the version without the `hour` feature.
- Reasons for the improvement:
  - Bike demand clearly follows a strong daily cycle. By explicitly including the `hour` feature, the internal models could learn the peaks and valleys corresponding to different times of day.
  - Marking `season` and `weather` as categorical reduced biases that could arise if the model misinterpreted those numerical encodings as continuous variables. LightGBM and CatBoost handle categorical variables more effectively, capturing interactions between season/weather and demand.
How much better did your model perform after trying different hyperparameters?
- After the second experiment (with feature engineering), the OOF RMSE was 0.62399.
- We then incorporated a hyperparameter tuning step in AutoGluon, defining a search dictionary for LightGBM parameters and passing it to `fit`, for example:

  ```python
  import autogluon.core as ag
  from autogluon.tabular import TabularPredictor

  hyperparameters = {
      'GBM': {
          'learning_rate': ag.space.Real(1e-3, 0.1, log=True),
          'num_leaves': ag.space.Int(20, 150),
          'subsample': ag.space.Real(0.5, 1.0),
          'feature_fraction': ag.space.Real(0.5, 1.0),
      }
  }

  predictor_tuned = TabularPredictor(
      label='count',
      problem_type='regression',
      eval_metric='root_mean_squared_error'
  ).fit(
      train_data=train.drop(['casual', 'registered'], axis=1),
      hyperparameters=hyperparameters,
      hyperparameter_tune_kwargs={'num_trials': 20, 'scheduler': 'local', 'searcher': 'random'},
      time_limit=900,
      presets='best_quality'
  )
  ```
- With these settings, the best LightGBM run achieved an OOF RMSE of 0.46993.
- Absolute improvement: an RMSE reduction of 0.62399 − 0.46993 = 0.15406.
- Relative improvement: approximately a 24.7% reduction in RMSE compared to the model without tuning.
- Explanation:
  - Adjusting `learning_rate` and `num_leaves` allowed us to better balance tree complexity: with a lower learning rate and an optimal number of leaves, the trees learned more gradually, avoiding premature overfitting.
  - Parameters like `subsample` and `feature_fraction` introduced additional randomness, improving generalization and reducing variance on the validation set.
If you were given more time with this dataset, where do you think you would spend more time?
- Additional temporal feature engineering:
  - Day of the week: create a column like `weekday = datetime.dt.dayofweek` and potentially one-hot encode each day, since usage patterns on weekdays vs. weekends often differ.
  - Month of the year: extract `month = datetime.dt.month` to capture annual seasonality (e.g., summer vs. winter).
  - Local holiday indicators: cross-reference with a local calendar to generate a boolean column `is_holiday`, since demand can spike or drop dramatically on holidays.
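A sketch of these derived features with pandas; the holiday set here is a made-up placeholder, not a real calendar:

```python
import pandas as pd

# Toy frame with a datetime column, standing in for train.csv.
df = pd.DataFrame({
    'datetime': pd.to_datetime([
        '2011-01-01 00:00', '2011-01-03 08:00', '2011-07-04 17:00',
    ])
})

df['weekday'] = df['datetime'].dt.dayofweek   # 0 = Monday ... 6 = Sunday
df['month'] = df['datetime'].dt.month
df['is_weekend'] = df['weekday'] >= 5

# Hypothetical holiday list; in practice, use a real local calendar.
holidays = {pd.Timestamp('2011-01-01'), pd.Timestamp('2011-07-04')}
df['is_holiday'] = df['datetime'].dt.normalize().isin(holidays)
```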
- Feature interactions and advanced weather features:
  - Interaction `temp × humidity`: sometimes the "feels like" temperature or heat index is more predictive than each variable individually.
  - Weather condition clusters: group `weather` into "good weather" vs. "bad weather" clusters using a quick clustering method (e.g., KMeans on `temp`, `humidity`, `windspeed`).
  - Long-term trend features: smooth demand with rolling windows (24 h or 7 days) to capture overall demand trends, then use the difference between the current value and that moving average as a feature.
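The interaction and rolling-trend ideas can be sketched in pure pandas (toy values; a 3-row window stands in for the 24 h window on real hourly data):

```python
import pandas as pd

df = pd.DataFrame({
    'temp':     [10.0, 12.0, 15.0, 20.0, 22.0, 25.0],
    'humidity': [80, 70, 65, 50, 45, 40],
    'count':    [20, 35, 60, 120, 150, 200],
})

# Simple multiplicative interaction as a crude "feels like" proxy.
df['temp_x_humidity'] = df['temp'] * df['humidity']

# Rolling mean of demand (window=3 here; 24 for hourly data),
# plus the deviation of the current value from that trend.
df['count_trend'] = df['count'].rolling(window=3, min_periods=1).mean()
df['count_dev'] = df['count'] - df['count_trend']
```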
- Improved validation strategy:
  - Implement a `TimeSeriesSplit` or time-blocked validation so that the model never "sees" future data during validation, better mimicking a production environment.
  - Test rolling-origin evaluation (train on data up to a point, validate on the next window, and iterate). This is crucial for time series where chronological order matters.
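A minimal sketch of time-aware splitting with scikit-learn's `TimeSeriesSplit` (rows are assumed to be sorted chronologically; the 12 samples are placeholders):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 chronologically ordered samples standing in for hourly rows.
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

for train_idx, val_idx in splits:
    # Every validation index comes strictly after every training index,
    # so the model never "sees" the future during validation.
    print(train_idx.max(), '<', val_idx.min())
```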
- Models specialized in time series:
  - Explore `autogluon.timeseries` and compare its performance to the tabular approach.
  - Test sequential models like LSTM/GRU using libraries such as PyTorch or TensorFlow (outside of AutoGluon) to see if they can capture complex temporal patterns more effectively.
- More hyperparameter tuning and custom ensembles:
  - Expand the search space to include CatBoost and XGBoost parameters as well.
  - Create custom stacking ensembles (for example, combining multiple base-model predictions with a simple meta-model like a linear regressor).
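The stacking idea can be sketched by fitting a linear meta-model on out-of-fold base predictions; here the base predictions are made up and the linear regression is a plain least-squares solve with NumPy:

```python
import numpy as np

# Hypothetical out-of-fold predictions from two base models, plus targets.
pred_gbm = np.array([10.0, 52.0, 98.0, 151.0, 205.0])
pred_cat = np.array([12.0, 48.0, 102.0, 149.0, 198.0])
y_true   = np.array([11.0, 50.0, 100.0, 150.0, 200.0])

# Meta-model: y ≈ w0 + w1 * pred_gbm + w2 * pred_cat (linear regression).
X = np.column_stack([np.ones_like(pred_gbm), pred_gbm, pred_cat])
w, *_ = np.linalg.lstsq(X, y_true, rcond=None)

blended = X @ w
rmse = np.sqrt(np.mean((blended - y_true) ** 2))
```

Because the least-squares fit can always fall back to copying a single base model, the blended RMSE can never be worse than the best base model on the data it was fit on.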
- Training efficiency optimizations:
  - Tune parallelism settings in LightGBM/CatBoost to speed up hyperparameter exploration in a cloud environment.
  - Reduce temporal granularity (e.g., aggregate every 2 hours) and evaluate the impact on accuracy vs. training time.
  - Use smart sampling techniques: validate on a representative subset of the data to iterate quickly, then train on the full dataset.
In summary, with more time I would focus on advanced temporal feature engineering, better time-series-aware validation, and time-series-specific models, as I believe these areas hold the greatest potential for accuracy improvements without overfitting.
Create a table with the models you ran, the hyperparameters modified, and the kaggle score.
| model | hpo1 | hpo2 | hpo3 | score |
|---|---|---|---|---|
| initial | default_vals | default_vals | default_vals | 1.80145 |
| add_features | default_vals | default_vals | default_vals | 0.62399 |
| hpo | GBM: learning_rate: [1e-3 - 0.1], num_leaves: [31, 64, 128], colsample_bytree: [0.5 - 1.0], subsample: [0.5 - 1.0], lambda_l1: [0 - 1.0], lambda_l2: [0 - 1.0] | CAT: iterations: [200, 500, 1000], learning_rate: [1e-3 - 0.3], depth: [4, 6, 8, 10] | XGB: max_depth: [4, 6, 8, 10], learning_rate: [1e-3 - 0.2], subsample: [0.5 - 1.0], colsample_bytree: [0.5 - 1.0], n_estimators: [100, 300, 500] | 0.46993 |
Create a line plot showing the top model score for the three (or more) training runs during the project.

Below is the comparison of OOF RMSE scores from the three successive training runs:
- initial (no feature engineering): RMSE = 1.80145
- add_features (with the `hour` feature and categorical conversions): RMSE = 0.62399
- hpo (with hyperparameter tuning): RMSE = 0.46993
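The requested line plot can be generated with matplotlib; a sketch using the scores above (the figure is saved to a file rather than shown, and the filename is arbitrary):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt

runs = ['initial', 'add_features', 'hpo']
rmse = [1.80145, 0.62399, 0.46993]

fig, ax = plt.subplots()
ax.plot(runs, rmse, marker='o')
ax.set_xlabel('training run')
ax.set_ylabel('OOF RMSE')
ax.set_title('Top model score per training run')
fig.savefig('model_train_score.png')
```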
Create a line plot showing the top kaggle score for the three (or more) prediction submissions during the project.
Final overall summary

In this project, we learned how AutoGluon greatly streamlines the regression workflow for a time-series problem (hourly bike-sharing demand):
- Initial phase: We used AutoGluon's default predictor and quickly discovered that, without explicit time-based features (hour, day, month), the model failed to capture seasonality, resulting in an OOF RMSE of 1.80145. We also verified that it was necessary to adjust the prediction output to avoid negative values and use the correct `datetime` format to submit to Kaggle.
- Feature engineering: Through exploratory data analysis (histograms and correlations), we concluded that the `hour` feature was crucial. After adding it (and marking `season` and `weather` as categorical), the OOF RMSE dropped to 0.62399, a substantial improvement. This highlights the importance of explicitly providing the model with seasonal indicators.
- Hyperparameter tuning: Finally, by defining a search space for LightGBM parameters (`learning_rate`, `num_leaves`, `subsample`, etc.) and passing `hyperparameter_tune_kwargs` to AutoGluon, we reduced the OOF RMSE further to 0.46993. Fine-tuning hyperparameters helped refine the decision trees and reduce residual overfitting.
- Lessons learned:
  - Including derived time features (e.g., hour of day, day of week, month) can make a significant difference in strongly seasonal problems.
  - AutoGluon's `best_quality` preset automatically ensembles multiple models (bagged LightGBM, CatBoost, Random Forest, XGBoost, etc.) and performs internal stacking, giving a strong baseline without manually training each model.
  - Manual hyperparameter tuning (especially for LightGBM) can yield additional gains: in our case, we reduced RMSE by roughly 25% compared to the "new features" phase.
  - Future improvements could include:
    - More time-based features: day of week, weekday/weekend indicators, month, a local holiday indicator (`is_holiday`).
    - Advanced weather features: nonlinear effects, interactions between temperature and humidity, clustering of weather conditions.
    - Explicit time-series models: trying `autogluon.timeseries` or LSTM/GRU models to directly capture temporal trends.
    - Time-aware cross-validation: using `TimeSeriesSplit` or rolling-origin validation to better simulate production performance.
Overall, the project demonstrated how to start with a simple AutoGluon pipeline, iterate on feature engineering, and then fine-tune hyperparameters to achieve competitive performance on Kaggle.
