Dear Py-Boost developers,
Thanks for the very interesting paper and for making the code publicly available.
I am the author of XGBoostLSS and LightGBMLSS that extend the base implementations to a probabilistic setting, where all moments of a parametric univariate and multivariate distribution are modeled as functions of covariates. This allows one to create probabilistic predictions from which intervals and quantiles of interest can be derived.
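To illustrate the idea, here is a minimal sketch of how a quantile or prediction interval follows once the distribution parameters have been predicted. The Gaussian `mu` and `sigma` values are purely illustrative stand-ins, not output of XGBoostLSS itself:

```python
from statistics import NormalDist

# Illustrative predicted parameters for one observation: in a
# distributional boosting setting, both the mean and the standard
# deviation are modelled as functions of the covariates.
mu, sigma = 2.0, 0.5

dist = NormalDist(mu=mu, sigma=sigma)
lower = dist.inv_cdf(0.05)   # 5% quantile of the predicted distribution
upper = dist.inv_cdf(0.95)   # 95% quantile -> 90% prediction interval
print(f"90% interval: [{lower:.3f}, {upper:.3f}]")
```

Any other quantile, interval, or tail probability can be read off the fitted distribution in the same way.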
However, as outlined in my latest paper (März, Alexander (2022), *Multi-Target XGBoostLSS Regression*), XGBoost does not scale well in the multivariate setting, since a separate tree is grown for each parameter individually. As an example, consider modelling a multivariate Gaussian distribution with D=100 target variables, where the covariance matrix is parameterised via its Cholesky decomposition. Modelling all conditional moments (i.e., means, standard deviations, and all pairwise correlations) requires estimating D(D + 3)/2 = 5,150 parameters.
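The parameter count above follows from D means plus the D(D + 1)/2 entries of the lower-triangular Cholesky factor; a quick sanity check:

```python
def n_params(D: int) -> int:
    """Number of distributional parameters for a D-dimensional Gaussian
    parameterised via the Cholesky factor of the covariance matrix:
    D means + D * (D + 1) / 2 Cholesky entries = D * (D + 3) / 2."""
    return D * (D + 3) // 2

print(n_params(100))  # -> 5150, one boosted ensemble per parameter in XGBoost
```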
I came across your approach just recently and spent the last few days extending your base model to Py-BoostLSS, an extension of Py-Boost to probabilistic modelling. Because it is very runtime-efficient, SketchBoost is a good candidate for estimating high-dimensional target variables. The package is at a very early stage, and I still need to evaluate its runtime efficiency against XGBoostLSS.
@btbpanda I was wondering if you would be interested in a scientific collaboration to further extend the functionality of Py-BoostLSS. Looking forward to your reply.