Variables not in any trees have positive gain/cover/weight #11879

@jhaneyrf

I was converting an XGBoost classifier model to PMML, and I noticed that about 40 of the columns in the model.feature_names list were not present in the PMML file.

I did a little more research. I usually load my model as a Booster() object, but when I loaded it as XGBClassifier() and looked at the feature_importances_ attribute, I saw that those 40 variables all had zero importance.

So I went back to loading from a Booster() and output get_dump() to a text file. I could not find any reference to those 40 columns in that file.
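For anyone who wants to repeat this search without reading the dump by eye, here is a minimal sketch of the check. It assumes the default text dump format, where a numeric split looks like `0:[age<41.5] yes=1,no=2,missing=1`; categorical splits are written differently and are not handled. The helper names and the sample dump are hypothetical stand-ins for model.feature_names and the output of get_dump().

```python
import re

# Matches the feature name in a numeric split such as "0:[age<41.5] yes=1,no=2,missing=1".
# Assumption: default text dump format; categorical splits are not covered.
SPLIT_PATTERN = re.compile(r"\[([^<\]]+)<")


def features_used_in_dump(tree_dumps):
    """Return the set of feature names that appear in at least one split."""
    used = set()
    for tree_text in tree_dumps:
        used.update(SPLIT_PATTERN.findall(tree_text))
    return used


def unused_features(feature_names, tree_dumps):
    """Return the features, in their original order, that no tree splits on."""
    used = features_used_in_dump(tree_dumps)
    return [name for name in feature_names if name not in used]


# Hypothetical two-tree dump; with a real model this would be booster.get_dump().
sample_dumps = [
    "0:[age<41.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=-0.2\n"
    "\t2:[income<52000] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=0.1\n"
    "\t\t4:leaf=0.3\n",
    "0:[age<30] yes=1,no=2,missing=1\n"
    "\t1:leaf=-0.05\n"
    "\t2:leaf=0.07\n",
]
# Hypothetical feature list; with a real model this would be booster.feature_names.
sample_features = ["age", "income", "zip_code"]

print(unused_features(sample_features, sample_dumps))  # ['zip_code']
```

The regular expression only looks at the text between `[` and `<`, so leaf lines are ignored.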

So then I ran the get_score method with each of the five importance types (weight, gain, cover, total_gain, total_cover). For every importance type, these unused variables had a positive score.
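The comparison can be sketched as follows. This is an illustration rather than the exact script I ran: the helper name and the example numbers are hypothetical, and the set of features used in splits is assumed to have been collected separately from the get_dump() output.

```python
# The five importance types accepted by Booster.get_score for tree models.
IMPORTANCE_TYPES = ("weight", "gain", "cover", "total_gain", "total_cover")


def scored_but_unused(scores_by_type, used_features):
    """For each importance type, list features with a positive score that never appear in a split."""
    report = {}
    for importance_type, scores in scores_by_type.items():
        report[importance_type] = sorted(
            name
            for name, score in scores.items()
            if score > 0 and name not in used_features
        )
    return report


# With a real model the scores would come from:
#   scores_by_type = {t: booster.get_score(importance_type=t) for t in IMPORTANCE_TYPES}
# Hypothetical scores standing in for that output.
example_scores = {
    t: {"age": 3.0, "income": 1.0, "zip_code": 0.5} for t in IMPORTANCE_TYPES
}
# Hypothetical set of features found in the tree dump.
example_used = {"age", "income"}

report = scored_but_unused(example_scores, example_used)
for importance_type in IMPORTANCE_TYPES:
    print(importance_type, report[importance_type])
```

An empty list for every importance type would mean the scores and the dump agree; in my case each list contained the same 40 columns.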

Is this a known issue? Is there some place that documents why this behavior should be expected?

It appears that my PMML file is correct even though it excludes those 40 variables. I'm just wondering why those 40 variables wound up in the feature_names or feature_names_in_ list for this model if they are not used in any of the final trees.

I did all of this in XGBoost 2.1.3.

Thanks
