Description
The following snippet is a minimal example demonstrating the issue:
import numpy as np
import pandas as pd
from xgboost import DMatrix
df_numeric = pd.DataFrame(np.random.randn(10, 2))
df_categorical = pd.DataFrame(np.random.randint(0, 2, (10, 2))).astype('category')
df = pd.concat(
    [df_numeric, df_categorical],
    axis=1,
    # ignore_index=True  # <-- Uncomment to fix the issue
)
DMatrix(df, enable_categorical=True)
The above code triggers the following exception:
...
AttributeError: 'DataFrame' object has no attribute 'dtype'
This error message does not point to the root cause of the exception. Passing ignore_index=True to pd.concat fixes the issue, so the problem appears to be the duplicate column names that result when ignore_index is not passed (both input frames have columns named 0 and 1). My suggestion is to raise a more user-friendly exception in this scenario.
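To illustrate the suspected cause with pandas alone, here is a minimal sketch. It shows that selecting a duplicated column name returns a DataFrame rather than a Series, and a DataFrame has no dtype attribute, which matches the error text. Whether xgboost hits exactly this code path is an assumption on my part. The helper check_unique_columns is hypothetical and not part of xgboost; it only shows what a clearer error could look like.

```python
import numpy as np
import pandas as pd


def check_unique_columns(df: pd.DataFrame) -> None:
    """Hypothetical helper (not an xgboost API): fail early on duplicate names."""
    duplicated = df.columns[df.columns.duplicated()].unique().tolist()
    if duplicated:
        raise ValueError(
            f"DataFrame has duplicate column names: {duplicated}. "
            "Rename the columns or pass ignore_index=True to pd.concat."
        )


df_numeric = pd.DataFrame(np.random.randn(10, 2))
df_categorical = pd.DataFrame(np.random.randint(0, 2, (10, 2))).astype("category")

# Both frames use the default column names 0 and 1, so the names repeat.
df = pd.concat([df_numeric, df_categorical], axis=1)
print(df.columns.tolist())       # [0, 1, 0, 1]

# Selecting a duplicated name yields a DataFrame, which has no `dtype`.
print(type(df[0]).__name__)      # DataFrame
print(hasattr(df[0], "dtype"))   # False

# With ignore_index=True the columns are renumbered 0..3 and the check passes.
df_fixed = pd.concat([df_numeric, df_categorical], axis=1, ignore_index=True)
check_unique_columns(df_fixed)
```

Calling check_unique_columns(df) on the first frame raises a ValueError that names the repeated columns, which is the kind of message I would expect instead of the AttributeError.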
Software
python==3.12.9
xgboost==3.1.2