Is it possible to access the input column names in onnx converter functions? #1088

@paranjapeved15

In the code below, I append the vehicleType value to a prefix to determine which column of my input DataFrame to read. For example, if vehicleType = 'car', I return the value of the 'features_car' column for that row.
Data:
df = pandas.DataFrame([["car", 0.1, 0.0], ["car", 0.2, 0.0], ["suv", 0.0, 0.2]], columns=['vehicleType', 'features_car', 'features_suv'])
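For reference, the lookup described above can be sketched with plain pandas, independent of the transformer class. This is only an illustration of the intended result on the sample data; `select_score` and `scores` are names made up for this sketch.

```python
import pandas

# Sample data from the issue.
df = pandas.DataFrame(
    [["car", 0.1, 0.0], ["car", 0.2, 0.0], ["suv", 0.0, 0.2]],
    columns=["vehicleType", "features_car", "features_suv"],
)
prefix = "features_"


def select_score(row: pandas.Series) -> float:
    """Pick the column named prefix + vehicleType for a single row."""
    return row[prefix + row["vehicleType"]]


# One selected value per row, shaped as a single output column.
scores = df.apply(select_score, axis=1).to_numpy().reshape(-1, 1)
print(scores)
```

Each row contributes exactly one value: rows with 'car' read 'features_car', and the row with 'suv' reads 'features_suv'.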
Custom Transformer:

import numpy
import pandas
from sklearn.base import BaseEstimator, TransformerMixin


class GetScore(BaseEstimator, TransformerMixin):  # type: ignore
    """Select, per row, the value of the column named prefix + vehicleType."""

    def __init__(self, prefix: str):
        """Store the prefix shared by the per-vehicle feature columns."""
        self.prefix = prefix

    def dot_product(self, x: pandas.Series) -> float:
        """Return the value of column self.prefix + x.vehicleType for one row."""
        return x[self.prefix + x.vehicleType]

    def fit(self, X, y=None):  # type: ignore
        """Fit the transformer (stateless, nothing to learn)."""
        return self

    def transform(self, X: pandas.DataFrame, y: None = None) -> numpy.ndarray:
        """Transform the given data into a single column of selected values."""
        if isinstance(X, pandas.DataFrame):
            x = X.apply(self.dot_product, axis=1)
            return x.values.reshape((-1, 1))
        # The lookup relies on column names, so a bare ndarray is rejected
        # instead of silently returning None.
        raise TypeError("GetScore.transform expects a pandas.DataFrame")

    def get_feature_names_out(self) -> None:
        """Return feature names. Required for onnx conversion."""
        pass

sklearn pipeline:

from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        # ("", make_pipeline(OneHotEncoder(categories=[["car", "suv"]], sparse_output=False)), ['vehicleType', 'features_car', 'features_suv']),
        ("features_computed", GetScore("features_"), ['vehicleType', 'features_car', 'features_suv']),
    ],
    # remainder="passthrough",
    verbose_feature_names_out=False,
)

To write a custom converter for GetScore, I would need to access the inputs by column name. Are the column names accessible from the converter's inputs, or would I have to come up with another approach?
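One possible "other approach", sketched here in NumPy only and not as a skl2onnx converter: the same selection can be written without column names as a one-hot mask over vehicleType multiplied element-wise with the feature columns and summed per row. This assumes a fixed, known category order (`categories`) and feature columns arranged in that same order; both are assumptions of the sketch, not something the library guarantees.

```python
import numpy as np

# Assumed fixed category order; feature columns must follow the same order.
categories = np.array(["car", "suv"])
vehicle_type = np.array(["car", "car", "suv"])
features = np.array([[0.1, 0.0], [0.2, 0.0], [0.0, 0.2]])  # [features_car, features_suv]

# One-hot mask: 1.0 where the row's vehicleType matches the column's category.
one_hot = (vehicle_type[:, None] == categories[None, :]).astype(features.dtype)

# Row-wise dot product of the mask with the features keeps only the matching column.
scores = (one_hot * features).sum(axis=1, keepdims=True)
print(scores)
```

Because this uses only comparison, multiplication and a sum over positional columns, it does not depend on looking up a column by name at inference time.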
