Hi James, this might be related to #150. I would like to use GridSearchCV in combination with GRCCA but I cannot find a way to pass the feature groups over to the .fit() method of GRCCA.
Currently I am getting:
ValueError:
All the 40 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
40 fits failed with the following error:
Traceback (most recent call last):
  File "/zi/home/johannes.wiesner/micromamba/envs/csp_wiesner_johannes/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 886, in _fit_and_score
    estimator.fit(X_train, **fit_params)
  File "/zi/home/johannes.wiesner/micromamba/envs/csp_wiesner_johannes/lib/python3.9/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/zi/home/johannes.wiesner/micromamba/envs/csp_wiesner_johannes/lib/python3.9/site-packages/sklearn/pipeline.py", line 468, in fit
    routed_params = self._check_method_params(method="fit", props=params)
  File "/zi/home/johannes.wiesner/micromamba/envs/csp_wiesner_johannes/lib/python3.9/site-packages/sklearn/pipeline.py", line 374, in _check_method_params
    fit_params_steps[step]["fit"][param] = pval
  File "/zi/home/johannes.wiesner/micromamba/envs/csp_wiesner_johannes/lib/python3.9/site-packages/sklearn/utils/_bunch.py", line 39, in __getitem__
    return super().__getitem__(key)
KeyError: 'grcca'

Here's some example code:
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from cca_zoo.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from cca_zoo.preprocessing import MultiViewPreprocessing
from sklearn.preprocessing import StandardScaler
from cca_zoo.linear import GRCCA
###############################################################################
## Simulate Data: Not part of the question
###############################################################################
# Random state for reproducibility
rng = np.random.RandomState(42)
# Parameters
n_samples = 100
n_features_X = 100
n_features_Y = 10
latent_correlation = 0.6
# Generate a latent variable
latent_dim = 1
latent_variable = rng.randn(n_samples, latent_dim)
# Generate X with structured covariance
# Define groups
group_sizes = [50, 25, 25]
group_correlations = [0.8, 0.7, 0.6]
X = np.zeros((n_samples, n_features_X))
current_feature = 0
for group_size, group_corr in zip(group_sizes, group_correlations):
    # Generate a group latent variable
    group_latent = latent_variable + rng.randn(n_samples, 1) * (1 - group_corr)
    # Generate group features
    group_features = group_latent @ rng.randn(1, group_size) + rng.randn(n_samples, group_size) * (1 - group_corr)
    X[:, current_feature:current_feature + group_size] = group_features
    current_feature += group_size
# Generate Y based on the latent variable
Y = latent_variable @ rng.randn(1, n_features_Y) + rng.randn(n_samples, n_features_Y) * (1 - latent_correlation)
###############################################################################
## Bring data in nice format: Not part of the question
###############################################################################
subject_ids = [f"subject_{i+1}" for i in range(n_samples)]
# get df_brain
df_brain = pd.DataFrame(X)
df_brain.index = subject_ids
df_brain.index.name = 'subject_id'
X_columns = pd.MultiIndex.from_arrays(
    [
        [f"area_{i+1}" for i in range(100)],  # area_label_idx
        ["network_1"] * 50 + ["network_2"] * 25 + ["network_3"] * 25  # brain_network_idx
    ],
    names=["brain_area", "brain_network"]
)
df_brain.columns = X_columns
# get df_behavior
df_behavior = pd.DataFrame(Y)
df_behavior.index = subject_ids
df_behavior.index.name = 'subject_id'
df_behavior.columns = [f"behavioral_variable_{idx+1}" for idx in range(len(df_behavior.columns))]
###############################################################################
## Prepare Analysis: Somehow part of the question?
###############################################################################
# get feature groups: features in df_brain belong to 3 groups, features in df_behavior don't
# have any groups so we set the same number for all features (all features belong to one group)
groups_brain = df_brain.columns.get_level_values('brain_network').astype('category').codes.astype('int64')
groups_behavior = np.array([0 for f in range(len(df_behavior.columns))])
feature_groups = [groups_brain,groups_behavior]
# define latent dimensions
latent_dimensions = 1
# define folds
cv = KFold(5)
# just get numpy arrays
X1 = df_brain.values
X2 = df_behavior.values
###############################################################################
## Actual Question: Run GridSearch with Pipeline that includes Standardization
## and GRCCA
###############################################################################
# define an estimator
estimator = Pipeline([
    ('preprocessing', MultiViewPreprocessing((StandardScaler(), StandardScaler()))),
    ('grcca', GRCCA(latent_dimensions=latent_dimensions, random_state=rng))
])
# define grid
param_grid = {'grcca__c': [[10**x for x in range(-1, 1)], [10**x for x in range(-1, 1)]],
              'grcca__mu': [[10**x for x in range(-1, 1)], [0]]}
# run gridsearch
grid = GridSearchCV(estimator,param_grid,cv=cv)
grid.fit([X1,X2],grcca__feature_groups=feature_groups)
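
To illustrate the line the traceback ends on (`fit_params_steps[step]["fit"][param] = pval`), here is a minimal pure-Python sketch of how a `step__param` keyword might be split and routed to a pipeline step. It is a simplified model for discussion, not scikit-learn's or cca_zoo's actual code; `route_fit_params` and the nested example values are hypothetical.

```python
def route_fit_params(step_names, fit_params):
    """Split 'step__param' keywords into one dict per step (simplified model)."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(f"expected 'step__param', got {key!r}")
        # Split only on the first '__', so the rest stays the parameter name
        step, param = key.split("__", 1)
        # Raises KeyError if `step` is not one of the known step names
        routed[step][param] = value
    return routed


# Hypothetical stand-in for the real feature_groups arrays
example_groups = [[0, 0, 1], [0]]

# The step name exists: the keyword is routed to the 'grcca' step
routed = route_fit_params(
    ["preprocessing", "grcca"], {"grcca__feature_groups": example_groups}
)
print(routed["grcca"])

# The step name is unknown to the receiving object: KeyError, as in the traceback
try:
    route_fit_params(["preprocessing"], {"grcca__feature_groups": example_groups})
except KeyError as err:
    print("KeyError:", err)
```

In this simplified model the `KeyError` only appears when the name before `__` is not among the step names of the object that receives the keyword, so it may be worth checking which pipeline actually sees `grcca__feature_groups` during the grid search.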