WIP, ENH: Add ML training and inference #28

Open
adamwitmer wants to merge 20 commits into nidhin_data_analysis_backup from nidhin_train_inference_rebase

Conversation

@adamwitmer
Collaborator

This is a work-in-progress branch of PR #5 that has been rebased against PR #4 to show only the changes relevant to this branch (as opposed to all of the changes from all previous PRs). It is intended to be used for addressing review comments (both those from PR #4 and any new comments made here).

Nidhin Thomas and others added 6 commits March 17, 2026 09:07
Added neat_ml/model for ML
Added neat_ml/utils for plotting
Added neat_ml/phase_diagram for plotting phase diagram
Added test scripts and baseline images
Modified run_workflow.py for training, inference and
feature importance
Updated the test_workflow.py to incorporate new tests
Updated README.md with command-line examples for train,
infer, and plot.

Removed extraneous code from test scripts
Modified baseline images to ensure the tolerance
is within 1e-4
Added ci.yml
Added LANL copyright assertion ID
* add test file assets
* fix mypy error
@adamwitmer adamwitmer changed the title WIP: Nidhin train inference rebase WIP, ENH: Add ML training and inference Mar 17, 2026
@tylerjereddy
Collaborator

@adamwitmer let me know when this is ready for review -- I believe it was to be presented today (March 20th) after a delay of four months. I gave you a detailed review of gh-4 after you at least did a few things for the ASC polymer project (it does still seem a bit sluggish over there).

I'll review this roughly 3 days after it is presented for review, assuming that duties on the ASC project are kept up.

@adamwitmer
Collaborator Author

adamwitmer commented Mar 22, 2026

Initial TODO items for reviewing this branch:

  • read/understand diff/PR (read line-by-line, probing for weaknesses)
  • check for unnecessary complexity; areas for improvement/simplification
  • rebase branch against main (and push backup branch)
  • fix issues with git lfs (i.e. image storage with zenodo https://github.com/lanl/ldrd_neat_ml_images)
  • fix github CI
  • run test-suite and check test coverage
  • run branch according to README.md instructions
  • copy remaining review comments from gitlab
  • perform detailed code review
  • address all review comments
  • triple check diff
    • 1st check
    • 2nd check
    • 3rd check

Comment thread neat_ml/model/inference.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment on lines +67 to +69
plt.gcf().set_size_inches(8, 6)
plt.tight_layout()
plt.savefig(out_dir / "shap_summary.png", dpi=300)
Collaborator Author


I think we should be able to use fig, ax handles here...

Collaborator Author


I do not think that shap.summary_plot has support for fig, ax handles, per: shap/shap#3411, which can be seen at: https://github.com/shap/shap/blob/93dc2a1e446616fb0858b2ec108f80e4969ba6d9/shap/plots/summary.py#L45. This was an issue in the ldrd_virus_work project, which still uses global plt handles.
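Since shap.summary_plot draws only on pyplot's global state and returns no handles, one common workaround is to capture the implicitly-created figure with plt.gcf() immediately after the call and manage it through an explicit handle from there. A minimal sketch of that pattern, using a hypothetical legacy_summary_plot stand-in (assumed here, not the project's code) so the example is self-contained:

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt


def legacy_summary_plot(values):
    """Stand-in for a routine (like shap.summary_plot) that draws only on
    pyplot's global state and returns no figure/axes handles."""
    plt.barh(range(len(values)), values)


def save_summary(values, out_path, size=(8, 6), dpi=300):
    """Capture the implicitly-created figure with plt.gcf() right after the
    legacy call, then manage it through an explicit handle."""
    legacy_summary_plot(values)
    fig = plt.gcf()              # the figure the legacy routine drew on
    fig.set_size_inches(*size)
    fig.tight_layout()
    fig.savefig(out_path, dpi=dpi)
    plt.close(fig)               # avoid leaking global state between calls
    return fig


out = Path(tempfile.mkdtemp()) / "shap_summary.png"
fig = save_summary([0.42, 0.17, 0.31], out)
print(fig.get_size_inches().tolist())  # → [8.0, 6.0]
```

This does not remove the dependence on global pyplot state inside the plotting routine itself, but it at least confines the use of implicit handles to one line per call site.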

Collaborator


Well, I'm pretty sure I asked for help resolving that upstream for that other project too. In fact, I did here--https://lisdi-git.lanl.gov/treddy/ldrd_virus_work/-/issues/89#note_31565.

And I resolved the matter myself on the rng side at shap/shap#3945. That could have been a good opportunity to help out the community...

Quoting from the internal issue:

My expectation is that the team will clearly communicate which shap issues still remain, and really that you'll help me solve them proactively. That took about a month for a fairly simple patch, so look at your calendar and think about how long larger changes may take.

Collaborator Author


I opened a new issue to track this bug #29.

Collaborator


Please don't self-resolve comments where I've made a request for a change.

This isn't resolved--you've simply opened an issue to delay doing something I requested in November 2024, so the preferable route forward there is clear.

that you'll help me solve them proactively

Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/train.py Outdated
Comment thread neat_ml/tests/conftest.py Outdated
Comment thread neat_ml/tests/test_workflow.py
Comment thread neat_ml/workflow/lib_workflow.py Outdated
Comment thread neat_ml/workflow/lib_workflow.py Outdated
Comment thread neat_ml/workflow/lib_workflow.py Outdated
Comment thread run_workflow.py Outdated
Comment thread neat_ml/model/feature_importance.py Outdated
Comment thread neat_ml/model/train.py Outdated
Comment thread neat_ml/tests/test_workflow.py Outdated
Comment thread neat_ml/tests/test_workflow.py Outdated
Comment thread neat_ml/workflow/lib_workflow.py Outdated
Comment thread neat_ml/workflow/lib_workflow.py Outdated
Comment thread run_workflow.py Outdated
Comment thread run_workflow.py Outdated
@adamwitmer
Collaborator Author

@tylerjereddy I have completed my initial checklist and addressed all review comments, including the initial comments made on PR #5. I made sure the workflow runs on glycan using real data and compared the outputs of running hyperparameter optimization vs. not for opencv and bubblesam (#28 (comment)) in relation to the request at #5 (comment). I re-read the diff several times, looking for unnecessary complexity and other points of emphasis from previous PRs. This branch should be ready for your review, thanks.

@tylerjereddy
Collaborator

I'll make a note to do a first round of review on Friday, April 3, assuming activity is kept up on the ASC polymer project at 2 days effort/week (or if charging was completely stopped there this week for a new project substitution). Otherwise, I'll wait for you to catch up over there.

As discussed in person, presenting this volume of work this close to the deadline places a heavy review burden on the team; that is best avoided by presenting the work progressively, over months of more digestible back and forth.

@tylerjereddy tylerjereddy added the enhancement New feature or request label Apr 3, 2026
Collaborator

@tylerjereddy tylerjereddy left a comment


I've tried to provide a detailed review here despite the extraordinary (months) delay in presenting for review.

The value of hyperopt seems to be basically 0. I also commented in my review that you aren't even using hyperopt on one of your two estimators, so this really is a poor effort on the hyperopt side of things. Probably an issue should be opened, and the source should point to that issue for future improvements in more complex situations where it actually matters. Grid search probably isn't sustainable in more complex situations where a guided/Bayesian search is likely required.

Test line coverage seems fine. However, when running the test suite locally on this PR branch I saw several failures, possibly caused by dependency versions. It would be helpful to support a reasonably wide range of dependency versions and to error out right away at runtime when importing a dependency at a version we do not support, so that what is happening is clear to the user (and the reviewer, who is spending time trying to figure this out...). Test suite error output is below the fold--I'll leave resolution of that to you, and will probably just blindly change dependency versions locally until things work.

Details
============================================================================================ FAILURES ============================================================================================
___________________________________________________________________________________ test_train_with_validation ___________________________________________________________________________________
[gw1] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

sample_data =     feature1  feature2 feature3  exclude_col  target
0   0.773956  9.085807        A            0     1.0
1   0.438878...06  1.964347        B           98     1.0
99  0.961898  3.103237        B           99     0.0

[100 rows x 5 columns]

    def test_train_with_validation(sample_data: pd.DataFrame):
        X, y = preprocess(sample_data, target="target")
        # perfectly align all the feature data with the target
        X['feature1'] = np.where(
            y == 1.0,
            np.random.uniform(0.6, 1.0, len(X)),
            np.random.uniform(0.0, 0.4, len(X))
        )
        X['feature2'] = np.where(
            y == 1.0,
            np.random.uniform(6, 10, len(X)),
            np.random.uniform(0, 4, len(X))
        )
        X_train, y_train = X.iloc[:80], y.iloc[:80]
        X_val, y_val = X.iloc[80:], y.iloc[80:]
    
>       model, metrics, _, actual_val_proba = train_with_validation(
            X_train, y_train, X_val, y_val, n_jobs=1, ml_hyper_opt=False,
        )

neat_ml/tests/test_train.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/model/train.py:229: in train_with_validation
    final_model = pipeline.fit(X_train, y_train)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/pipeline.py:621: in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py:405: in fit
    return super().fit(X, transformed_y, **fit_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py:80: in fit
    names, clfs = self._validate_estimators()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = VotingClassifier(estimators=[('rf',
                              RandomForestClassifier(class_weight='balanced',
    ...ee=None,
                                            random_state=42, ...))],
                 n_jobs=1, voting='soft')

    def _validate_estimators(self):
        if len(self.estimators) == 0 or not all(
            isinstance(item, (tuple, list)) and isinstance(item[0], str)
            for item in self.estimators
        ):
            raise ValueError(
                "Invalid 'estimators' attribute, 'estimators' should be a "
                "non-empty list of (string, estimator) tuples."
            )
        names, estimators = zip(*self.estimators)
        # defined by MetaEstimatorMixin
        self._validate_names(names)
    
        has_estimator = any(est != "drop" for est in estimators)
        if not has_estimator:
            raise ValueError(
                "All estimators are dropped. At least one is required "
                "to be an estimator."
            )
    
        is_estimator_type = is_classifier if is_classifier(self) else is_regressor
    
        for est in estimators:
            if est != "drop" and not is_estimator_type(est):
>               raise ValueError(
                    "The estimator {} should be a {}.".format(
                        est.__class__.__name__, is_estimator_type.__name__[3:]
                    )
                )
E               ValueError: The estimator XGBClassifier should be a classifier.

../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_base.py:243: ValueError
_____________________________________________________________________________________ test_save_model_bundle _____________________________________________________________________________________
[gw1] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw1/test_save_model_bundle0')
sample_data =     feature1  feature2 feature3  exclude_col  target
0   0.773956  9.085807        A            0     1.0
1   0.438878...06  1.964347        B           98     1.0
99  0.961898  3.103237        B           99     0.0

[100 rows x 5 columns]

    def test_save_model_bundle(tmp_path: Path, sample_data: pd.DataFrame):
        X, y = preprocess(sample_data, target="target")
        X_train, y_train = X.iloc[:80], y.iloc[:80]
        X_val, y_val = X.iloc[80:], y.iloc[80:]
>       expected_model, expected_metrics, expected_params, _ = train_with_validation(
            X_train, y_train, X_val, y_val, n_jobs=1, ml_hyper_opt=False
        )

neat_ml/tests/test_train.py:100: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/model/train.py:229: in train_with_validation
    final_model = pipeline.fit(X_train, y_train)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/pipeline.py:621: in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py:405: in fit
    return super().fit(X, transformed_y, **fit_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py:80: in fit
    names, clfs = self._validate_estimators()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = VotingClassifier(estimators=[('rf',
                              RandomForestClassifier(class_weight='balanced',
    ...ee=None,
                                            random_state=42, ...))],
                 n_jobs=1, voting='soft')

    def _validate_estimators(self):
        if len(self.estimators) == 0 or not all(
            isinstance(item, (tuple, list)) and isinstance(item[0], str)
            for item in self.estimators
        ):
            raise ValueError(
                "Invalid 'estimators' attribute, 'estimators' should be a "
                "non-empty list of (string, estimator) tuples."
            )
        names, estimators = zip(*self.estimators)
        # defined by MetaEstimatorMixin
        self._validate_names(names)
    
        has_estimator = any(est != "drop" for est in estimators)
        if not has_estimator:
            raise ValueError(
                "All estimators are dropped. At least one is required "
                "to be an estimator."
            )
    
        is_estimator_type = is_classifier if is_classifier(self) else is_regressor
    
        for est in estimators:
            if est != "drop" and not is_estimator_type(est):
>               raise ValueError(
                    "The estimator {} should be a {}.".format(
                        est.__class__.__name__, is_estimator_type.__name__[3:]
                    )
                )
E               ValueError: The estimator XGBClassifier should be a classifier.

../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_base.py:243: ValueError
________________________________________________________________________________ test_compare_methods_end_to_end _________________________________________________________________________________
[gw2] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_compare_methods_end_to_en0')
classification_dataset = (   PEO 10 kg/mol (wt%)  ...  graph_num_components
0             2.334594  ...              2.416491
1            -2.1...18

[10 rows x 5 columns], 0    0
1    1
2    0
3    0
4    1
5    0
6    1
7    0
8    1
9    1
Name: y, dtype: int64)
stable_rc = {'axes.labelsize': 10, 'axes.linewidth': 1.0, 'axes.titlesize': 12, 'figure.dpi': 100, ...}, baseline_dir = PosixPath('/Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/tests/baseline')

    def test_compare_methods_end_to_end(
        tmp_path: Path,
        classification_dataset: tuple[pd.DataFrame, pd.Series],
        stable_rc,
        baseline_dir,
    ):
        """
        End-to-end test of compare_methods.
        Test consistency of mean rank of important features
        PNG compared via inline NumPy RMS diff.
        """
        rng = np.random.default_rng(0)
        X, y = classification_dataset
        # "preprocess" dataset to remove composition columns
        X = X.drop(columns=["PEO 10 kg/mol (wt%)", "Dextran 10 kg/mol (wt%)"])
        model = RandomForestClassifier(random_state=0).fit(X, y)
    
        with mpl.rc_context(stable_rc):
>           fi.compare_methods(model, X, y, out_dir=tmp_path, top=3, rng=rng)

neat_ml/tests/test_feature_importance.py:154: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/model/feature_importance.py:335: in compare_methods
    shap_imp = _run_shap(model, X, out_dir, top=top, rng=rng)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = RandomForestClassifier(random_state=0)
X =    num_blobs  coverage_percentage  graph_num_components
0  -1.698423             2.336225              2.416491
1   1.....170261
8   2.200476            -2.275953              0.915428
9  -0.377733            -1.104456             -1.824018
out_dir = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_compare_methods_end_to_en0'), top = 3, n_jobs = -1
rng = Generator(PCG64) at 0x3441CFCA0

    def _run_shap(
        model, X: pd.DataFrame,
        out_dir: Path,
        top: int = 20,
        n_jobs: int = -1,
        rng: np.random.Generator | None = None,
    ) -> pd.Series:
        """
        Compute global SHAP values for *model* and derive per-feature importance.
    
        A permutation explainer is instantiated on the fly because it works with
        any black box predict_proba** function.  The absolute SHAP values are
        averaged across all rows, giving a single scalar importance per feature.
    
        Parameters
        ----------
        model : Any
            Fitted classifier exposing a predict_proba(X) -> ndarray method whose
            second dimension contains probabilities for the positive class.
        X : pandas.DataFrame
            Numeric feature matrix used both as background data for the explainer
            and as the evaluation set whose SHAP values are summarized.
        out_dir : pathlib.Path
            Directory where the SHAP bar chart (shap_summary.png) will be saved.
        top : int, default 20
            Maximum number of features to display in the SHAP summary figure.
        n_jobs : int
            number of parallel processes to run for shap explainer. n_jobs=-1 uses
            all cores.
        rng : np.random.Generator | None
            pseudorandom number generator
    
        Returns
        -------
        imp : pandas.Series
            Index = feature names, values = mean absolute SHAP value (descending).
        """
        explainer = shap.Explainer(
            model.predict_proba,
            masker=X.values,
            algorithm="permutation",
            n_jobs=n_jobs,
            feature_names=X.columns.to_list(),
        )
        vals = explainer(X.values).values
        vals = vals[:, :, 1] if vals.ndim == 3 else vals
        imp = pd.Series(np.abs(vals).mean(0), index=X.columns).sort_values(ascending=False)
    
>       shap.summary_plot(vals, features=X, max_display=top, show=False, rng=rng)
E       TypeError: summary_legacy() got an unexpected keyword argument 'rng'

neat_ml/model/feature_importance.py:74: TypeError
_____________________________________________________________________________ test_stage_train_model_column_mismatch _____________________________________________________________________________
[gw2] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_stage_train_model_column_0')
sample_data =     feature1  feature2 feature3  exclude_col  target
0   0.773956  9.085807        A            0     1.0
1   0.438878...06  1.964347        B           98     1.0
99  0.961898  3.103237        B           99     0.0

[100 rows x 5 columns]
caplog = <_pytest.logging.LogCaptureFixture object at 0x39ea10530>

    def test_stage_train_model_column_mismatch(
        tmp_path: Path, sample_data, caplog
    ):
        caplog.set_level(logging.WARNING)
        train_ds = {"id": "TR4"}
        train_path = tmp_path / "train.csv"
        val_path = tmp_path / "val.csv"
        train_paths = {"agg_csv": train_path, "model_dir": tmp_path / "model"}
        val_paths = {"agg_csv": val_path}
        sample_data.to_csv(train_path, index=False)
        val_data = sample_data.drop(columns=["feature1", "exclude_col"])
        val_data.to_csv(val_path, index=False)
    
>       wf.stage_train_model(
            train_ds,
            train_paths,
            val_ds={"id": "VAL"},
            val_paths=val_paths,
            target="target"
        )

neat_ml/tests/test_workflow.py:726: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/workflow/lib_workflow.py:407: in stage_train_model
    model, metrics, best_params, val_proba = train_with_validation(
neat_ml/model/train.py:224: in train_with_validation
    grid_search.fit(X, y)
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1053: in fit
    self._run_search(evaluate_candidates)
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1612: in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1030: in evaluate_candidates
    _warn_or_raise_about_fit_failures(out, self.error_score)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

results = [{'fit_error': 'Traceback (most recent call last):\n  File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python...fier should be a classifier.\n', 'fit_time': 0.0013880729675292969, 'n_test_samples': 99, 'score_time': 0.0, ...}, ...]
error_score = nan

    def _warn_or_raise_about_fit_failures(results, error_score):
        fit_errors = [
            result["fit_error"] for result in results if result["fit_error"] is not None
        ]
        if fit_errors:
            num_failed_fits = len(fit_errors)
            num_fits = len(results)
            fit_errors_counter = Counter(fit_errors)
            delimiter = "-" * 80 + "\n"
            fit_errors_summary = "\n".join(
                f"{delimiter}{n} fits failed with the following error:\n{error}"
                for error, n in fit_errors_counter.items()
            )
    
            if num_failed_fits == num_fits:
                all_fits_failed_message = (
                    f"\nAll the {num_fits} fits failed.\n"
                    "It is very likely that your model is misconfigured.\n"
                    "You can try to debug the error by setting error_score='raise'.\n\n"
                    f"Below are more details about the failures:\n{fit_errors_summary}"
                )
>               raise ValueError(all_fits_failed_message)
E               ValueError: 
E               All the 72 fits failed.
E               It is very likely that your model is misconfigured.
E               You can try to debug the error by setting error_score='raise'.
E               
E               Below are more details about the failures:
E               --------------------------------------------------------------------------------
E               72 fits failed with the following error:
E               Traceback (most recent call last):
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_validation.py", line 833, in _fit_and_score
E                   estimator.fit(X_train, y_train, **fit_params)
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py", line 1336, in wrapper
E                   return fit_method(estimator, *args, **kwargs)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/pipeline.py", line 621, in fit
E                   self._final_estimator.fit(Xt, y, **last_step_params["fit"])
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py", line 1336, in wrapper
E                   return fit_method(estimator, *args, **kwargs)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py", line 405, in fit
E                   return super().fit(X, transformed_y, **fit_params)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py", line 80, in fit
E                   names, clfs = self._validate_estimators()
E                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_base.py", line 243, in _validate_estimators
E                   raise ValueError(
E               ValueError: The estimator XGBClassifier should be a classifier.

../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_validation.py:479: ValueError
--------------------------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------------------------
WARNING  neat_ml.workflow.lib_workflow:lib_workflow.py:396 Feature mismatch: using 1common features (train=Index(['feature1', 'feature2', 'exclude_col'], dtype='object'), val=Index(['feature2'], dtype='object')).
_____________________________________________________________________ test_stage_train_model_happy_path_saves_bundle_and_roc _____________________________________________________________________
[gw2] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_stage_train_model_happy_p0')
sample_data =     feature1  feature2 feature3  exclude_col  target
0   0.773956  9.085807        A            0     1.0
1   0.438878...06  1.964347        B           98     1.0
99  0.961898  3.103237        B           99     0.0

[100 rows x 5 columns]

    def test_stage_train_model_happy_path_saves_bundle_and_roc(
        tmp_path: Path,
        sample_data,
    ):
    
        train_ds = {"id": "TR5"}
        train_paths = {"agg_csv": tmp_path / "train.csv", "model_dir": tmp_path / "model"}
        val_paths = {"agg_csv": tmp_path / "val.csv"}
        sample_data.to_csv(val_paths["agg_csv"], index=False)
        sample_data.to_csv(train_paths["agg_csv"], index=False)
    
>       wf.stage_train_model(
            train_ds,
            train_paths,
            val_ds={"id": "VAL"},
            val_paths=val_paths,
            target="target"
        )

neat_ml/tests/test_workflow.py:747: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/workflow/lib_workflow.py:407: in stage_train_model
    model, metrics, best_params, val_proba = train_with_validation(
neat_ml/model/train.py:224: in train_with_validation
    grid_search.fit(X, y)
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1053: in fit
    self._run_search(evaluate_candidates)
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1612: in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_search.py:1030: in evaluate_candidates
    _warn_or_raise_about_fit_failures(out, self.error_score)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

results = [{'fit_error': 'Traceback (most recent call last):\n  File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python...assifier should be a classifier.\n', 'fit_time': 0.000946044921875, 'n_test_samples': 99, 'score_time': 0.0, ...}, ...]
error_score = nan

    def _warn_or_raise_about_fit_failures(results, error_score):
        fit_errors = [
            result["fit_error"] for result in results if result["fit_error"] is not None
        ]
        if fit_errors:
            num_failed_fits = len(fit_errors)
            num_fits = len(results)
            fit_errors_counter = Counter(fit_errors)
            delimiter = "-" * 80 + "\n"
            fit_errors_summary = "\n".join(
                f"{delimiter}{n} fits failed with the following error:\n{error}"
                for error, n in fit_errors_counter.items()
            )
    
            if num_failed_fits == num_fits:
                all_fits_failed_message = (
                    f"\nAll the {num_fits} fits failed.\n"
                    "It is very likely that your model is misconfigured.\n"
                    "You can try to debug the error by setting error_score='raise'.\n\n"
                    f"Below are more details about the failures:\n{fit_errors_summary}"
                )
>               raise ValueError(all_fits_failed_message)
E               ValueError: 
E               All the 72 fits failed.
E               It is very likely that your model is misconfigured.
E               You can try to debug the error by setting error_score='raise'.
E               
E               Below are more details about the failures:
E               --------------------------------------------------------------------------------
E               72 fits failed with the following error:
E               Traceback (most recent call last):
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_validation.py", line 833, in _fit_and_score
E                   estimator.fit(X_train, y_train, **fit_params)
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py", line 1336, in wrapper
E                   return fit_method(estimator, *args, **kwargs)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/pipeline.py", line 621, in fit
E                   self._final_estimator.fit(Xt, y, **last_step_params["fit"])
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py", line 1336, in wrapper
E                   return fit_method(estimator, *args, **kwargs)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py", line 405, in fit
E                   return super().fit(X, transformed_y, **fit_params)
E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_voting.py", line 80, in fit
E                   names, clfs = self._validate_estimators()
E                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
E                 File "/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/ensemble/_base.py", line 243, in _validate_estimators
E                   raise ValueError(
E               ValueError: The estimator XGBClassifier should be a classifier.

../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/model_selection/_validation.py:479: ValueError
__________________________________________________________________ test_stage_explain_aligns_features_and_calls_compare_methods __________________________________________________________________
[gw2] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_stage_explain_aligns_feat0')
sample_inference_data = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/infer0/inference_data.csv')
trained_model_bundle = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/model0/model.joblib')

    def test_stage_explain_aligns_features_and_calls_compare_methods(
        tmp_path: Path,
        sample_inference_data,
        trained_model_bundle,
    ):
        explain_out = tmp_path / "explain_out"
        train_ds = {"id": "TRX", "composition_cols": ["PEG"]}
        paths = {"agg_csv": sample_inference_data, "explain_dir": explain_out}
    
>       wf.stage_explain(train_ds, paths, trained_model_bundle, target="ground_truth")

neat_ml/tests/test_workflow.py:766: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
neat_ml/workflow/lib_workflow.py:479: in stage_explain
    compare_methods(
neat_ml/model/feature_importance.py:335: in compare_methods
    shap_imp = _run_shap(model, X, out_dir, top=top, rng=rng)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = Pipeline(steps=[('impute', SimpleImputer(strategy='median')),
                ('scale', StandardScaler()),
                ('clf', LogisticRegression(random_state=42))])
X =       feat_a  feat_b
0   0.682352     0.0
1   0.053821     1.0
2   0.220360     2.0
3   0.184372     3.0
4   0.175906 ...173632    44.0
45  0.312742    45.0
46  0.014474    46.0
47  0.032552    47.0
48  0.496702    48.0
49  0.468313    49.0
out_dir = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-13/popen-gw2/test_stage_explain_aligns_feat0/explain_out'), top = 20, n_jobs = -1
rng = None

    def _run_shap(
        model, X: pd.DataFrame,
        out_dir: Path,
        top: int = 20,
        n_jobs: int = -1,
        rng: np.random.Generator | None = None,
    ) -> pd.Series:
        """
        Compute global SHAP values for *model* and derive per-feature importance.
    
        A permutation explainer is instantiated on the fly because it works with
    any black-box predict_proba function. The absolute SHAP values are
        averaged across all rows, giving a single scalar importance per feature.
    
        Parameters
        ----------
        model : Any
            Fitted classifier exposing a predict_proba(X) -> ndarray method whose
            second dimension contains probabilities for the positive class.
        X : pandas.DataFrame
            Numeric feature matrix used both as background data for the explainer
            and as the evaluation set whose SHAP values are summarized.
        out_dir : pathlib.Path
            Directory where the SHAP bar chart (shap_summary.png) will be saved.
        top : int, default 20
            Maximum number of features to display in the SHAP summary figure.
        n_jobs : int
            number of parallel processes to run for shap explainer. n_jobs=-1 uses
            all cores.
        rng : np.random.Generator | None
            pseudorandom number generator
    
        Returns
        -------
        imp : pandas.Series
            Index = feature names, values = mean absolute SHAP value (descending).
        """
        explainer = shap.Explainer(
            model.predict_proba,
            masker=X.values,
            algorithm="permutation",
            n_jobs=n_jobs,
            feature_names=X.columns.to_list(),
        )
        vals = explainer(X.values).values
        vals = vals[:, :, 1] if vals.ndim == 3 else vals
        imp = pd.Series(np.abs(vals).mean(0), index=X.columns).sort_values(ascending=False)
    
>       shap.summary_plot(vals, features=X, max_display=top, show=False, rng=rng)
E       TypeError: summary_legacy() got an unexpected keyword argument 'rng'

neat_ml/model/feature_importance.py:74: TypeError
======================================================================================== warnings summary ========================================================================================
../../../python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/shap/plots/colors/_colorconv.py:819: 7272 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/shap/plots/colors/_colorconv.py:819: DeprecationWarning: Converting `np.inexact` or `np.floating` to a dtype is deprecated. The current result is `float64` which is not strictly correct.
    if np.issubdtype(dtype_in, np.dtype(dtype).type):

neat_ml/tests/test_analysis.py: 30 warnings
  /Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/analysis/data_analysis.py:410: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
    nbr_dists = np.fromiter((d["distance"] for _, _, d in

neat_ml/tests/test_detection.py::test_detect_single_image_no_blobs
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/numpy/lib/_nanfunctions_impl.py:1231: RuntimeWarning: Mean of empty slice
    return np.nanmean(a, axis, out=out, keepdims=keepdims)

neat_ml/tests/test_lib.py: 15 warnings
neat_ml/tests/test_feature_importance.py: 719 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/utils/validation.py:2691: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
    warnings.warn(

neat_ml/tests/test_bubblesam.py: 89 warnings
neat_ml/tests/test_workflow.py: 59 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/torch/jit/_script.py:1480: DeprecationWarning: `torch.jit.script` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

neat_ml/tests/test_detection.py::test_detect_single_image_processed
neat_ml/tests/test_workflow.py::test_run_workflow_single_image_path[opencv-None-paths2-bubble_data]
neat_ml/tests/test_detection.py::test_visual_regression_debug_overlay
neat_ml/tests/test_workflow.py::test_stage_detect_pipeline_runs[ds1-paths1-exp_columns1]
  /Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/opencv/detection.py:70: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
    columns=["bubble_number", "center", "radius", "area", "bbox"]).fillna(np.nan)

neat_ml/tests/test_workflow.py: 5049 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/utils/validation.py:2691: UserWarning: X does not have valid feature names, but SimpleImputer was fitted with feature names
    warnings.warn(

neat_ml/tests/test_bubblesam.py::test_sam_internal_api[mps]
neat_ml/tests/test_bubblesam.py::test_bubblesam_detection_generates_pngs[cpu]
neat_ml/tests/test_workflow.py::test_run_workflow_single_image_path[bubblesam-cpu-paths0-masks_filtered]
neat_ml/tests/test_workflow.py::test_stage_detect_pipeline_runs[ds0-paths0-exp_columns0]
neat_ml/tests/test_bubblesam.py::test_run_bubblesam[cpu]
neat_ml/tests/test_bubblesam.py::test_bubblesam_detection_generates_pngs[mps]
neat_ml/tests/test_bubblesam.py::test_sam_internal_api[cpu]
neat_ml/tests/test_workflow.py::test_run_workflow_single_image_path[bubblesam-gpu-paths1-masks_filtered]
neat_ml/tests/test_bubblesam.py::test_run_bubblesam[gpu]
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sam2/sam2_image_predictor.py:431: UserWarning: cannot import name '_C' from 'sam2' (/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sam2/__init__.py)
  
  Skipping the post-processing step due to the error above. You can still use SAM 2 and it's OK to ignore the error above, although some post-processing functionality may be limited (which doesn't affect the results in most cases; see https://github.com/facebookresearch/sam2/blob/main/INSTALL.md).
    masks = self._transforms.postprocess_masks(

neat_ml/tests/test_workflow.py::test_stage_run_inference_calls_inference_and_makes_pred_dir
neat_ml/tests/test_workflow.py::test_stage_run_inference_calls_inference_and_makes_pred_dir
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (2). Possibly due to duplicate points in X.
    return fit_method(estimator, *args, **kwargs)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================================== short test summary info =====================================================================================
FAILED neat_ml/tests/test_train.py::test_train_with_validation - ValueError: The estimator XGBClassifier should be a classifier.
FAILED neat_ml/tests/test_train.py::test_save_model_bundle - ValueError: The estimator XGBClassifier should be a classifier.
FAILED neat_ml/tests/test_feature_importance.py::test_compare_methods_end_to_end - TypeError: summary_legacy() got an unexpected keyword argument 'rng'
FAILED neat_ml/tests/test_workflow.py::test_stage_train_model_column_mismatch - ValueError: 
FAILED neat_ml/tests/test_workflow.py::test_stage_train_model_happy_path_saves_bundle_and_roc - ValueError: 
FAILED neat_ml/tests/test_workflow.py::test_stage_explain_aligns_features_and_calls_compare_methods - TypeError: summary_legacy() got an unexpected keyword argument 'rng'
=================================================================== 6 failed, 166 passed, 2 skipped, 13249 warnings in 56.22s ====================================================================

Comment thread .github/workflows/ci.yml
on:
push:
branches: [ main ]
branches: [ main, nidhin_data_analysis_backup ]

I think you only need the temporary pull_request modification below, since we don't plan to merge into the non-main branch here.

Comment thread .github/workflows/ci.yml
if: runner.os == 'macOS'
run: |
  echo "Limiting OpenMP to 1 thread for macOS performance"
  echo "OMP_NUM_THREADS=1" >> $GITHUB_ENV

There is not sufficient detail here to motivate the need for this shim, and you should try to avoid burdening the reviewer with having to go fishing for the related information.

Even if you explained this somewhere else, the most helpful course of action tends to be to help the reader out with a clear and concise comment in your CI configuration that explains exactly why this is needed--which library is affected? Is it an upstream bug?

Why are we doing this instead of using a canonical Python-level tool like threadpoolctl, which helps limit the number of threads used in native libraries that manage their own internal threadpools (BLAS and OpenMP implementations)?

I'm not necessarily saying you're wrong here, but you're asking the reviewer to do some heavy lifting to figure out what is going on, which isn't great for clarity/efficiency of reviewer time.
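
The two approaches the review contrasts could be sketched as follows; this is an illustrative shim, not the repo's actual CI code, and the comment text is an assumption about why the env-var route was chosen:

```python
import os

# Hypothetical conftest.py-style shim: native libraries such as OpenBLAS
# and libomp typically read OMP_NUM_THREADS once, at import time, so an
# environment-variable cap must be in place before sklearn/xgboost load.
os.environ.setdefault("OMP_NUM_THREADS", "1")

# The reviewer's suggested alternative, threadpoolctl, caps threads at
# runtime and can be scoped to a single block, e.g.:
#   from threadpoolctl import threadpool_limits
#   with threadpool_limits(limits=1, user_api="openmp"):
#       grid_search.fit(X, y)
```

Either way, a one-line comment in ci.yml naming the affected library would answer the reviewer's question at the point of use.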

ebm_act = tmp_path / "ebm_importance.png"
ebm_exp = baseline_dir / "ebm_importance_expected.png"
result = compare_images(ebm_exp, ebm_act, tol=1e-4) # type: ignore[call-overload]
assert result is None

This test is failing pretty consistently for me locally on ARM Mac, with the traceback below the fold. Tests should be constructed to be reliable--if dependency versions cause issues, that should be cleaned up somehow (shim in the source code, or error out on unsupported versions of the dep). If something is missing a random seed, it should be pinned, etc.

Details
_______________________________________________________________________________ test_compare_methods_end_to_end _________________________________________________________________________________
[gw2] darwin -- Python 3.12.3 /Users/treddy/python_venvs/py_312_ldrd_neat_dev/bin/python

tmp_path = PosixPath('/private/var/folders/5_/hm0ft57n6dn2ksgg2p0bx5h0000w2g/T/pytest-of-treddy/pytest-15/popen-gw2/test_compare_methods_end_to_en0')
classification_dataset = (   PEO 10 kg/mol (wt%)  ...  graph_num_components
0             2.334594  ...              2.416491
1            -2.1...18

[10 rows x 5 columns], 0    0
1    1
2    0
3    0
4    1
5    0
6    1
7    0
8    1
9    1
Name: y, dtype: int64)
stable_rc = {'axes.labelsize': 10, 'axes.linewidth': 1.0, 'axes.titlesize': 12, 'figure.dpi': 100, ...}, baseline_dir = PosixPath('/Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/tests/baseline')

    def test_compare_methods_end_to_end(
        tmp_path: Path,
        classification_dataset: tuple[pd.DataFrame, pd.Series],
        stable_rc,
        baseline_dir,
    ):
        """
        End-to-end test of compare_methods.
        Test consistency of mean rank of important features
        PNG compared via inline NumPy RMS diff.
        """
        rng = np.random.default_rng(0)
        X, y = classification_dataset
        # "preprocess" dataset to remove composition columns
        X = X.drop(columns=["PEO 10 kg/mol (wt%)", "Dextran 10 kg/mol (wt%)"])
        model = RandomForestClassifier(random_state=0).fit(X, y)
    
        with mpl.rc_context(stable_rc):
            fi.compare_methods(model, X, y, out_dir=tmp_path, top=3, rng=rng)
    
        actual_csv_path = tmp_path / "feature_importance_comparison.csv"
    
        actual_df = pd.read_csv(actual_csv_path, index_col=0)
        # SHAP importance values fluctuate on the order of 1e-2 floating
        # point precision between calls, so check that the mean ranking of
        # the feature importance values is preserved.
        assert_allclose(actual_df["mean_rank"], [1.3333333333333333, 2.0, 2.6666666666666665])
    
        # check the output of ebm importance ranking.
        # for the same reason that SHAP values are difficult to compare,
        # the SHAP plot and FIC plots also fluctuate between runs,
        # by a floating point value big enough to make image comparison difficult.
        ebm_act = tmp_path / "ebm_importance.png"
        ebm_exp = baseline_dir / "ebm_importance_expected.png"
        result = compare_images(ebm_exp, ebm_act, tol=1e-4) # type: ignore[call-overload]
>       assert result is None
E       AssertionError: assert 'Error: Image files did not match.\n  RMS Value: 9.395480684439018\n  Expected:  \n    /Users/treddy/LANL/gitlab/ldrd_...f-treddy/pytest-15/popen-gw2/test_compare_methods_end_to_en0/ebm_importance-failed-diff.png\n  Tolerance: \n    0.0001' is None

neat_ml/tests/test_feature_importance.py:171: AssertionError
======================================================================================== warnings summary ========================================================================================
neat_ml/tests/test_analysis.py: 30 warnings
  /Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/analysis/data_analysis.py:410: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
    nbr_dists = np.fromiter((d["distance"] for _, _, d in

neat_ml/tests/test_detection.py::test_detect_single_image_no_blobs
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/numpy/lib/_nanfunctions_impl.py:1241: RuntimeWarning: Mean of empty slice
    return np.nanmean(a, axis, out=out, keepdims=keepdims)

neat_ml/tests/test_lib.py: 15 warnings
neat_ml/tests/test_feature_importance.py: 729 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/utils/validation.py:2691: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
    warnings.warn(

neat_ml/tests/test_bubblesam.py: 89 warnings
neat_ml/tests/test_workflow.py: 59 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/torch/jit/_script.py:1480: DeprecationWarning: `torch.jit.script` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

neat_ml/tests/test_workflow.py::test_stage_detect_pipeline_runs[ds0-paths0-exp_columns0]
neat_ml/tests/test_bubblesam.py::test_bubblesam_detection_generates_pngs[cpu]
neat_ml/tests/test_bubblesam.py::test_sam_internal_api[mps]
neat_ml/tests/test_workflow.py::test_run_workflow_single_image_path[bubblesam-cpu-paths0-masks_filtered]
neat_ml/tests/test_workflow.py::test_run_workflow_single_image_path[bubblesam-gpu-paths1-masks_filtered]
neat_ml/tests/test_bubblesam.py::test_bubblesam_detection_generates_pngs[mps]
neat_ml/tests/test_bubblesam.py::test_run_bubblesam[cpu]
neat_ml/tests/test_bubblesam.py::test_run_bubblesam[gpu]
neat_ml/tests/test_bubblesam.py::test_sam_internal_api[cpu]
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sam2/sam2_image_predictor.py:431: UserWarning: cannot import name '_C' from 'sam2' (/Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sam2/__init__.py)
  
  Skipping the post-processing step due to the error above. You can still use SAM 2 and it's OK to ignore the error above, although some post-processing functionality may be limited (which doesn't affect the results in most cases; see https://github.com/facebookresearch/sam2/blob/main/INSTALL.md).
    masks = self._transforms.postprocess_masks(

neat_ml/tests/test_workflow.py::test_stage_detect_pipeline_runs[ds0-paths0-exp_columns0]
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/joblib/memory.py:607: UserWarning: Persisting input arguments took 1.23s to run.If this happens often in your code, it can cause performance problems (results will be correct in all cases). The reason for this is probably some large input arguments for a wrapped function.
    return self._cached_call(args, kwargs, shelving=False)[0]

neat_ml/tests/test_workflow.py::test_stage_train_model_column_mismatch
neat_ml/tests/test_workflow.py::test_stage_train_model_happy_path_saves_bundle_and_roc
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/utils/validation.py:2684: UserWarning: X has feature names, but SimpleImputer was fitted without feature names
    warnings.warn(

neat_ml/tests/test_workflow.py: 5099 warnings
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/utils/validation.py:2691: UserWarning: X does not have valid feature names, but SimpleImputer was fitted with feature names
    warnings.warn(

neat_ml/tests/test_workflow.py::test_stage_explain_aligns_features_and_calls_compare_methods
  /Users/treddy/LANL/gitlab/ldrd_neat_ml/neat_ml/model/feature_importance.py:74: FutureWarning: The NumPy global RNG was seeded by calling `np.random.seed`. In a future version this function will no longer use the global RNG. Pass `rng` explicitly to opt-in to the new behaviour and silence this warning.
    shap.summary_plot(vals, features=X, max_display=top, show=False, rng=rng)

neat_ml/tests/test_workflow.py::test_stage_run_inference_calls_inference_and_makes_pred_dir
neat_ml/tests/test_workflow.py::test_stage_run_inference_calls_inference_and_makes_pred_dir
  /Users/treddy/python_venvs/py_312_ldrd_neat_dev/lib/python3.12/site-packages/sklearn/base.py:1336: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (2). Possibly due to duplicate points in X.
    return fit_method(estimator, *args, **kwargs)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================================================== short test summary info =====================================================================================
FAILED neat_ml/tests/test_feature_importance.py::test_compare_methods_end_to_end - AssertionError: assert 'Error: Image files did not match.\n  RMS Value: 9.395480684439018\n  Expected:  \n    /Users/treddy/LANL/gitlab/ldrd_...f-treddy/pytest-15/popen-gw2/test_compare_met...
==================================================================== 1 failed, 171 passed, 2 skipped, 6037 warnings in 46.57s ====================================================================
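
The RMS metric that `compare_images` reports (9.39 here, against a tolerance of 1e-4) can be reproduced directly on pixel arrays, which helps when picking a realistic tolerance; this is an illustrative sketch of the same root-mean-square calculation, not the repo's test code:

```python
import numpy as np

def rms_diff(expected: np.ndarray, actual: np.ndarray) -> float:
    """Root-mean-square pixel difference, the metric matplotlib's
    compare_images reports (computed here on raw arrays)."""
    diff = expected.astype(np.float64) - actual.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

# Two nearly identical synthetic 8-bit "images": one pixel differs by 1.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
b = a.copy()
b[0, 0] ^= 1  # flip the low bit of one pixel (difference of exactly 1)
```

A single off-by-one pixel in a 32x32 image already yields a nonzero RMS of about 0.03, so a tolerance of 1e-4 effectively demands bit-identical output--fragile across BLAS/platform differences unless every source of randomness is pinned.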



def _run_shap(
model, X: pd.DataFrame,

Don't know why this was marked as resolved since it isn't, I'll reopen it...

It would be good if resolutions were checked and explained.

second dimension contains probabilities for the positive class.
X : pandas.DataFrame
Numeric feature matrix used both as background data for the explainer
and as the evaluation set whose SHAP values are summarized.

This description is confusing. Is X the design matrix used for training the estimators or something else? Not clear.

What is an "evaluation set?" Is that different from training data? Often we use feature importance techniques on the training data, but I'm finding this description not particularly clear...

train_dataset_config: dict[str, Any],
paths: dict[str, Path],
model_path: Path,
target: str = "Phase_Separation",

no way to set number of top features to use from this public function?
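
One way to address this would be to surface the knob on the public signature; all names below are illustrative, mirroring the quoted parameters rather than the repo's actual implementation:

```python
from typing import Any

# Hypothetical widening of the public stage signature: expose `top`
# (number of features shown in the importance summaries) instead of
# hard-coding it in the private helpers.
def stage_explain(
    train_dataset_config: dict[str, Any],
    paths: dict[str, Any],
    model_path: Any,
    target: str = "Phase_Separation",
    top: int = 20,
) -> int:
    # a real implementation would forward `top` to compare_methods(...)
    return top
```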

infer_dataset_config: dict[str, Any],
paths: dict[str, Path],
model_path: Path,
steps: list[str],

Does this take any str or just a Literal of a few possible string options?

The path to the trained model file.
steps : list[str]
A list of active workflow steps to determine
whether to run inference, plotting, or both.

There's only a small finite/literal set of string options here, right?
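
The reviewer's point could be expressed with `typing.Literal`, so a type checker rejects typos at call sites; the specific step names here are guesses based on the quoted docstring ("inference, plotting, or both"), not the repo's actual values:

```python
from typing import Literal, get_args

# Hypothetical tightening of the `steps` parameter: a closed set of
# workflow stages instead of arbitrary strings.
Step = Literal["infer", "plot"]

def run_inference_stages(steps: list[Step]) -> list[Step]:
    """Validate the requested stages against the allowed Literal set."""
    valid = set(get_args(Step))
    unknown = [s for s in steps if s not in valid]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return steps
```

With this shape, mypy flags `run_inference_stages(["plto"])` statically, and the runtime check catches the same mistake for untyped callers.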

Comment thread README.md
Detection and analysis must be run for every dataset to be used for training, validation and inference. For running the `train`, `infer`, `explain` and `plot` steps, a separate `dataset: -id:` must be used for each input dataset with the appropriate `role` for each dataset, i.e. `train`, `val` or `infer`. Paths for saving the model, training/inference results can be set with `root: model` and `root: results` respectively, and `inference_model` can be set to explicitly provide the path to the trained model when performing inference separately from training.

The user can also determine whether or not to perform machine learning classifier hyperparameter optimization via exhaustive grid search by setting the `ml_hyper_opt` to True or False (the default is True if no parameter is specified.)


it might be sensible to let them control estimator concurrency
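
A minimal sketch of what "letting them control estimator concurrency" might look like, assuming a config-dict plumbing similar to the README's other options; `ml_n_jobs` is an invented key, not part of the repo's current schema:

```python
from typing import Any

# Hypothetical: thread a user-facing concurrency knob from the workflow
# config through to the grid search / estimators.
def grid_search_kwargs(config: dict[str, Any]) -> dict[str, int]:
    # default to a single worker so shared CI runners are not
    # oversubscribed; users can raise it locally
    return {"n_jobs": int(config.get("ml_n_jobs", 1))}
```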

Comment thread run_workflow.py
val_ds = val_list[0]
train_id = train_ds.get("id")
trained_model = Path(model_path) / f"{train_id}_model.joblib"
if not trained_model.exists():

weird, in stage_train_model module it says # check to see if the model path already exists, if so, skip re-training; which one is correct?
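
One way to reconcile the two behaviours the review flags is a single guard that either reuses an existing bundle or trains a new one, with the choice made explicit; `train_fn` and the logging are illustrative, not the repo's code:

```python
from pathlib import Path
from typing import Callable

# Hypothetical single source of truth for the "skip re-training" rule:
# reuse the bundle if it exists, otherwise train and write it.
def get_or_train_model(model_path: Path,
                       train_fn: Callable[[Path], None]) -> Path:
    if model_path.exists():
        print(f"reusing existing model bundle: {model_path}")
        return model_path
    model_path.parent.mkdir(parents=True, exist_ok=True)
    train_fn(model_path)  # expected to write the bundle to model_path
    return model_path
```

Whichever behaviour is intended (error out vs. silently skip), encoding it in one helper avoids the contradiction between run_workflow.py and stage_train_model.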

@tylerjereddy

Also, for this PR, gh-4 (as emphasized at #4 (comment)) and elsewhere, please disclose any scenarios where AI was used to write test cases (or anything else). There is a general "feeling" of the tests being verbose and repetitive instead of being crafted with care.

It may also be that Nidhin did that initially, or a symptom of rushed copy pasting (neither great)--either way, the quality control has been quite time consuming.

Labels: enhancement (New feature or request)