Add input validation for Friedman-Popescu H statistic calculation by Copilot · Pull Request #16760 · h2oai/h2o-3

Copilot · 2026-02-04T15:50:51Z

getFriedmanPopescusH() throws a NullPointerException (NPE) when called with non-numeric or non-existent column names. The error occurs in FriedmanPopescusH.filterFrame(), where frame.vecs(cols) yields null for invalid columns and the null is then dereferenced.

# This produces a cryptic NPE
h = titanic_xgb.h(train, ['fair', 'age'])  # 'fair' is not a usable column here (the Titanic column is named 'fare')
# Error: Cannot read field "_key" because "vecs[i]" is null
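
The failure mode can be reproduced in miniature without H2O. The sketch below is a hypothetical stand-in, not H2O code: `ToyFrame`, `ToyVec`, and `collect_keys` are invented for illustration, and Python raises `AttributeError` where Java raises `NullPointerException`. It shows how a lookup that yields an empty entry for an unknown column only fails later, when a field of that entry is read, so the message never names the offending column.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyVec:
    """Hypothetical stand-in for a column vector; not an H2O class."""
    key: str


class ToyFrame:
    """Hypothetical stand-in for a frame mapping column names to vectors."""

    def __init__(self, columns: Dict[str, ToyVec]) -> None:
        self._columns = columns

    def vecs(self, names: List[str]) -> List[Optional[ToyVec]]:
        # Unknown names silently become None instead of raising here.
        return [self._columns.get(name) for name in names]


def collect_keys(frame: ToyFrame, names: List[str]) -> List[str]:
    # No validation: the first None entry fails on attribute access,
    # far from the place where the bad column name was supplied.
    return [vec.key for vec in frame.vecs(names)]


frame = ToyFrame({"age": ToyVec("age_key"), "fare": ToyVec("fare_key")})

try:
    collect_keys(frame, ["fair", "age"])
except AttributeError as err:
    failure = str(err)  # refers to NoneType, not to the column name 'fair'
```

Validating the names before the lookup moves the failure to the point where the caller can act on it.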

Changes

- Added default validation method in FriedmanPopescusHCollector interface
  - Validates vars is non-null and non-empty
  - Validates all columns exist and are numeric
  - Validates columns contain non-constant values
- Updated implementations to call validation before processing
  - GBMModel.getFriedmanPopescusH()
  - XGBoostModel.getFriedmanPopescusH()
- Added comprehensive test coverage
  - Java: GBMTest.testFriedmanPopescusHInputValidation()
  - Python: pyunit_h_input_validation.py (tests both GBM and XGBoost)
  - R: runit_GBM_h_statistic_input_validation.R and runit_XGBoost_h_statistic_input_validation.R
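
The validation rules listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java default method added in this PR: the function name `validate_h_vars`, the dict-of-lists frame model, and every message other than the "is not numeric" wording are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Union

Column = Sequence[Union[float, str]]

PREFIX = "Calculating H statistics error: "


def validate_h_vars(columns: Dict[str, Column], vars_: Optional[List[str]]) -> None:
    """Raise ValueError with a descriptive message if `vars_` is unusable."""
    # Rule 1: vars must be non-null and non-empty.
    if vars_ is None or len(vars_) == 0:
        raise ValueError(PREFIX + "vars must not be null or empty.")
    for name in vars_:
        # Rule 2a: every requested column must exist.
        if name not in columns:
            raise ValueError(PREFIX + f"column '{name}' does not exist.")
        values = columns[name]
        # Rule 2b: every requested column must be numeric.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not numeric:
            raise ValueError(PREFIX + f"column '{name}' is not numeric.")
        # Rule 3: the column must hold at least two distinct non-missing values.
        observed = {v for v in values if not math.isnan(v)}
        if len(observed) < 2:
            raise ValueError(PREFIX + f"column '{name}' is constant.")


# Toy data, invented for illustration.
toy = {
    "age": [22.0, 38.0, 26.0],
    "sex": ["male", "female", "female"],
    "flag": [1.0, 1.0, 1.0],
}

validate_h_vars(toy, ["age"])  # passes silently
```

Running the checks in this order means the caller sees the most basic problem first: a missing argument, then a missing column, then a wrong type, then degenerate data.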

Error messages now clearly indicate the issue:

Calculating H statistics error: column 'fair' is not numeric. 
H statistics can only be calculated for numeric variables.

Original prompt

This section details on the original issue you should resolve

<issue_title>H statistic: Improve input validation</issue_title>
<issue_description>This code throws NPE:

import h2o
from h2o.estimators import H2OXGBoostEstimator

# Connect to (or start) a local H2O cluster:
h2o.init()

# Import the titanic dataset into H2O:
titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv")

# Set the predictors and response; set the response as a factor:
titanic["survived"] = titanic["survived"].asfactor()
predictors = titanic.columns
response = "survived"

# Split the dataset into a train and valid set:
train, valid = titanic.split_frame(ratios=[.8], seed=1234)

# Build and train the model:
titanic_xgb = H2OXGBoostEstimator(booster='dart',
                                  normalize_type="tree",
                                  seed=1234)
titanic_xgb.train(x=predictors,
                  y=response,
                  training_frame=train,
                  validation_frame=valid)

# Eval performance:
perf = titanic_xgb.model_performance()

# Generate predictions on a test set (if necessary):
pred = titanic_xgb.predict(valid)

# Extract feature interactions:
feature_interactions = titanic_xgb.feature_interaction()

# Get Friedman and Popescu's H statistics
h = titanic_xgb.h(train, ['fair','age'])
print(h)

Server error java.lang.NullPointerException:
  Error: Cannot read field "_key" because "vecs[i]" is null
  Request: None

The input parameter validation should be done properly. </issue_description>

<agent_instructions>Validate input parameters of the Friedman Popescus H statistics. On the Java Backend, find the getFriedmanPopescusH() method, where the implementation is for the XGBoost and GBM algorithms. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes H statistic: Improve input validation #16759

Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>

maurever

R and Python tests are missing.

Agent-Logs-Url: https://github.com/h2oai/h2o-3/sessions/832aaf6a-935f-45c8-9ea7-714da906f754 Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>

Copilot · 2026-04-14T16:53:36Z

R and Python tests are missing.

Added Python and R tests for input validation in commit 3365711.

Python test: h2o-py/tests/testdir_hstats/pyunit_h_input_validation.py

- Tests null/None, empty, non-existent, and non-numeric column parameters
- Validates both GBM and XGBoost models

R tests:

- h2o-r/tests/testdir_algos/gbm/runit_GBM_h_statistic_input_validation.R (GBM)
- h2o-r/tests/testdir_algos/xgboost/runit_XGBoost_h_statistic_input_validation.R (XGBoost)
- Both test the same validation scenarios

All tests verify proper error messages for invalid input and correct behavior for valid input.
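
The scenarios described above lend themselves to a table-driven test. The sketch below is a hypothetical illustration and does not reproduce the contents of pyunit_h_input_validation.py: `stub_h` stands in for a model's `h()` call, and `TOY_TYPES`, `error_of`, `CASES`, and `run_cases` are invented names. A real test would train GBM and XGBoost models on an H2OFrame instead of using a stub.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical column types for a toy frame.
TOY_TYPES: Dict[str, str] = {"age": "real", "fare": "real", "sex": "enum"}


def stub_h(variables: Optional[List[str]]) -> float:
    """Hypothetical stand-in for model.h(frame, variables)."""
    if not variables:
        raise ValueError("variables must not be null or empty")
    for name in variables:
        if name not in TOY_TYPES:
            raise ValueError(f"column '{name}' does not exist")
        if TOY_TYPES[name] not in ("int", "real"):
            raise ValueError(f"column '{name}' is not numeric")
    return 0.0  # placeholder; a real model would return the H statistic


def error_of(call: Callable[[], object]) -> Optional[str]:
    """Return the message of the ValueError raised by `call`, or None."""
    try:
        call()
    except ValueError as err:
        return str(err)
    return None


# (description, input, substring expected in the error; None means success)
CASES: List[Tuple[str, Optional[List[str]], Optional[str]]] = [
    ("None input", None, "null or empty"),
    ("empty list", [], "null or empty"),
    ("non-existent column", ["fair", "age"], "does not exist"),
    ("non-numeric column", ["sex", "age"], "not numeric"),
    ("valid columns", ["fare", "age"], None),
]


def run_cases(h: Callable[[Optional[List[str]]], float]) -> List[str]:
    """Run every case against `h` and return descriptions of failed cases."""
    failures = []
    for description, variables, expected in CASES:
        message = error_of(lambda: h(variables))
        if expected is None:
            ok = message is None
        else:
            ok = message is not None and expected in message
        if not ok:
            failures.append(description)
    return failures
```

Because `run_cases` takes the callable as a parameter, the same table can be applied to more than one model type, which mirrors how the Python test covers both GBM and XGBoost.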

sonarqubecloud · 2026-04-14T16:56:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

maurever

LGTM. @tomasfryda or @valenad1, could you review this PR, please?

valenad1

This should go to rel-3.46.0 if you want it in the next release. Also, please fix the failing check in the pull request test stage. :)

Initial plan

c9c1711

Copilot AI assigned Copilot and maurever Feb 4, 2026

Copilot started work on behalf of maurever February 4, 2026 15:51

Copilot AI and others added 2 commits February 4, 2026 15:54

Add input validation for H statistic and tests

633d62e

Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>

Refactor: Extract validation logic to shared method

be1fea4

Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Improve input validation for H statistic~~ Add input validation for Friedman-Popescu H statistic calculation Feb 4, 2026

Copilot AI requested a review from maurever February 4, 2026 16:00

Copilot finished work on behalf of maurever February 4, 2026 16:00

maurever requested changes Apr 14, 2026

Copilot started work on behalf of maurever April 14, 2026 16:50

maurever requested a review from tomasfryda April 14, 2026 16:51

Add Python and R tests for H statistic input validation

3365711

Agent-Logs-Url: https://github.com/h2oai/h2o-3/sessions/832aaf6a-935f-45c8-9ea7-714da906f754 Co-authored-by: maurever <11465784+maurever@users.noreply.github.com>

Copilot finished work on behalf of maurever April 14, 2026 16:54

Copilot AI requested a review from maurever April 14, 2026 16:54

maurever approved these changes Apr 21, 2026

maurever marked this pull request as ready for review April 21, 2026 10:53

valenad1 requested changes Apr 21, 2026

Copilot wants to merge 4 commits into master from copilot/improve-input-validation

3 participants
