
fix/sanitize for json cycle detection #240

Open
himax12 wants to merge 4 commits into sktime:main from himax12:fix/sanitize-for-json-cycle-detection

Conversation

@himax12 commented Apr 18, 2026

  • fix: add coverage parameter to fit_predict and fit_predict_async
  • feat: add compare_estimators tool for agentic model selection
  • docs: update installation docs to recommend uvx
  • fix: add cycle detection to sanitize_for_json

himax12 added 4 commits April 18, 2026 12:22

This fix addresses issue sktime#176 where the coverage parameter was silently
dropped in async workflows, preventing prediction intervals from being
returned.

Changes:
- Add predict_interval() method to Executor class that uses sktime's
  predict_interval() for probabilistic forecasting
- Add coverage parameter to Executor.fit_predict() - when provided,
  uses predict_interval() instead of predict()
- Add coverage parameter to Executor.fit_predict_async() - same behavior
- Update fit_predict_tool() to accept and pass coverage parameter
- Update fit_predict_async_tool() to accept and pass coverage parameter
- Update MCP server tool schemas for fit_predict and fit_predict_async
  to expose coverage parameter in the API

The coverage parameter accepts:
- A single float (e.g., 0.9 for 90% prediction intervals)
- A list of floats for multiple interval levels (e.g., [0.5, 0.9, 0.95])

This only works with estimators that support prediction intervals (check
the capability:pred_int tag).
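Since coverage may arrive as either a single float or a list of floats, the tool presumably normalizes it before handing it to sktime. A minimal sketch of that idea (the helper name is hypothetical, not part of this PR's diff):

```python
def normalize_coverage(coverage):
    """Return coverage as a list of floats, validating each level."""
    levels = [coverage] if isinstance(coverage, float) else list(coverage)
    for level in levels:
        if not 0.0 < level < 1.0:
            raise ValueError(f"coverage level must be in (0, 1), got {level}")
    return levels
```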

This implements issue sktime#178 - a new compare_estimators MCP tool that allows
LLM agents to automatically compare multiple models on the same dataset
and select the best one.

Features:
- Compare multiple estimator handles using cross-validation
- Support for MAE, MAPE, MSE, RMSE, SMAPE, MASE, MedAE metrics
- Works with both demo datasets and custom data_handle
- Returns ranked results with best estimator

The tool enables the full agentic loop:
  extract_ts_metadata()       <- know what data looks like
  instantiate_estimator() x N <- create candidate models
  compare_estimators()        <- NEW: rank all candidates automatically
  fit_predict(best_handle)    <- execute with the winner

Implementation:
- Added _get_metric_instance() helper to map metric names to sktime classes
- Added _extract_metric_value() to extract metrics from evaluate results
- Added compare_estimators_tool() function
- Registered tool in server.py with full MCP schema
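The ranking step this commit describes can be sketched as follows; names and result shape here are illustrative, not the PR's exact code, and "lower is better" is assumed since the supported metrics are all error measures:

```python
def rank_estimators(scores: dict[str, float]) -> dict:
    """Rank estimator handles by CV score (lower error is better)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    return {
        "ranking": [
            {"handle": handle, "score": score, "rank": i + 1}
            for i, (handle, score) in enumerate(ranked)
        ],
        "best_estimator": ranked[0][0],
    }
```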

This implements the feature request to use uvx as the recommended
installation method for sktime-mcp.

Changes:
- README.md: Lead with uvx as the recommended approach, update Quick Start
  and Beginner Setup sections
- docs/source/intro.md: Update Quick Start to lead with uvx
- docs/source/user-guide.md: Update Getting Started to lead with uvx

uvx provides a no-install experience similar to npx for Node.js,
automatically handling environment isolation and dependencies.

This removes the manual venv creation step from user-facing docs.
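For context, the uvx flow would typically be wired into an MCP client configuration along these lines. This is an illustrative sketch: the `sktime-mcp` package name is taken from the PR context, and the config shape is the common Claude-Desktop-style `mcpServers` schema, which may differ from what the docs actually show.

```json
{
  "mcpServers": {
    "sktime": {
      "command": "uvx",
      "args": ["sktime-mcp"]
    }
  }
}
```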

Fix RecursionError from circular references in sktime estimators

Problem:
sanitize_for_json recursively walked objects but had no cycle detection.
If an sktime estimator contains circular references in its attributes
(parent/child pointers), this causes RecursionError and crashes the MCP server.

Solution:
- Track visited objects by id() in a _seen set
- When a cycle is detected (id already seen), return str(obj) instead
- Also recurse into __dict__ to catch cycles in object attributes
  (previously we just converted to string without traversing)

This handles:
- Self-referential dicts: d['key'] = d
- Nested cycles: d['a']['b'] = d
- Object parent pointers: a.parent = b; b.parent = a
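A condensed sketch of the approach described above (track object ids in a `_seen` set and fall back to `str()` on revisit); the PR's actual sanitize_for_json handles more types than this, and this sketch returns primitives early since they cannot form cycles:

```python
def sanitize_for_json(obj, _seen=None):
    """Recursively convert obj to JSON-safe types, breaking cycles."""
    if _seen is None:
        _seen = set()
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if id(obj) in _seen:
        return str(obj)  # cycle detected: stop recursing
    _seen.add(id(obj))
    if isinstance(obj, dict):
        return {str(k): sanitize_for_json(v, _seen) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_for_json(v, _seen) for v in obj]
    if hasattr(obj, "__dict__"):
        # recurse into attributes so cycles via object pointers are caught
        return {k: sanitize_for_json(v, _seen) for k, v in vars(obj).items()}
    return str(obj)
```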
Copilot AI review requested due to automatic review settings April 18, 2026 07:33

Copilot AI left a comment

Pull request overview

This PR expands the sktime-mcp tool surface to support prediction-interval workflows, introduces an estimator-comparison tool for agentic model selection, updates installation docs to recommend uvx, and hardens JSON sanitization by adding cycle detection.

Changes:

  • Add coverage support to fit_predict / fit_predict_async to request prediction intervals.
  • Add compare_estimators tool that runs CV-based evaluation and ranks candidate estimators.
  • Update docs/README to recommend running via uvx and bump documented Python prerequisite to 3.10+.
  • Add cycle detection to sanitize_for_json to avoid recursion errors on circular references.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Summary per file:

  • src/sktime_mcp/tools/fit_predict.py: Adds coverage parameter plumbing and updates tool docstrings/examples.
  • src/sktime_mcp/tools/evaluate.py: Adds metric helpers and new compare_estimators_tool implementation.
  • src/sktime_mcp/server.py: Registers new tool, threads coverage through schemas/calls, and adds cycle detection to JSON sanitization.
  • src/sktime_mcp/runtime/executor.py: Adds predict_interval plus coverage support in sync/async fit-predict flows.
  • docs/source/user-guide.md: Updates installation guidance to recommend uvx and Python 3.10+.
  • docs/source/intro.md: Updates "getting started" instructions to highlight uvx.
  • README.md: Reworks install/quickstart instructions around uvx and pip fallback.


Comment on lines +163 to +170
def compare_estimators_tool(
estimator_handles: list[str],
dataset: Optional[str] = None,
data_handle: Optional[str] = None,
metric: str = "MAPE",
cv_folds: int = 3,
horizon: int = 12,
) -> dict[str, Any]:

Copilot AI Apr 18, 2026

New compare_estimators_tool behavior (metric parsing, CV splitter construction, ranking, and error aggregation) isn’t covered by tests. Since evaluate_estimator_tool already has coverage in tests/test_evaluate.py, please add targeted tests for compare_estimators_tool (including default metric/horizon path and a small list of handles).

Comment on lines +246 to +252
return {
"success": True,
"predictions": pred_intervals_copy.to_dict(),
"intervals": intervals_dict,
"horizon": len(fh) if hasattr(fh, "__len__") else fh,
"coverage": coverage,
}

Copilot AI Apr 18, 2026

predict_interval() returns the interval DataFrame (lower/upper) under the predictions key (pred_intervals_copy.to_dict()), but predict_interval in sktime does not return point forecasts. This breaks the existing API contract where predictions is the point forecast (and will also make fit_predict(..., coverage=...) return intervals instead of point predictions). Consider returning point forecasts from instance.predict(...) in predictions, and put interval data only under intervals (or rename the interval payload to avoid ambiguity).
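The contract the reviewer proposes, restoring point forecasts under `predictions` and keeping bounds under `intervals`, would shape the response roughly like this. The helper name and dummy values are illustrative only:

```python
def build_interval_response(point_preds: dict, intervals: dict,
                            horizon: int, coverage: float) -> dict:
    """Keep the existing API contract: predictions = point forecast."""
    return {
        "success": True,
        "predictions": point_preds,  # always the point forecast
        "intervals": intervals,      # lower/upper bounds per coverage level
        "horizon": horizon,
        "coverage": coverage,
    }
```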

Comment thread src/sktime_mcp/server.py
Comment on lines +84 to +90
if _seen is None:
_seen = set()

obj_id = id(obj)
if obj_id in _seen:
return str(obj)
_seen.add(obj_id)

Copilot AI Apr 18, 2026

Cycle detection currently tracks all objects (including None, bool, small ints, and interned strings). Because these share stable identities, repeated values (e.g. [1, 1] or [None, None]) will be treated as cycles and coerced to str(obj), changing types/values (1 -> "1", None -> "None"). Only add objects to _seen for container / user-defined objects that can actually participate in reference cycles (dict/list/tuple/custom objects), and skip primitives/immutables when doing the id()/_seen check.
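The fix the reviewer suggests can be sketched as a predicate guarding the `_seen` bookkeeping: only objects that can actually participate in reference cycles get id-tracked, so repeated primitives like `[1, 1]` or `[None, None]` survive intact. The function name is illustrative:

```python
def can_participate_in_cycle(obj) -> bool:
    """True for containers and user-defined objects; False for primitives."""
    return isinstance(obj, (dict, list, tuple, set)) or hasattr(obj, "__dict__")
```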

Comment thread README.md
@@ -27,83 +27,90 @@ This MCP is **not** just documentation or static code analysis. It is a **semant
## 🛠️ Prerequisites

- **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)

Copilot AI Apr 18, 2026

README says “3.9 is listed in pyproject.toml”, but pyproject.toml now sets requires-python = ">=3.10". Please update/remove the parenthetical so prerequisites stay accurate.

Suggested change
- **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)
- **Python 3.10+**

Comment on lines +81 to +92
# Convert column names to find the matching metric
metric_col = None
for col in results.columns:
if col == metric_name or col == f"test_{metric_name}":
metric_col = col
break
# Check case-insensitive match
col_lower = col.lower()
metric_lower = metric_name.lower()
if metric_lower in col_lower or col_lower == f"test_{metric_lower}":
metric_col = col
break

Copilot AI Apr 18, 2026

_extract_metric_value() tries to match acronyms like MAPE/MAE against evaluate() result columns, but the columns are typically named like test_MeanAbsolutePercentageError (see existing tests/test_evaluate.py). As a result, compare_estimators_tool() will fail to extract the default MAPE score and report errors for all estimators. Consider mapping acronyms to the actual scorer name (e.g., use scoring.name/type(scoring).__name__ or a dict of acronym -> full column name) and then look up test_<scorer_name> explicitly.

Comment on lines +51 to +57
>>> fit_predict_tool("est_abc123", "airline", horizon=12, coverage=0.9)
{
"success": True,
"predictions": {...},
"intervals": {"Airline": {"coverage": 0.9, "lower": {...}, "upper": {...}}},
"horizon": 12,
"coverage": 0.9

Copilot AI Apr 18, 2026

The fit_predict_tool doc example for intervals doesn’t match the current Executor.predict_interval() output shape. predict_interval() builds keys like f"{var_name}_{cov_level}" (e.g., Airline_0.9), not just Airline. Please update the docstring example (and/or align the implementation to the documented structure) so users know what to expect.
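Concretely, the key construction the comment describes joins the variable name and coverage level with an underscore, so the docstring's `"Airline"` key would actually come out as `"Airline_0.9"`:

```python
var_name, cov_level = "Airline", 0.9
key = f"{var_name}_{cov_level}"  # the shape predict_interval() reportedly emits
```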

Comment on lines +300 to 303
# Use predict_interval if coverage is provided, otherwise use predict
if coverage is not None:
return self.predict_interval(handle_id, fh=fh, X=X, coverage=coverage)
return self.predict(handle_id, fh=fh, X=X)

Copilot AI Apr 18, 2026

The new coverage path (fit_predict -> predict_interval and async equivalent) isn’t covered by tests. There are already tests for fit_predict_async in tests/test_background_jobs.py; please add cases that pass coverage and assert the response includes point forecasts plus interval bounds (and that unsupported estimators return the expected error).


import logging
from typing import Any
from typing import Any, Optional, Union

Copilot AI Apr 18, 2026

Union is imported but not used in this module, which will fail ruff check (F401). Remove the unused import or use it in a type annotation.

Suggested change
from typing import Any, Optional, Union
from typing import Any, Optional

Comment on lines +216 to +220
if not dataset and not data_handle:
return {
"success": False,
"error": "Either dataset or data_handle must be provided"
}

Copilot AI Apr 18, 2026

compare_estimators_tool documents “use dataset or data_handle, not both”, but the validation only errors when neither is provided. Add an explicit check to reject requests where both dataset and data_handle are set to avoid ambiguous behavior.
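The stricter validation asked for here rejects both "neither provided" and "both provided". A sketch with a hypothetical helper name (the PR's inline check uses truthiness; explicit `is None` checks are assumed here):

```python
def validate_data_source(dataset, data_handle):
    """Return an error dict if the data-source arguments are invalid, else None."""
    if dataset is None and data_handle is None:
        return {"success": False,
                "error": "Either dataset or data_handle must be provided"}
    if dataset is not None and data_handle is not None:
        return {"success": False,
                "error": "Provide either dataset or data_handle, not both"}
    return None  # valid: exactly one source given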

Comment on lines +281 to +287
eval_results = evaluate(
forecaster=instance,
y=y,
X=X,
cv=cv,
scoring=scoring,
)

Copilot AI Apr 18, 2026

evaluate() will fit the provided forecaster during CV, which mutates the estimator instance stored under the handle. That means running compare_estimators_tool can unexpectedly change/overwrite a user’s live estimator state. Prefer cloning the forecaster per evaluation (e.g., via sktime.base.clone/sklearn.base.clone) so comparisons are side‑effect free.
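A minimal illustration of why cloning matters: CV-style fitting on a clone leaves the original estimator untouched. The toy class mimics the `get_params` contract that `sklearn.base.clone` relies on; in the PR itself one would call `sklearn.base.clone(instance)` before passing it to `evaluate()`:

```python
class ToyForecaster:
    """Stand-in estimator with sklearn-style get_params()."""

    def __init__(self, window=3):
        self.window = window
        self.fitted_ = False

    def get_params(self):
        return {"window": self.window}

    def fit(self, y):
        self.fitted_ = True
        return self

def clone_toy(est):
    # fresh, unfitted instance carrying the same hyperparameters
    return type(est)(**est.get_params())

original = ToyForecaster(window=5)
clone_toy(original).fit([1, 2, 3])  # CV fits the clone, not the original
```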
