Conversation
himax12 commented on Apr 18, 2026
- fix: add coverage parameter to fit_predict and fit_predict_async
- feat: add compare_estimators tool for agentic model selection
- docs: update installation docs to recommend uvx
- fix: add cycle detection to sanitize_for_json
This fix addresses issue sktime#176, where the coverage parameter was silently dropped in async workflows, preventing prediction intervals from being returned.

Changes:
- Add a `predict_interval()` method to the Executor class that uses sktime's `predict_interval()` for probabilistic forecasting
- Add a coverage parameter to `Executor.fit_predict()`; when provided, `predict_interval()` is used instead of `predict()`
- Add a coverage parameter to `Executor.fit_predict_async()` with the same behavior
- Update `fit_predict_tool()` to accept and pass the coverage parameter
- Update `fit_predict_async_tool()` to accept and pass the coverage parameter
- Update the MCP server tool schemas for fit_predict and fit_predict_async to expose the coverage parameter in the API

The coverage parameter accepts:
- a single float (e.g., 0.9 for 90% prediction intervals)
- a list of floats for multiple interval levels (e.g., [0.5, 0.9, 0.95])

It only works with estimators that support prediction intervals (check the `capability:pred_int` tag).
This implements issue sktime#178: a new `compare_estimators` MCP tool that lets LLM agents automatically compare multiple models on the same dataset and select the best one.

Features:
- Compare multiple estimator handles using cross-validation
- Support for MAE, MAPE, MSE, RMSE, SMAPE, MASE, and MedAE metrics
- Works with both demo datasets and a custom data_handle
- Returns ranked results with the best estimator

The tool enables the full agentic loop:

```
extract_ts_metadata()        <- know what the data looks like
instantiate_estimator() x N  <- create candidate models
compare_estimators()         <- NEW: rank all candidates automatically
fit_predict(best_handle)     <- execute with the winner
```

Implementation:
- Added `_get_metric_instance()` helper to map metric names to sktime classes
- Added `_extract_metric_value()` to extract metrics from evaluate results
- Added `compare_estimators_tool()` function
- Registered the tool in server.py with the full MCP schema
This implements the feature request to use uvx as the recommended installation method for sktime-mcp.

Changes:
- README.md: lead with uvx as the recommended approach; update the Quick Start and Beginner Setup sections
- docs/source/intro.md: update Quick Start to lead with uvx
- docs/source/user-guide.md: update Getting Started to lead with uvx

uvx provides a no-install experience similar to npx for Node.js, automatically handling environment isolation and dependencies. This removes the manual venv-creation step from the user-facing docs.
RecursionError from circular references in sktime estimators

Problem: `sanitize_for_json` recursively walked objects but had no cycle detection. If an sktime estimator contains circular references in its attributes (parent/child pointers), this causes a RecursionError and crashes the MCP server.

Solution:
- Track visited objects by `id()` in a `_seen` set
- When a cycle is detected (id already seen), return `str(obj)` instead
- Also recurse into `__dict__` to catch cycles in object attributes (previously we just converted to string without traversing)

This handles:
- Self-referential dicts: `d['key'] = d`
- Nested cycles: `d['a']['b'] = d`
- Object parent pointers: `a.parent = b; b.parent = a`
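The three cycle cases above can be reproduced with a minimal, standalone sketch of the `_seen`-set idea (this is an illustration only, not the real `sanitize_for_json` from server.py, which handles many more types):

```python
def sanitize(obj, _seen=None):
    """Minimal sketch of cycle-aware sanitization using a set of visited ids."""
    if _seen is None:
        _seen = set()
    if isinstance(obj, (dict, list)):
        if id(obj) in _seen:
            return str(obj)  # cycle detected: fall back to a string representation
        _seen.add(id(obj))
        if isinstance(obj, dict):
            return {k: sanitize(v, _seen) for k, v in obj.items()}
        return [sanitize(v, _seen) for v in obj]
    return obj

# Self-referential dict: d['key'] = d
d = {"key": 1}
d["self"] = d

# Nested cycle: d['a']['b'] = d
nested = {"a": {"b": None}}
nested["a"]["b"] = nested
```

Running `sanitize(d)` keeps the plain values and replaces the cyclic reference with a string instead of recursing forever.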
Pull request overview
This PR expands the sktime-mcp tool surface to support prediction-interval workflows, introduces an estimator-comparison tool for agentic model selection, updates installation docs to recommend uvx, and hardens JSON sanitization by adding cycle detection.
Changes:
- Add `coverage` support to `fit_predict`/`fit_predict_async` to request prediction intervals.
- Add a `compare_estimators` tool that runs CV-based evaluation and ranks candidate estimators.
- Update docs/README to recommend running via `uvx` and bump the documented Python prerequisite to 3.10+.
- Add cycle detection to `sanitize_for_json` to avoid recursion errors on circular references.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/sktime_mcp/tools/fit_predict.py` | Adds `coverage` parameter plumbing and updates tool docstrings/examples. |
| `src/sktime_mcp/tools/evaluate.py` | Adds metric helpers and the new `compare_estimators_tool` implementation. |
| `src/sktime_mcp/server.py` | Registers the new tool, threads `coverage` through schemas/calls, and adds cycle detection to JSON sanitization. |
| `src/sktime_mcp/runtime/executor.py` | Adds `predict_interval` plus `coverage` support in sync/async fit-predict flows. |
| `docs/source/user-guide.md` | Updates installation guidance to recommend uvx and Python 3.10+. |
| `docs/source/intro.md` | Updates "getting started" instructions to highlight uvx. |
| `README.md` | Reworks install/quickstart instructions around uvx with a pip fallback. |
```python
def compare_estimators_tool(
    estimator_handles: list[str],
    dataset: Optional[str] = None,
    data_handle: Optional[str] = None,
    metric: str = "MAPE",
    cv_folds: int = 3,
    horizon: int = 12,
) -> dict[str, Any]:
```
New compare_estimators_tool behavior (metric parsing, CV splitter construction, ranking, and error aggregation) isn’t covered by tests. Since evaluate_estimator_tool already has coverage in tests/test_evaluate.py, please add targeted tests for compare_estimators_tool (including default metric/horizon path and a small list of handles).
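The ranking and error-aggregation contract such a test would pin down can be captured in a few lines. A standalone sketch, with all handle names hypothetical and a lower-is-better metric assumed:

```python
def rank_candidates(scores: dict) -> dict:
    """Rank estimator handles by metric value (lower is better); aggregate failures.

    `scores` maps estimator handle -> metric value, or None when evaluation failed.
    Sketch of the behavior under test, not the real compare_estimators_tool.
    """
    ok = {h: s for h, s in scores.items() if s is not None}
    ranked = sorted(ok.items(), key=lambda kv: kv[1])
    return {
        "ranking": [{"handle": h, "score": s} for h, s in ranked],
        "best_estimator": ranked[0][0] if ranked else None,
        "errors": [h for h, s in scores.items() if s is None],
    }

result = rank_candidates({"est_naive": 0.12, "est_theta": 0.08, "est_broken": None})
```

A test can then assert the ordering, the chosen best handle, and that failed evaluations are reported rather than silently dropped.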
```python
return {
    "success": True,
    "predictions": pred_intervals_copy.to_dict(),
    "intervals": intervals_dict,
    "horizon": len(fh) if hasattr(fh, "__len__") else fh,
    "coverage": coverage,
}
```
predict_interval() returns the interval DataFrame (lower/upper) under the predictions key (pred_intervals_copy.to_dict()), but predict_interval in sktime does not return point forecasts. This breaks the existing API contract where predictions is the point forecast (and will also make fit_predict(..., coverage=...) return intervals instead of point predictions). Consider returning point forecasts from instance.predict(...) in predictions, and put interval data only under intervals (or rename the interval payload to avoid ambiguity).
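One way to restore the contract is to keep the two payloads disjoint. A sketch with placeholder values (in the real executor, `point_preds` would come from `instance.predict(...)` and `intervals_dict` from `predict_interval(...)`; the `Airline_0.9` key shape follows the `f"{var_name}_{cov_level}"` naming used elsewhere in this review):

```python
# Illustrative values only, to show the proposed payload shape.
point_preds = {"Airline": {"1961-01": 417.2, "1961-02": 391.5}}
intervals_dict = {
    "Airline_0.9": {"lower": {"1961-01": 380.1, "1961-02": 350.4},
                    "upper": {"1961-01": 455.3, "1961-02": 432.7}},
}

response = {
    "success": True,
    "predictions": point_preds,   # point forecasts keep their existing meaning
    "intervals": intervals_dict,  # interval bounds live only here
    "horizon": 2,
    "coverage": 0.9,
}
```

Callers that already consume `predictions` as point forecasts then keep working, with or without `coverage` set.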
```python
if _seen is None:
    _seen = set()

obj_id = id(obj)
if obj_id in _seen:
    return str(obj)
_seen.add(obj_id)
```
Cycle detection currently tracks all objects (including None, bool, small ints, and interned strings). Because these share stable identities, repeated values (e.g. [1, 1] or [None, None]) will be treated as cycles and coerced to str(obj), changing types/values (1 -> "1", None -> "None"). Only add objects to _seen for container / user-defined objects that can actually participate in reference cycles (dict/list/tuple/custom objects), and skip primitives/immutables when doing the id()/_seen check.
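A sketch of the suggested guard, assuming the surrounding function's signature (only containers and objects with a `__dict__` enter `_seen`; the real function handles more types):

```python
def _can_cycle(obj) -> bool:
    """Only containers and user-defined objects can sit on a reference cycle."""
    return isinstance(obj, (dict, list, tuple, set)) or hasattr(obj, "__dict__")

def sanitize_for_json(obj, _seen=None):
    # Sketch of the guard only; not the full implementation.
    if _seen is None:
        _seen = set()
    if _can_cycle(obj):
        if id(obj) in _seen:
            return str(obj)
        _seen.add(id(obj))
    if isinstance(obj, dict):
        return {str(k): sanitize_for_json(v, _seen) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [sanitize_for_json(v, _seen) for v in obj]
    return obj  # primitives pass through, even when the same object repeats
```

With this guard, `[1, 1]` and `[None, None]` survive untouched because the repeated primitives never enter the `id()` check.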
```diff
@@ -27,83 +27,90 @@ This MCP is **not** just documentation or static code analysis. It is a **semant
 ## 🛠️ Prerequisites

 - **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)
```
README says “3.9 is listed in pyproject.toml”, but pyproject.toml now sets requires-python = ">=3.10". Please update/remove the parenthetical so prerequisites stay accurate.
Suggested change:

```diff
-- **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)
+- **Python 3.10+**
```
```python
# Convert column names to find the matching metric
metric_col = None
for col in results.columns:
    if col == metric_name or col == f"test_{metric_name}":
        metric_col = col
        break
    # Check case-insensitive match
    col_lower = col.lower()
    metric_lower = metric_name.lower()
    if metric_lower in col_lower or col_lower == f"test_{metric_lower}":
        metric_col = col
        break
```
_extract_metric_value() tries to match acronyms like MAPE/MAE against evaluate() result columns, but the columns are typically named like test_MeanAbsolutePercentageError (see existing tests/test_evaluate.py). As a result, compare_estimators_tool() will fail to extract the default MAPE score and report errors for all estimators. Consider mapping acronyms to the actual scorer name (e.g., use scoring.name/type(scoring).__name__ or a dict of acronym -> full column name) and then look up test_<scorer_name> explicitly.
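The suggested acronym-to-scorer mapping could look like the sketch below. The column naming (`test_<ScorerClassName>`) follows what the existing tests in tests/test_evaluate.py observe; the mapping entries themselves are an assumption to verify against the metric classes the tool actually instantiates. Note that RMSE and SMAPE are parameterized variants (`MeanSquaredError(square_root=True)`, `MeanAbsolutePercentageError(symmetric=True)`), so they share class names with MSE/MAPE and need extra handling:

```python
# Hypothetical mapping: acronym -> sktime scorer class name (to be verified).
ACRONYM_TO_SCORER = {
    "MAE": "MeanAbsoluteError",
    "MAPE": "MeanAbsolutePercentageError",
    "MSE": "MeanSquaredError",
    "MASE": "MeanAbsoluteScaledError",
    "MEDAE": "MedianAbsoluteError",
}

def find_metric_column(columns, metric_name):
    """Resolve an acronym to evaluate()'s test_<ScorerName> column, or None."""
    scorer = ACRONYM_TO_SCORER.get(metric_name.upper(), metric_name)
    target = f"test_{scorer}"
    return target if target in columns else None

# Illustrative column list, shaped like an evaluate() result frame.
cols = ["fit_time", "pred_time", "test_MeanAbsolutePercentageError"]
```

An explicit lookup like this fails loudly (returns `None`) instead of silently matching the wrong column via substring checks.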
```python
>>> fit_predict_tool("est_abc123", "airline", horizon=12, coverage=0.9)
{
    "success": True,
    "predictions": {...},
    "intervals": {"Airline": {"coverage": 0.9, "lower": {...}, "upper": {...}}},
    "horizon": 12,
    "coverage": 0.9
}
```
The fit_predict_tool doc example for intervals doesn’t match the current Executor.predict_interval() output shape. predict_interval() builds keys like f"{var_name}_{cov_level}" (e.g., Airline_0.9), not just Airline. Please update the docstring example (and/or align the implementation to the documented structure) so users know what to expect.
```python
# Use predict_interval if coverage is provided, otherwise use predict
if coverage is not None:
    return self.predict_interval(handle_id, fh=fh, X=X, coverage=coverage)
return self.predict(handle_id, fh=fh, X=X)
```
The new coverage path (fit_predict -> predict_interval and async equivalent) isn’t covered by tests. There are already tests for fit_predict_async in tests/test_background_jobs.py; please add cases that pass coverage and assert the response includes point forecasts plus interval bounds (and that unsupported estimators return the expected error).
```diff
 import logging
-from typing import Any
+from typing import Any, Optional, Union
```
Union is imported but not used in this module, which will fail ruff check (F401). Remove the unused import or use it in a type annotation.
Suggested change:

```diff
-from typing import Any, Optional, Union
+from typing import Any, Optional
```
```python
if not dataset and not data_handle:
    return {
        "success": False,
        "error": "Either dataset or data_handle must be provided"
    }
```
compare_estimators_tool documents “use dataset or data_handle, not both”, but the validation only errors when neither is provided. Add an explicit check to reject requests where both dataset and data_handle are set to avoid ambiguous behavior.
```python
eval_results = evaluate(
    forecaster=instance,
    y=y,
    X=X,
    cv=cv,
    scoring=scoring,
)
```
evaluate() will fit the provided forecaster during CV, which mutates the estimator instance stored under the handle. That means running compare_estimators_tool can unexpectedly change/overwrite a user’s live estimator state. Prefer cloning the forecaster per evaluation (e.g., via sktime.base.clone/sklearn.base.clone) so comparisons are side‑effect free.
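To see why cloning matters, a dependency-free sketch using `copy.deepcopy` as a stand-in for `sklearn.base.clone` (which goes further by reconstructing an unfitted estimator from `get_params()`); the `FakeForecaster` class is purely illustrative:

```python
import copy

class FakeForecaster:
    """Stand-in estimator whose fit() mutates internal state, as CV fitting would."""
    def __init__(self):
        self.is_fitted = False

    def fit(self, y):
        self.is_fitted = True
        return self

stored = FakeForecaster()          # the instance kept under the user's handle
candidate = copy.deepcopy(stored)  # clone-like copy used only for evaluation
candidate.fit([1.0, 2.0, 3.0])
```

After evaluation, the copy is fitted while the stored instance is untouched, so repeated comparisons stay side-effect free.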