Conversation
himax12
commented
Apr 18, 2026
- fix: add coverage parameter to fit_predict and fit_predict_async
- feat: add compare_estimators tool for agentic model selection
- docs: update installation docs to recommend uvx
This fix addresses issue sktime#176, where the `coverage` parameter was silently dropped in async workflows, preventing prediction intervals from being returned.

Changes:
- Add a `predict_interval()` method to the `Executor` class that uses sktime's `predict_interval()` for probabilistic forecasting
- Add a `coverage` parameter to `Executor.fit_predict()`: when provided, `predict_interval()` is used instead of `predict()`
- Add a `coverage` parameter to `Executor.fit_predict_async()` (same behavior)
- Update `fit_predict_tool()` to accept and pass the `coverage` parameter
- Update `fit_predict_async_tool()` to accept and pass the `coverage` parameter
- Update the MCP server tool schemas for `fit_predict` and `fit_predict_async` to expose `coverage` in the API

The `coverage` parameter accepts:
- a single float (e.g., `0.9` for 90% prediction intervals)
- a list of floats for multiple interval levels (e.g., `[0.5, 0.9, 0.95]`)

This only works with estimators that support prediction intervals (check the `capability:pred_int` tag).
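Since `coverage` may be a single float or a list, downstream handling is simpler if it is normalized up front. A minimal sketch of such a helper (illustrative, not necessarily the PR's actual code):

```python
from typing import Union


def normalize_coverage(coverage: Union[float, list[float]]) -> list[float]:
    """Accept a single coverage level or a list of levels; validate range (0, 1)."""
    levels = [coverage] if isinstance(coverage, (int, float)) else list(coverage)
    for c in levels:
        if not 0.0 < c < 1.0:
            raise ValueError(f"coverage must be in (0, 1), got {c}")
    return levels


assert normalize_coverage(0.9) == [0.9]
assert normalize_coverage([0.5, 0.9, 0.95]) == [0.5, 0.9, 0.95]
```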
This implements issue sktime#178: a new `compare_estimators` MCP tool that allows LLM agents to automatically compare multiple models on the same dataset and select the best one.

Features:
- Compare multiple estimator handles using cross-validation
- Support for MAE, MAPE, MSE, RMSE, SMAPE, MASE, and MedAE metrics
- Works with both demo datasets and a custom `data_handle`
- Returns ranked results with the best estimator

The tool enables the full agentic loop:
1. `extract_ts_metadata()`: know what the data looks like
2. `instantiate_estimator()` x N: create candidate models
3. `compare_estimators()` (NEW): rank all candidates automatically
4. `fit_predict(best_handle)`: execute with the winner

Implementation:
- Added a `_get_metric_instance()` helper to map metric names to sktime classes
- Added `_extract_metric_value()` to extract metrics from `evaluate` results
- Added the `compare_estimators_tool()` function
- Registered the tool in `server.py` with a full MCP schema
This implements the feature request to use `uvx` as the recommended installation method for sktime-mcp.

Changes:
- `README.md`: lead with `uvx` as the recommended approach; update the Quick Start and Beginner Setup sections
- `docs/source/intro.md`: update Quick Start to lead with `uvx`
- `docs/source/user-guide.md`: update Getting Started to lead with `uvx`

`uvx` provides a no-install experience similar to `npx` for Node.js, automatically handling environment isolation and dependencies. This removes the manual venv creation step from the user-facing docs.
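For reference, an MCP client configuration using `uvx` might look like the following config fragment (the `sktime-mcp` command name is an assumption based on the repo name; check the README for the exact entry point):

```json
{
  "mcpServers": {
    "sktime": {
      "command": "uvx",
      "args": ["sktime-mcp"]
    }
  }
}
```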
Pull request overview
This PR extends the sktime-mcp server with prediction-interval support in fit/predict workflows, introduces a new tool to compare multiple estimators via cross-validation for agentic model selection, and updates documentation to recommend uvx for installation/running.
Changes:
- Add `coverage` support to `fit_predict`/`fit_predict_async` to request prediction intervals.
- Add `compare_estimators` tool for cross-validated ranking of multiple estimator handles.
- Update README + docs to recommend `uvx`-based usage and refresh install/run instructions.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/sktime_mcp/tools/fit_predict.py | Adds coverage parameter to tool APIs and updates docstring examples/return contract. |
| src/sktime_mcp/tools/evaluate.py | Adds metric mapping/helpers and new compare_estimators_tool implementation. |
| src/sktime_mcp/server.py | Registers compare_estimators tool and exposes coverage in tool schemas + dispatcher. |
| src/sktime_mcp/runtime/executor.py | Adds predict_interval and threads coverage through fit→predict sync/async paths. |
| docs/source/user-guide.md | Recommends uvx usage and updates client configuration examples. |
| docs/source/intro.md | Updates quickstart to recommend uvx and shows example client configs. |
| README.md | Reworks installation/running guidance around uvx and updates prerequisites text. |
```diff
 import logging
-from typing import Any
+from typing import Any, Optional, Union

 from sktime.forecasting.model_evaluation import evaluate
 from sktime.forecasting.model_selection import ExpandingWindowSplitter
 from sktime.performance_metrics.forecasting import (
     MeanAbsoluteError,
     MeanAbsolutePercentageError,
     MeanSquaredError,
     MedianAbsoluteError,
     GeometricMeanAbsolutePercentageError,
 )
```
Union is imported but unused in this module, which will be flagged by linting (pyflakes/ruff F401). Remove the unused import (or use it if intended).
```python
initial_window = max(int(n * 0.5), n - cv_folds * 2)
if initial_window < 1:
    initial_window = 1
cv = ExpandingWindowSplitter(
    initial_window=initial_window,
    step_length=1,
```
The CV splitter setup can easily produce 0 folds (or raise) for typical inputs because initial_window is computed without accounting for horizon/max(fh). For example, with small n or larger horizon, initial_window + max(fh) can exceed n, so evaluate(...) returns empty results or errors. Adjust initial_window/step_length so that the requested cv_folds is feasible given n and horizon (and fail fast with a clear error when it isn’t).
Suggested change:

```diff
-initial_window = max(int(n * 0.5), n - cv_folds * 2)
-if initial_window < 1:
-    initial_window = 1
-cv = ExpandingWindowSplitter(
-    initial_window=initial_window,
-    step_length=1,
+if horizon < 1:
+    return {
+        "success": False,
+        "error": "horizon must be at least 1"
+    }
+if cv_folds < 1:
+    return {
+        "success": False,
+        "error": "cv_folds must be at least 1"
+    }
+max_fh = horizon
+max_possible_folds = n - max_fh
+if max_possible_folds < 1:
+    return {
+        "success": False,
+        "error": (
+            f"Time series is too short for horizon={horizon}: "
+            f"need at least {horizon + 1} observations, got {n}"
+        ),
+    }
+if cv_folds > max_possible_folds:
+    return {
+        "success": False,
+        "error": (
+            f"Requested cv_folds={cv_folds} is not feasible for "
+            f"n={n} and horizon={horizon}; maximum is {max_possible_folds}"
+        ),
+    }
+target_initial_window = max(1, int(n * 0.5))
+if cv_folds == 1:
+    step_length = 1
+    initial_window = n - max_fh
+else:
+    step_length = max(
+        1,
+        (n - max_fh - target_initial_window) // (cv_folds - 1),
+    )
+    initial_window = n - max_fh - step_length * (cv_folds - 1)
+cv = ExpandingWindowSplitter(
+    initial_window=initial_window,
+    step_length=step_length,
```
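The suggested fold arithmetic can be sanity-checked without sktime. A standalone sketch (variable names mirror the suggestion; the fold count approximates `ExpandingWindowSplitter` semantics, where each training window must leave `horizon` points for testing):

```python
def plan_cv(n: int, horizon: int, cv_folds: int) -> tuple[int, int]:
    """Compute (initial_window, step_length) so that exactly cv_folds
    expanding-window folds fit, each leaving room for the horizon."""
    max_fh = horizon
    if n - max_fh < cv_folds:
        raise ValueError(f"cv_folds={cv_folds} infeasible for n={n}, horizon={horizon}")
    target_initial_window = max(1, int(n * 0.5))
    if cv_folds == 1:
        return n - max_fh, 1
    step_length = max(1, (n - max_fh - target_initial_window) // (cv_folds - 1))
    initial_window = n - max_fh - step_length * (cv_folds - 1)
    return initial_window, step_length


def n_folds(n: int, horizon: int, initial_window: int, step_length: int) -> int:
    """Count training windows of size initial_window, initial_window + step_length,
    ... that still leave horizon observations for the test window."""
    count, window = 0, initial_window
    while window + horizon <= n:
        count += 1
        window += step_length
    return count


iw, sl = plan_cv(n=36, horizon=12, cv_folds=3)
assert n_folds(36, 12, iw, sl) == 3
```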
```python
pred_intervals = instance.predict_interval(fh=fh, X=X, coverage=coverage)

# Convert to JSON-serializable format
# The result is a DataFrame with MultiIndex columns: (variable, coverage, lower/upper)
pred_intervals_copy = pred_intervals.copy()
pred_intervals_copy.index = pred_intervals_copy.index.astype(str)

# Extract intervals into a structured format
intervals_dict = {}
for col in pred_intervals_copy.columns:
    var_name = col[0] if isinstance(col, tuple) else "predictions"
    cov_level = col[1] if isinstance(col, tuple) else coverage
    bound_type = col[2] if isinstance(col, tuple) else "bound"

    key = f"{var_name}_{cov_level}"
    if key not in intervals_dict:
        intervals_dict[key] = {"coverage": cov_level}
    intervals_dict[key][bound_type] = pred_intervals_copy[col].to_dict()

return {
    "success": True,
    "predictions": pred_intervals_copy.to_dict(),
    "intervals": intervals_dict,
    "horizon": len(fh) if hasattr(fh, "__len__") else fh,
    "coverage": coverage,
}
```
predict_interval returns "predictions": pred_intervals_copy.to_dict(), but pred_intervals_copy contains lower/upper interval columns (often with MultiIndex/tuple column keys). This means callers expecting point forecasts will instead get interval data (and tuple keys will stringify during JSON sanitization). Consider returning point predictions separately (e.g., via instance.predict(...)) and keep interval bounds exclusively under intervals in a stable JSON-friendly structure.
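A sketch of the JSON-friendly split this comment asks for, using only pandas; the tuple columns mimic sktime's `(variable, coverage, lower/upper)` MultiIndex, and the variable name and values are illustrative:

```python
import pandas as pd

# Illustrative stand-in for sktime's predict_interval output:
# MultiIndex columns of (variable, coverage, "lower"/"upper")
cols = pd.MultiIndex.from_tuples([
    ("Airline", 0.9, "lower"),
    ("Airline", 0.9, "upper"),
])
pred_intervals = pd.DataFrame(
    [[410.0, 470.0], [405.0, 480.0]],
    index=["1961-01", "1961-02"],
    columns=cols,
)

# Keep interval bounds under a stable, string-keyed structure,
# leaving "predictions" free for point forecasts from instance.predict(...)
intervals = {}
for var, cov, bound in pred_intervals.columns:
    level = intervals.setdefault(var, {}).setdefault(str(cov), {})
    level[bound] = pred_intervals[(var, cov, bound)].to_dict()

assert intervals["Airline"]["0.9"]["lower"] == {"1961-01": 410.0, "1961-02": 405.0}
```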
```python
>>> fit_predict_tool("est_abc123", "airline", horizon=12, coverage=0.9)
{
    "success": True,
    "predictions": {...},
    "intervals": {"Airline": {"coverage": 0.9, "lower": {...}, "upper": {...}}},
    "horizon": 12,
    "coverage": 0.9
```
The fit_predict_tool doc example for intervals doesn’t match the structure returned by Executor.predict_interval (which currently keys intervals by something like <var>_<coverage> and returns interval columns under predictions). This mismatch will confuse tool users; update the example/Return contract (or adjust predict_interval output) so docs reflect the actual JSON shape.
```python
        "default": 12,
    },
},
"required": ["estimator_handles"],
```
compare_estimators tool schema doesn’t require either dataset or data_handle, but compare_estimators_tool fails unless one is provided. Tighten the schema (e.g., oneOf requiring exactly one of the two) so clients get validation errors instead of runtime errors.
Suggested change:

```diff
 "required": ["estimator_handles"],
+"oneOf": [
+    {"required": ["dataset"]},
+    {"required": ["data_handle"]},
+],
```
```python
def _extract_metric_value(results: Any, metric_name: str) -> Optional[float]:
    """Extract the metric value from evaluate results.

    The column name format is 'test_<MetricName>' (e.g., 'test_MeanAbsolutePercentageError').

    Args:
        results: DataFrame from sktime evaluate()
        metric_name: Name of the metric to extract

    Returns:
        The metric value as a float, or None if not found
    """
    # Convert column names to find the matching metric
    metric_col = None
    for col in results.columns:
        if col == metric_name or col == f"test_{metric_name}":
            metric_col = col
            break
        # Check case-insensitive match
        col_lower = col.lower()
        metric_lower = metric_name.lower()
        if metric_lower in col_lower or col_lower == f"test_{metric_lower}":
            metric_col = col
            break
```
_extract_metric_value is unlikely to work with the metric abbreviations passed by compare_estimators_tool (e.g., "MAPE"). sktime.evaluate() typically emits columns like test_MeanAbsolutePercentageError (see existing test expectations), so searching for "MAPE" as a substring won't match and score will often be None, making comparisons fail. Consider extracting based on the scoring object’s name/class used in evaluate(...) (and/or building an explicit abbrev→full-column mapping) rather than fuzzy substring matching on the abbreviation.
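One way to avoid the fuzzy matching is an explicit abbreviation-to-class-name map, from which the `test_<ClassName>` column follows directly. A sketch (the set of abbreviations shown is illustrative, not the tool's full list):

```python
# Map tool-facing abbreviations to sktime metric class names;
# evaluate() names its score columns "test_<ClassName>".
METRIC_CLASS_NAMES = {
    "MAE": "MeanAbsoluteError",
    "MAPE": "MeanAbsolutePercentageError",
    "MSE": "MeanSquaredError",
    "MEDAE": "MedianAbsoluteError",
}


def metric_column(abbrev: str) -> str:
    """Return the evaluate() result column name for a metric abbreviation."""
    try:
        return f"test_{METRIC_CLASS_NAMES[abbrev.upper()]}"
    except KeyError:
        raise ValueError(f"Unsupported metric: {abbrev!r}") from None


assert metric_column("MAPE") == "test_MeanAbsolutePercentageError"
```

Metrics such as RMSE and SMAPE map onto parameterized sktime classes (e.g. `MeanSquaredError(square_root=True)`), so deriving the column from the actual scoring object passed to `evaluate(...)` is likely the more robust variant.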
```python
if not dataset and not data_handle:
    return {
        "success": False,
        "error": "Either dataset or data_handle must be provided"
    }

# Load data
if data_handle:
    if data_handle not in executor._data_handles:
        return {
            "success": False,
            "error": f"Unknown data handle: {data_handle}",
            "available_handles": list(executor._data_handles.keys()),
        }
    data_info = executor._data_handles[data_handle]
    y = data_info["y"]
    X = data_info.get("X")
else:
    data_result = executor.load_dataset(dataset)
    if not data_result["success"]:
        return data_result
    y = data_result["data"]
    X = data_result.get("exog")
```
compare_estimators_tool documentation/schema says "Use this or data_handle, not both", but the implementation allows both and silently prioritizes data_handle. This makes the API ambiguous for callers; it should reject requests where both are provided (or clearly document precedence).
```python
        ),
    },
},
"required": ["estimator_handle"],
```
fit_predict tool schema marks only estimator_handle as required, but fit_predict_tool requires dataset unless data_handle is provided. Because call_tool currently defaults missing dataset to an empty string, clients can send an invalid request that will fail at runtime. Consider using a oneOf schema to require exactly one of {dataset, data_handle} (and avoid defaulting dataset to "" in the dispatcher).
Suggested change:

```diff
-"required": ["estimator_handle"],
+"oneOf": [
+    {"required": ["estimator_handle", "dataset"]},
+    {"required": ["estimator_handle", "data_handle"]},
+],
```
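Where a client does not enforce `oneOf`, the dispatcher can still fail fast with a dependency-free check along these lines (a sketch, not the repo's actual dispatcher code):

```python
from typing import Any, Optional


def require_exactly_one(
    dataset: Optional[str], data_handle: Optional[str]
) -> Optional[dict[str, Any]]:
    """Return an error dict unless exactly one data source is given."""
    if bool(dataset) == bool(data_handle):  # both set, or both missing/empty
        return {
            "success": False,
            "error": "Provide exactly one of 'dataset' or 'data_handle'",
        }
    return None


assert require_exactly_one("airline", None) is None
assert require_exactly_one(None, None)["success"] is False
assert require_exactly_one("airline", "dh_1")["success"] is False
```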
```diff
@@ -27,83 +27,90 @@ This MCP is **not** just documentation or static code analysis. It is a **semant
 ## 🛠️ Prerequisites

 - **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)
```
README says Python 3.9 is listed in pyproject.toml, but pyproject.toml currently has requires-python = ">=3.10". This note is now inaccurate; please update/remove it to avoid confusing users about supported Python versions.
Suggested change:

```diff
-- **Python 3.10+** (3.9 is listed in `pyproject.toml` but the `mcp` package requires 3.10+)
+- **Python 3.10+**
```
```python
def compare_estimators_tool(
    estimator_handles: list[str],
    dataset: Optional[str] = None,
    data_handle: Optional[str] = None,
    metric: str = "MAPE",
    cv_folds: int = 3,
    horizon: int = 12,
) -> dict[str, Any]:
```
New compare_estimators_tool behavior isn’t covered by tests (there are tests for evaluate_estimator_tool but none for this new tool). Add tests that cover: ranking order on a known dataset, handling of unknown handles, and metric extraction for the default metric (MAPE) to prevent regressions.
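A regression test for the ranking contract need not touch sktime at all; the ordering can be pinned against a stand-in result list (function name and result shape here are hypothetical):

```python
def rank_results(results: list[dict], lower_is_better: bool = True) -> list[dict]:
    """Order comparison results by score, dropping failed (score=None) entries."""
    scored = [r for r in results if r.get("score") is not None]
    return sorted(scored, key=lambda r: r["score"], reverse=not lower_is_better)


results = [
    {"handle": "est_naive", "score": 0.21},
    {"handle": "est_theta", "score": 0.12},
    {"handle": "est_broken", "score": None},  # e.g., fit failed for this handle
]
ranked = rank_results(results)
assert [r["handle"] for r in ranked] == ["est_theta", "est_naive"]
```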