Commit acd36ba

Fix MSR notebook after upgrade to torch 2.8
1 parent 5e8e723 commit acd36ba

2 files changed: +11 −6 lines changed

notebooks/msr_banzhaf_digits.ipynb

Lines changed: 10 additions & 5 deletions
@@ -12,7 +12,9 @@
 "\n",
 "Additionally, we compare two sampling techniques: the standard permutation-based Monte Carlo sampling, and the so-called MSR (Maximum Sample Reuse) principle.\n",
 "\n",
-"In order to highlight the strengths of Data-Banzhaf, we require a stochastic model. For this reason, we use a CNN to classify handwritten digits from the [scikit-learn toy datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html#optical-recognition-of-handwritten-digits-dataset)."
+"In order to highlight the strengths of Data-Banzhaf, we require a stochastic model. For this reason, we use a CNN to classify handwritten MNIST digits from the [scikit-learn toy datasets](https://scikit-learn.org/stable/datasets/toy_dataset.html#optical-recognition-of-handwritten-digits-dataset).\n",
+"\n",
+"To showcase the use of pytorch with valuation methods, the network is a [skorch](https://github.com/skorch-dev/skorch) model."
 ]
 },
 {
@@ -128,7 +130,9 @@
 "source": [
 "## Creating the utility and computing Banzhaf semi-values\n",
 "\n",
-"Now we can calculate the contribution of each training sample to the model performance. We use a simple CNN written in torch and wrap it into a [skorch.classifier.NeuralNetClassifier][]. Note that any model that implements the protocol [SupervisedModel][pydvl.utils.types.SupervisedModel], which is just the standard sklearn interface of `fit()`, `predict()` and `score()`, can be used to construct the utility, so it is possible to construct your own wrapper. Nevertheless, skorch conveniently implements the full sklearn interface, allowing e.g. the use of torch models in pipelines."
+"Now we can calculate the contribution of each training sample to the model performance. We use a simple CNN written in torch and wrap it into a [skorch.classifier.NeuralNetClassifier][]. Note that any model that implements the protocol [SupervisedModel][pydvl.utils.types.SupervisedModel], which is just the standard sklearn interface of `fit()`, `predict()` and `score()`, can be used to construct the utility, so it is possible to construct your own wrapper. Nevertheless, skorch conveniently implements the full sklearn interface, allowing e.g. the use of torch models in pipelines.\n",
+"\n",
+"It's important to note the use of `torch_load_kwargs={\"weights_only\": False}` in the model definition. This is necessary to ensure that pickling works correctly, which is required for parallel operation. It should also be possible to use `torch.serialization.add_safe_globals()`, following the suggestion in pytorch's documentation."
 ]
 },
 {
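Any object exposing the sklearn-style `fit()`, `predict()` and `score()` methods mentioned in the cell above can play the role of the model. A minimal, purely illustrative sketch of such a wrapper (the class `ConstantClassifier` is made up for this example and is not part of pyDVL or skorch):

```python
from collections import Counter

class ConstantClassifier:
    """Toy model satisfying the sklearn-style interface of
    fit(), predict() and score() that the SupervisedModel
    protocol expects."""

    def fit(self, x, y):
        # "Train" by memorizing the majority class.
        self.majority_ = Counter(y).most_common(1)[0][0]

    def predict(self, x):
        # Predict the memorized class for every input.
        return [self.majority_ for _ in x]

    def score(self, x, y):
        # Fraction of correctly classified points (accuracy).
        preds = self.predict(x)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

model = ConstantClassifier()
model.fit([[0], [1], [2], [3]], [0, 1, 1, 1])
print(model.score([[0], [1], [2], [3]], [0, 1, 1, 1]))  # 0.75
```

Anything with this shape, from a plain sklearn estimator to a skorch-wrapped torch network, can be plugged into the utility.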
@@ -166,6 +170,7 @@
 "    optimizer=torch.optim.Adam,\n",
 "    device=device,\n",
 "    verbose=False,\n",
+"    torch_load_kwargs={\"weights_only\": False},\n",
 ")\n",
 "model.fit(*train.data());"
 ]
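The reason picklability matters is that parallel evaluation ships the utility, model included, to worker processes via serialization. A pure-Python sketch of that round trip (the `WrappedModel` class is a hypothetical stand-in, not a real skorch or pyDVL class):

```python
import pickle

class WrappedModel:
    """Stand-in for a wrapped model that must survive a pickle
    round trip before it can be sent to a parallel worker."""

    def __init__(self, lr: float):
        self.lr = lr
        self.weights = [0.0, 0.0]

    def fit(self, xs, ys):
        # Dummy "training": store the mean target for each feature.
        self.weights = [sum(ys) / len(ys)] * len(xs[0])

model = WrappedModel(lr=0.01)
model.fit([[1.0, 2.0]], [3.0])

# This is essentially what parallel backends do under the hood:
clone = pickle.loads(pickle.dumps(model))
print(clone.weights)  # [3.0, 3.0]
```

If any attribute of the model fails to pickle, parallel computation of the values fails, which is what the `torch_load_kwargs` setting above works around.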
@@ -532,7 +537,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Maximum Sample Reuse Banzhaf\n",
+"## Maximum Sample Reuse Banzhaf\n",
 "\n",
 "Despite the previous results already being useful, we had to retrain the model a number of times and yet the variance of the value estimates was high. This has consequences for the stability of the top-k ranking of points, which decreases the applicability of the method. We will now use a different sampling method called Maximum Sample Reuse (MSR) which reuses every sample for updating the Banzhaf values. The method was introduced by the authors of Data-Banzhaf and is much more sample-efficient, as we will show."
 ]
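The MSR idea can be sketched in a few lines of plain Python: every sampled subset updates running "in" and "out" averages for *all* indices at once, and the Banzhaf estimate for an index is the difference of its two averages. This is an illustrative toy implementation under our own naming, not pyDVL's; the additive utility is chosen so that the exact Banzhaf values are known (they equal the weights):

```python
import random

def msr_banzhaf(utility, n, n_samples, seed=0):
    """Maximum Sample Reuse estimator of Banzhaf values.

    Each sampled subset S updates all n indices: u(S) contributes
    to the 'in' average of every i in S and to the 'out' average
    of every i not in S. The value of i is mean_in - mean_out.
    """
    rng = random.Random(seed)
    in_sum, in_cnt = [0.0] * n, [0] * n
    out_sum, out_cnt = [0.0] * n, [0] * n
    for _ in range(n_samples):
        # Each index is in the subset independently with prob. 1/2.
        s = {i for i in range(n) if rng.random() < 0.5}
        u = utility(s)
        for i in range(n):
            if i in s:
                in_sum[i] += u; in_cnt[i] += 1
            else:
                out_sum[i] += u; out_cnt[i] += 1
    return [
        in_sum[i] / max(in_cnt[i], 1) - out_sum[i] / max(out_cnt[i], 1)
        for i in range(n)
    ]

# Additive toy utility: the exact Banzhaf value of i is weights[i].
weights = [1.0, 2.0, 3.0]
values = msr_banzhaf(lambda s: sum(weights[i] for i in s), 3, 20_000)
print([round(v) for v in values])
```

Note that a single utility evaluation updates all `n` running averages, which is the source of MSR's sample efficiency.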
@@ -678,7 +683,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Compare convergence speed of Banzhaf and MSR Banzhaf Values\n",
+"### Convergence speed of Banzhaf and MSR Banzhaf Values\n",
 "\n",
 "Conventional margin-based samplers require evaluating the utility twice to perform one update of the value, and permutation samplers instead do $n+1$ evaluations for $n$ updates. Maximum Sample Reuse (MSR) instead updates all indices in every sample that the utility evaluates. We compare the convergence rates of these methods.\n",
 "\n",
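The evaluation counts above translate directly into value updates per utility call. A small sketch of the arithmetic (the function name is ours, for illustration):

```python
def updates_per_evaluation(n: int) -> dict:
    """Value updates obtained per utility evaluation for each
    sampler, on a dataset of n points, using the figures given
    in the text above."""
    return {
        "margin-based": 1 / 2,       # 2 evaluations -> 1 update
        "permutation": n / (n + 1),  # n+1 evaluations -> n updates
        "msr": n,                    # 1 evaluation -> n updates
    }

print(updates_per_evaluation(100))
```

For 100 training points, MSR yields roughly 100 updates per evaluation versus about one for the other samplers, which is why its value estimates converge with far fewer utility calls.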
@@ -857,7 +862,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Similarity of the semi-values computed using different samplers"
+"## Similarity of the semi-values computed using different samplers"
 ]
 },
 {

requirements-notebooks.txt

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ dask==2024.8.0
 distributed==2024.8.0
 imblearn
 pillow==10.4.0
-skorch>=1.1.0
+skorch>=1.2.0
 torch>=2.8.0
 torchvision>=0.23.0
 transformers==4.44.2
