Materials for a 4-hour hands-on workshop on causal inference with geospatial data. It is aimed at social scientists who are comfortable with regression but want a clearer way to think about causal claims, the assumptions behind them, and why spatial data make those assumptions harder to satisfy. It focuses on breadth and intuition.
The workshop is two short lectures plus a worked Python practical on a real case study: does livestock density raise ammonia (NH₃) concentrations?
The website for the workshop is here. The GitHub repository is at https://github.com/sodascience/workshop_geocausal.
Throughout the lectures and the practical we keep returning to the same five questions. This is the whole workshop in one checklist:
- What is the treatment?
- What is the estimand?
- What is the comparison?
- What assumption makes that comparison credible?
- Why might that assumption fail?
| Duration | Activity | Content | Link |
|---|---|---|---|
| 45 min | Lecture 1 | Counterfactuals, estimands, exogenous variation, causal designs | lecture 1 |
| 15 min | Break | | |
| 40 min | Lecture 2 | Spatial confounding, spillovers, scale, why spatial models are not causal designs | lecture 2 |
| 60 min | Practical | Maps and association → confounders and Moran's I → spatial models → DiD with farm gains → spillover-aware interpretation | practical notebook |
The lectures are reveal.js slides — open the .html files directly in a
browser, no setup required. Their source is in the matching .qmd files.
The practical is a worked example, not a hidden causal proof. Working from a single grid dataset, participants move through:
- maps and descriptive association
- controls and residual spatial clustering (Moran's I)
- spatial lag / error / Durbin models
- a difference-in-differences with farm-gain vs no-change cells (2020 → 2024)
- why spillovers make that DiD fragile, and how to read a Spatial Durbin model
The takeaway: a map shows where, a regression shows what correlates, and a design tells you what would need to be true for a causal claim.
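To make the Moran's I step concrete, here is a minimal sketch of the global statistic on a toy grid. It is not the notebook's code: the function names, the binary rook-contiguity weights, and the two 4 × 4 example grids are assumptions chosen for illustration, and the practical may well rely on a spatial library instead. In the practical the statistic is applied to regression residuals; here it is applied to raw values to keep the example short.

```python
from itertools import product


def rook_neighbours(n_rows, n_cols):
    """Map each cell (row, col) to the cells that share an edge with it."""
    neighbours = {}
    for r, c in product(range(n_rows), range(n_cols)):
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        neighbours[(r, c)] = [
            (i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols
        ]
    return neighbours


def morans_i(grid):
    """Global Moran's I for a rectangular grid with binary rook weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols
    mean = sum(sum(row) for row in grid) / n
    # Deviations from the overall mean, keyed by cell.
    z = {(r, c): grid[r][c] - mean for r in range(n_rows) for c in range(n_cols)}
    neighbours = rook_neighbours(n_rows, n_cols)
    # S0 is the sum of all weights (each neighbour link counted in both directions).
    s0 = sum(len(nbs) for nbs in neighbours.values())
    cross = sum(z[cell] * z[nb] for cell, nbs in neighbours.items() for nb in nbs)
    sum_sq = sum(v * v for v in z.values())
    return (n / s0) * cross / sum_sq


# Hypothetical toy grids, not workshop data.
split = [[0, 0, 1, 1] for _ in range(4)]  # similar values sit together
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]  # neighbours always differ

print(round(morans_i(split), 3))         # 0.667
print(round(morans_i(checkerboard), 3))  # -1.0
```

Values near +1 indicate that similar values cluster in space, values near -1 indicate that neighbours tend to differ, and values near 0 indicate no clear spatial pattern. Clustered residuals are a warning that something spatially structured is missing from the model, not evidence about causality on their own.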
- practical/practical_grid_nh3.ipynb is the main notebook used in the practical session; read it rendered online.
- practical/practical_grid_nh3_butts.ipynb is an optional, more advanced notebook on design-based spillover DiD (far controls and distance rings); read it rendered online.
Both practicals are also available as reactive marimo notebooks
(practical/*.py) — see Quick start.
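The difference-in-differences comparison used in the practical (farm-gain vs no-change cells, 2020 to 2024) can be sketched as a simple 2 × 2 comparison of means. This is an illustrative sketch, not the notebooks' implementation: the records, group labels, and function names below are hypothetical, and the numbers are made up.

```python
from statistics import mean

# Hypothetical toy records: (group, year, NH3 concentration). Not the workshop data.
records = [
    ("farm_gain", 2020, 10.0), ("farm_gain", 2020, 12.0),
    ("farm_gain", 2024, 13.0), ("farm_gain", 2024, 15.0),
    ("no_change", 2020, 8.0), ("no_change", 2020, 10.0),
    ("no_change", 2024, 9.0), ("no_change", 2024, 11.0),
]


def group_mean(rows, group, year):
    """Mean outcome for one group in one year."""
    return mean(value for g, y, value in rows if g == group and y == year)


def did_estimate(rows, treated="farm_gain", control="no_change", pre=2020, post=2024):
    """Change in the treated group minus change in the control group."""
    treated_change = group_mean(rows, treated, post) - group_mean(rows, treated, pre)
    control_change = group_mean(rows, control, post) - group_mean(rows, control, pre)
    return treated_change - control_change


print(did_estimate(records))  # 2.0
```

The estimate reads as a causal effect only if the no-change cells show what would have happened to the farm-gain cells without the gain (parallel trends). If ammonia drifts from farm-gain cells into nearby no-change cells, the control change is contaminated, which is why the optional notebook looks at far controls and distance rings.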
You need uv, a fast Python package and
environment manager. uv reads pyproject.toml and uv.lock and builds the
exact environment automatically the first time you run something — there is no
separate "create a venv" step.
macOS:
curl -LsSf https://astral.sh/uv/install.sh | sh
# or, with Homebrew: brew install uv
Linux (Ubuntu / Debian):
sudo apt update && sudo apt install -y curl git # only if they are missing
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Alternatively, on any platform: pip install uv (or pipx install uv). See the uv install docs for details. Restart your terminal afterwards, then check it works:
uv --version
Clone the repository (or download it as a ZIP and unzip it):
git clone https://github.com/sodascience/workshop_geocausal.git
cd workshop_geocausal
From the project root, JupyterLab is the recommended option if you have no experience with marimo:
uv run jupyter lab practical/practical_grid_nh3.ipynb
The first run downloads and installs all dependencies (this can take a few minutes); later runs reuse the environment and start quickly. JupyterLab opens in your browser; run the cells top to bottom.
Prefer the classic interface? Use
uv run jupyter notebook
instead of jupyter lab.
Prefer marimo?
uv run marimo edit practical/practical_grid_nh3.py
(marimo edit lets you run and change cells; uv run marimo run practical/practical_grid_nh3.py opens it read-only as an app.)
The practical uses a single, ready-to-use file,
data/final/workshop_grid_1km.csv:
- workshop_grid_1km.csv: the Netherlands on a 1 × 1 km grid, with NH₃ concentrations (2018–2024), livestock and agricultural firm counts per cell, and neighbourhood covariates (population density, urbanity).
It was built from several sources:
- RIVM NH₃ concentration maps.
- The CBS Wijk- en Buurtkaart and Kerncijfers.
- Bureau van Dijk Orbis firm data. Orbis is proprietary, so only the aggregated per-cell counts appear in the shared file; the raw inputs and the data-building scripts are not redistributed.
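Before opening the notebook, it can help to check which columns the shared file actually contains. The sketch below uses only the Python standard library and makes no assumption about column names; the helper name describe_csv is hypothetical, and the notebooks may load the data differently (for example with pandas).

```python
import csv
from itertools import islice
from pathlib import Path


def describe_csv(path, n_preview=3):
    """Return the column names and the first few rows of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        preview = list(islice(reader, n_preview))
        return reader.fieldnames, preview


# Path taken from the text above; run this from the project root.
data_path = Path("data/final/workshop_grid_1km.csv")
if data_path.exists():
    columns, rows = describe_csv(data_path)
    print(columns)
    for row in rows:
        print(row)
else:
    print(f"{data_path} not found; run this from the project root.")
```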
A short, opinionated list. Start with the background texts for the ideas; the papers below are the concrete examples used in the lectures.
Causal inference — background
- Angrist & Pischke, Mostly Harmless Econometrics — the applied-econometrics classic on natural experiments, DiD, IV and matching.
- Pearl, Glymour & Jewell, Causal Inference in Statistics: A Primer — the graphical (DAG) view of confounding and identification
- Hernán & Robins, Causal Inference: What If
- Facure, Causal Inference for the Brave and True
Geospatial statistics & spatial causal inference — background
- Rey, Arribas-Bel & Wolf, Geographic Data Science with Python — free online; spatial weights, Moran's I, and spatial regression in Python.
Papers referenced in the materials (clear geographic causal designs, by identification strategy)
- Before/after shocks (DiD): Currie & Walker (2011), Traffic Congestion and Infant Health: Evidence from E-ZPass, AEJ: Applied Economics; Schlenker & Walker (2016), Airports, Air Pollution, and Contemporaneous Health, REStud.
- Boundaries: Black (1999), Do Better Schools Matter?, QJE; Dubé, Lester & Reich (2010), Minimum Wage Effects Across State Borders, REStat.
- Exposure / movers: Chetty, Hendren & Katz (2016), The Effects of Exposure to Better Neighborhoods on Children, AER; Chetty & Hendren (2018), The Impacts of Neighborhoods on Intergenerational Mobility I, QJE.
- Instruments: Chay & Greenstone (2003), Air Quality, Infant Mortality, and the Clean Air Act of 1970; Deryugina et al. (2019), The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction, AER.
- Scale / interference: Dinesen & Sønderskov (2015), Ethnic Diversity and Social Trust, ASR; Zabrocki, Alari & Benmarhnia (2022), Improving the design stage of air pollution studies based on wind patterns, Scientific Reports.
The rendered lecture HTML is already included. To rebuild from source you need
Quarto and, for the DAG figures, the Python graphviz
package plus the Graphviz dot system binary:
quarto render lectures/1_intro_causality/1_intro_causality.qmd --to revealjs
quarto render lectures/2_geocausality/2_geocausality.qmd --to revealjs
Developed and maintained by the ODISSEI Social Data Science (SoDa) team.
Questions? Email soda@odissei-data.nl, or contact the instructor Javier Garcia-Bernardo (j.garciabernardo@uu.nl) directly.
