Causal inference with geospatial data

Materials for a 4-hour hands-on workshop on causal inference with geospatial data. It is aimed at social scientists who are comfortable with regression but want a clearer way to think about causal claims, the assumptions behind them, and why spatial data make those assumptions harder to satisfy. It focuses on breadth and intuition.

The workshop is two short lectures plus a worked Python practical on a real case study: does livestock density raise ammonia (NH₃) concentrations?

The workshop has a companion website, and the GitHub repository is at https://github.com/sodascience/workshop_geocausal.

One workshop backbone

Throughout the lectures and the practical we keep returning to the same five questions. This is the whole workshop in one checklist:

  1. What is the treatment?
  2. What is the estimand?
  3. What is the comparison?
  4. What assumption makes that comparison credible?
  5. Why might that assumption fail?
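To make the checklist concrete, here is a toy sketch with made-up numbers (not workshop data). It assumes we can see both potential outcomes for every grid cell, which is never possible in practice, and shows how far the naive treated-versus-untreated comparison can sit from the estimand when treated cells already had higher baseline outcomes. All names and values are hypothetical.

```python
from statistics import mean

# Hypothetical grid cells: y0 = outcome without treatment, y1 = outcome with
# treatment, d = 1 if the cell was actually treated. Real data never show both
# y0 and y1 for the same unit; they are listed here only to expose the logic.
cells = [
    {"y0": 10.0, "y1": 12.0, "d": 1},
    {"y0": 9.0, "y1": 11.0, "d": 1},
    {"y0": 5.0, "y1": 7.0, "d": 0},
    {"y0": 4.0, "y1": 6.0, "d": 0},
]


def observed(cell):
    """Return the outcome we would actually observe for this cell."""
    return cell["y1"] if cell["d"] == 1 else cell["y0"]


treated = [c for c in cells if c["d"] == 1]
control = [c for c in cells if c["d"] == 0]

# Estimand: the average treatment effect on the treated (ATT).
att = mean(c["y1"] - c["y0"] for c in treated)

# Comparison: observed treated mean minus observed control mean.
naive = mean(observed(c) for c in treated) - mean(observed(c) for c in control)

# Assumption under test: controls stand in for the treated cells' untreated
# outcome. The gap in baseline y0 measures how badly that fails here.
selection_bias = mean(c["y0"] for c in treated) - mean(c["y0"] for c in control)

print(f"ATT={att}, naive={naive}, selection bias={selection_bias}")
```

In this toy case the naive comparison equals the ATT plus the selection bias, so the comparison is only credible when the baseline gap is zero. That is the assumption questions 4 and 5 ask you to state and then attack.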

Schedule and materials

| Duration | Activity | Content | Link |
|---|---|---|---|
| 45 min | Lecture 1 | Counterfactuals, estimands, exogenous variation, causal designs | lecture 1 |
| 15 min | Break | | |
| 40 min | Lecture 2 | Spatial confounding, spillovers, scale, why spatial models are not causal designs | lecture 2 |
| 60 min | Practical | Maps and association → confounders and Moran's I → spatial models → DiD with farm gains → spillover-aware interpretation | practical notebook |

The lectures are reveal.js slides — open the .html files directly in a browser, no setup required. Their source is in the matching .qmd files.

The practical

The practical is a worked example, not a hidden causal proof. Working from the single grid dataset, participants move through:

  1. maps and descriptive association
  2. controls and residual spatial clustering (Moran's I)
  3. spatial lag / error / Durbin models
  4. a difference-in-differences with farm-gain vs no-change cells (2020 → 2024)
  5. why spillovers make that DiD fragile, and how to read a Spatial Durbin model

The takeaway: a map shows where, a regression shows what correlates, and a design tells you what would need to be true for a causal claim.
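As a rough illustration of step 2, the sketch below computes global Moran's I on a toy 4 × 4 grid in plain Python. It is not the notebook's code: the binary rook-contiguity weights, the function names and the toy values are all assumptions made for this example. Library implementations often row-standardise the weights, so their numbers can differ.

```python
def rook_neighbours(n_rows, n_cols):
    """Map each cell index to the cells sharing an edge with it (rook contiguity)."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbours[r * n_cols + c] = adjacent
    return neighbours


def morans_i(values, neighbours):
    """Global Moran's I with binary (unstandardised) weights."""
    n = len(values)
    centre = sum(values) / n
    dev = [v - centre for v in values]
    # Sum of cross-products of deviations over every neighbouring pair.
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    # S0 is the sum of all weights, i.e. the number of directed neighbour links.
    s0 = sum(len(js) for js in neighbours.values())
    return (n / s0) * cross / sum(d * d for d in dev)


n_rows, n_cols = 4, 4
w = rook_neighbours(n_rows, n_cols)

# Left half high, right half low: similar values sit next to each other.
clustered = [1.0 if c < 2 else 0.0 for r in range(n_rows) for c in range(n_cols)]
# Checkerboard: every neighbour has the opposite value.
checkerboard = [float((r + c) % 2) for r in range(n_rows) for c in range(n_cols)]

clustered_i = morans_i(clustered, w)        # positive, about 0.67
checkerboard_i = morans_i(checkerboard, w)  # negative, about -1.0
print(clustered_i, checkerboard_i)
```

Applied to regression residuals rather than raw values, a clearly positive statistic suggests the controls have not absorbed the spatial structure, which is what motivates the spatial models in step 3.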

Both practicals are also available as reactive marimo notebooks (practical/*.py) — see Quick start.

Quick start

You need uv, a fast Python package and environment manager. uv reads pyproject.toml and uv.lock and builds the exact environment automatically the first time you run something — there is no separate "create a venv" step.

1. Install uv

macOS:

curl -LsSf https://astral.sh/uv/install.sh | sh
# or, with Homebrew:  brew install uv

Linux (Ubuntu / Debian):

sudo apt update && sudo apt install -y curl git   # only if they are missing
curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Alternatively, on any platform: pip install uv (or pipx install uv). See the uv install docs for details. Restart your terminal afterwards, then check it works:

uv --version

2. Get the materials

Clone the repository (or download it as a ZIP and unzip it):

git clone https://github.com/sodascience/workshop_geocausal.git
cd workshop_geocausal

3. Open the practical

From the project root, open the practical in JupyterLab. This is the recommended route if you have no experience with marimo:

uv run jupyter lab practical/practical_grid_nh3.ipynb

The first run downloads and installs all dependencies (this can take a few minutes); later runs start instantly. JupyterLab opens in your browser — run the cells top to bottom.

Prefer the classic interface? Use uv run jupyter notebook instead of jupyter lab.

Would you rather use marimo?

uv run marimo edit practical/practical_grid_nh3.py

(marimo edit lets you run and change cells; uv run marimo run practical/practical_grid_nh3.py opens it read-only as an app.)

Data

The practical uses a single, ready-to-use file, data/final/workshop_grid_1km.csv:

  • workshop_grid_1km.csv — the Netherlands on a 1 × 1 km grid, with NH₃ concentrations (2018–2024), livestock and agricultural firm counts per cell, and neighbourhood covariates (population density, urbanity).

It was built from several sources:

  • RIVM NH₃ concentration maps;
  • the CBS Wijk- en Buurtkaart and Kerncijfers;
  • Bureau van Dijk Orbis firm data. Orbis is proprietary, so only the aggregated per-cell counts appear in the shared file; the raw inputs and the data-building scripts are not redistributed.
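With a cell-by-year file like this, the practical's difference-in-differences step reduces to comparing average NH₃ changes between two groups of cells. The sketch below shows that arithmetic on invented rows. The column names (farm_gain, nh3_2020, nh3_2024) and all values are hypothetical and are not taken from workshop_grid_1km.csv.

```python
from statistics import mean

# Hypothetical wide-format rows, one per grid cell. Names and numbers are
# illustrative only and do not come from the workshop file.
rows = [
    {"cell": "A", "farm_gain": True, "nh3_2020": 8.0, "nh3_2024": 9.5},
    {"cell": "B", "farm_gain": True, "nh3_2020": 7.0, "nh3_2024": 8.5},
    {"cell": "C", "farm_gain": False, "nh3_2020": 5.0, "nh3_2024": 5.5},
    {"cell": "D", "farm_gain": False, "nh3_2020": 4.0, "nh3_2024": 4.5},
]


def mean_change(rows, farm_gain):
    """Average 2020-to-2024 change in NH3 for one group of cells."""
    return mean(
        r["nh3_2024"] - r["nh3_2020"] for r in rows if r["farm_gain"] == farm_gain
    )


change_gain = mean_change(rows, True)      # change in farm-gain cells
change_control = mean_change(rows, False)  # change in no-change cells

# Difference-in-differences: the extra change in farm-gain cells.
did = change_gain - change_control
print(f"gain: {change_gain}, control: {change_control}, DiD: {did}")
```

The number only reads as a causal effect under parallel trends, meaning farm-gain cells would have changed like no-change cells had they gained no farms. If NH₃ drifts from farm-gain cells into neighbouring no-change cells, the control change is itself contaminated, which is the fragility the practical discusses.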

Further reading

A short, opinionated list. Start with the background texts for the ideas; the papers below are the concrete examples used in the lectures.

Causal inference — background

  • Angrist & Pischke, Mostly Harmless Econometrics — the applied-econometrics classic on natural experiments, DiD, IV and matching.
  • Pearl, Glymour & Jewell, Causal Inference in Statistics: A Primer — the graphical (DAG) view of confounding and identification
  • Hernán & Robins, Causal Inference: What If
  • Facure, Causal Inference for the Brave and True

Geospatial statistics & spatial causal inference — background

Papers referenced in the materials (clear geographic causal designs, by identification strategy)

  • Before/after shocks (DiD): Currie & Walker (2011), Traffic Congestion and Infant Health: Evidence from E-ZPass, AEJ: Applied Economics; Schlenker & Walker (2016), Airports, Air Pollution, and Contemporaneous Health, REStud.
  • Boundaries: Black (1999), Do Better Schools Matter?, QJE; Dubé, Lester & Reich (2010), Minimum Wage Effects Across State Borders, REStat.
  • Exposure / movers: Chetty, Hendren & Katz (2016), The Effects of Exposure to Better Neighborhoods on Children, AER; Chetty & Hendren (2018), The Impacts of Neighborhoods on Intergenerational Mobility I, QJE.
  • Instruments: Chay & Greenstone (2003), Air Quality, Infant Mortality, and the Clean Air Act of 1970; Deryugina et al. (2019), The Mortality and Medical Costs of Air Pollution: Evidence from Changes in Wind Direction, AER.
  • Scale / interference: Dinesen & Sønderskov (2015), Ethnic Diversity and Social Trust, ASR; Zabrocki, Alari & Benmarhnia (2022), Improving the design stage of air pollution studies based on wind patterns, Scientific Reports.

Rebuilding the lectures (optional)

The rendered lecture HTML is already included. To rebuild from source you need Quarto and, for the DAG figures, the Python graphviz package plus the Graphviz dot system binary:

quarto render lectures/1_intro_causality/1_intro_causality.qmd --to revealjs
quarto render lectures/2_geocausality/2_geocausality.qmd --to revealjs
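Before rendering, it can help to confirm that the prerequisites are visible. The following is a small optional check, not part of the repository, that uses only the Python standard library. It assumes the executables are named quarto and dot and that the Python package is importable as graphviz.

```python
import importlib.util
import shutil


def check_build_tools():
    """Report which lecture-build prerequisites are visible from this environment."""
    return {
        # Executables are looked up on PATH, as the shell would do.
        "quarto": shutil.which("quarto") is not None,
        "dot": shutil.which("dot") is not None,
        # The Python package is looked up without importing it.
        "graphviz (Python)": importlib.util.find_spec("graphviz") is not None,
    }


status = check_build_tools()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Run it with the same Python environment you will render from, so the package check reflects what Quarto will actually see.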

Contact

Developed and maintained by the ODISSEI Social Data Science (SoDa) team.

Questions? Email soda@odissei-data.nl, or contact the instructor Javier Garcia-Bernardo (j.garciabernardo@uu.nl) directly.

About

Workshop prepared for SICSS on causality and geospatial causality
