This repository contains the code and slides for our workshop on data collection/annotation and inference with Large Language Models. The materials on this page are CC-BY-4.0 licensed.
More information can be found on the website here.
- No previous experience with LLMs is required.
- R or Python programming knowledge is desired but not required.
- In Python we will use `langchain`, in R we will use `ellmer` to interact with LLMs.
You will need an API key from the provider you plan to use.
- Hugging Face Inference API:
  - Create an account at https://huggingface.co/.
  - Go to https://huggingface.co/settings/tokens and create a new access token.
- OpenAI:
  - Create an account at https://platform.openai.com/.
  - Create an API key at https://platform.openai.com/api-keys.
- Groq:
  - Create an account at https://console.groq.com/.
  - Create an API key in the Groq console.
- SURF AI Hub:
  - The service is in a pilot phase and requires an application and approval process to get access.
  - Once you have access, you can create an API key at https://willma.surf.nl/.
Save your API keys in a safe place. The notebooks will prompt you to enter the keys at runtime.
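As a rough illustration of such a runtime prompt, the sketch below reads a key with Python's `getpass` (so it is not echoed to the screen) and caches it in an environment variable so later cells can reuse it. The helper `ensure_api_key` is hypothetical and not taken from the workshop notebooks. The variable names (`OPENAI_API_KEY`, `GROQ_API_KEY`, `HF_TOKEN`) are the ones commonly read by the respective client libraries, but treat them as an assumption and check the notebook you are running for the names it actually expects. SURF AI Hub is left out because its variable name is not stated here.

```python
import os
from getpass import getpass

# Assumed environment-variable names per provider; verify against the notebooks.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "huggingface": "HF_TOKEN",
}


def ensure_api_key(provider: str, prompt=getpass) -> str:
    """Return the API key for `provider`, prompting only if it is not set yet."""
    var = ENV_VARS[provider]
    key = os.environ.get(var)
    if not key:
        # getpass hides the typed key, so it does not appear in notebook output
        key = prompt(f"Enter your {provider} API key: ")
        os.environ[var] = key
    return key
```

Keeping keys in environment variables rather than pasting them into a cell also means they are not saved into the `.ipynb` file if you share it.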
| Time | Title | Resource |
|---|---|---|
| 09:30 | LLM fundamentals for Social Sciences | |
| 11:00 | Coffee break | Coffee is provided! |
| 11:20 | Data collection/annotation with LLMs | python, R |
| 12:30 | Break | Lunch is provided! |
| 13:15 | Inference with LLM annotations | python, R |
| 14:30 | Conclusion & Q&A | |
Methods and software for inference with measurement error correction: sodascience/social_science_inferences_with_llms.
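To give a flavour of what measurement error correction can look like, the sketch below implements one widely used approach, prediction-powered inference for a population mean. It is not necessarily the method implemented in the linked repository. It assumes LLM annotations for a large set of documents plus human codes (and LLM annotations) for a separate, small random subset; the subset is used to estimate the LLM's average error, which is then added back to the naive mean of the LLM labels. The function `ppi_mean` is hypothetical.

```python
import math
from statistics import mean, variance


def ppi_mean(y_labeled, yhat_labeled, yhat_unlabeled, z=1.96):
    """Prediction-powered estimate of a population mean.

    y_labeled:      human codes for a small random validation subset
    yhat_labeled:   LLM annotations for that same subset
    yhat_unlabeled: LLM annotations for the remaining documents
    Returns (estimate, (ci_low, ci_high)) using a normal approximation.
    """
    n, N = len(y_labeled), len(yhat_unlabeled)
    # Rectifier: how far the LLM labels are from the human codes, on average
    residuals = [y - f for y, f in zip(y_labeled, yhat_labeled)]
    estimate = mean(yhat_unlabeled) + mean(residuals)
    # The two sample means are independent, so their variances add
    se = math.sqrt(variance(yhat_unlabeled) / N + variance(residuals) / n)
    return estimate, (estimate - z * se, estimate + z * se)
```

The interval widens when the LLM often disagrees with the human codes. A naive mean of the LLM labels alone would hide that uncertainty.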
Read and cite our tutorial paper (preprint):
- Fang, Q., Bernardo, J. G., & van Kesteren, E. J. (2026). A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R. arXiv preprint arXiv:2604.09638.
Download from arXiv
If you plan to run the Python notebooks locally, we recommend using uv to set up a clean Python environment. You can also use uv to launch Jupyter Lab or Notebook.
- Clone the repository:
  ```bash
  git clone https://github.com/sodascience/workshop_llm_data_collection.git
  cd workshop_llm_data_collection
  ```
- Create and sync the environment:
  ```bash
  uv venv
  uv sync
  ```
- Start Jupyter:
  ```bash
  uv run jupyter lab
  ```
  (or `uv run jupyter notebook`)
If you use a different environment manager, make sure the dependencies in pyproject.toml are installed before running the notebooks.
This project is developed and maintained by the ODISSEI Social Data Science (SoDa) team.
Do you have questions, suggestions, or remarks? File an issue or feel free to contact Qixiang Fang.

