Workshop Data Collection/Annotation & Inferences with LLMs in Social Sciences

This repository contains the code and slides for our workshop on data collection/annotation and inference with Large Language Models. The materials on this page are CC-BY-4.0 licensed.

More information can be found on the website here.

Technical details

No previous experience with LLMs is required.
R or Python programming knowledge is helpful but not required.
In Python we will use langchain; in R we will use ellmer to interact with LLMs.

Preparation (API keys)

You will need an API key for each provider you plan to use.

Hugging Face Inference API:
1. Create an account at https://huggingface.co/.
2. Go to https://huggingface.co/settings/tokens and create a new access token.
OpenAI:
1. Create an account at https://platform.openai.com/.
2. Create an API key at https://platform.openai.com/api-keys.
Groq:
1. Create an account at https://console.groq.com/.
2. Create an API key in the Groq console.
SURF AI Hub:
1. SURF AI Hub is in a pilot phase: you need to apply and be approved before you get access.
2. Once you have access, you can create an API key at https://willma.surf.nl/.

Save your API keys in a safe place. The notebooks will prompt you to enter the keys at runtime.
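
If you want to reproduce the runtime prompt in your own scripts, the following is a minimal sketch and not the notebooks' actual code. The helper name `get_api_key` and the environment variable name in the usage comment are assumptions made for illustration. The sketch reads the key from the environment if it is already set, and otherwise asks for it without echoing the input.

```python
import getpass
import os


def get_api_key(env_var, prompt_fn=getpass.getpass):
    """Return the key stored in env_var, prompting for it if it is not set.

    Hypothetical helper for illustration; the workshop notebooks may
    collect keys differently.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed characters, so the key does not end up
        # in the notebook output.
        key = prompt_fn(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"No value provided for {env_var}")
        # Kept in this process's environment only; nothing is written to disk.
        os.environ[env_var] = key
    return key


# Example usage (the variable name is an assumption):
# openai_key = get_api_key("OPENAI_API_KEY")
```

Prompting this way keeps keys out of the notebook source, which matters if you later share or commit the notebook.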

Slides

Full workshop slides (v2026.01.23): Download
ODISSEI 2025 workshop slides: Download

Full Workshop Schedule

Time	Title	Resource
09:30	LLM fundamentals for Social Sciences
11:00	Coffee break	Coffee is provided!
11:20	Data collection/annotation with LLMs	`python`, `R`
12:30	Break	Lunch is provided!
13:15	Inference with LLM annotations	`python`, `R`
14:30	Conclusion & Q&A

Methods and software for inference with measurement error correction: sodascience/social_science_inferences_with_llms.

Additional Resources

Tutorial Paper

Read and cite our tutorial paper (preprint):

Fang, Q., Bernardo, J. G., & van Kesteren, E. J. (2026). A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R. arXiv preprint arXiv:2604.09638.
Download from arXiv

Guide to LLM Computing Infrastructure in the Netherlands

Link

[Optional] Run Locally with uv and Python

If you plan to run the Python notebooks locally, we recommend using uv to set up a clean Python environment. You can also use uv to launch JupyterLab or Jupyter Notebook.

Clone the repository:
- git clone https://github.com/sodascience/workshop_llm_data_collection.git
- cd workshop_llm_data_collection
Create and sync the environment:
- uv venv
- uv sync
Start Jupyter:
- uv run jupyter lab (or uv run jupyter notebook)

If you use a different environment manager, make sure the dependencies in pyproject.toml are installed before running the notebooks.
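
One way to check this from inside your environment is sketched below. It assumes the dependencies are listed under `[project].dependencies` in pyproject.toml, which this README does not confirm, and the function names `read_dependencies` and `missing_packages` are hypothetical. It only checks that each package is installed, not that the installed version satisfies the requirement.

```python
import re
from importlib import metadata


def read_dependencies(pyproject_path="pyproject.toml"):
    """Read [project].dependencies from a pyproject.toml file (assumed layout)."""
    import tomllib  # standard library from Python 3.11 onward

    with open(pyproject_path, "rb") as handle:
        config = tomllib.load(handle)
    return config.get("project", {}).get("dependencies", [])


def missing_packages(requirements):
    """Return the requirement strings whose package is not installed."""
    missing = []
    for requirement in requirements:
        # Keep only the distribution name, dropping extras, version
        # specifiers, and environment markers.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
        if match is None:
            continue
        try:
            metadata.version(match.group(1))
        except metadata.PackageNotFoundError:
            missing.append(requirement)
    return missing


# Example usage, run from the repository root:
# print(missing_packages(read_dependencies()))
```

An empty list means every listed package was found in the active environment.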

Contact

This project is developed and maintained by the ODISSEI Social Data Science (SoDa) team.

Do you have questions, suggestions, or remarks? File an issue or feel free to contact Qixiang Fang.
