EDUCATIONAL USE ONLY: This tool is designed for educational purposes, security research, and legitimate OSINT investigations. Users must comply with all applicable laws and ethical guidelines. Misuse for stalking, harassment, or illegal activities is strictly prohibited.
A multi-agent OSINT tool built with LangGraph that searches across Google, social media platforms, and enrichment APIs in parallel, then chains the results through an analysis and report-generation stage powered by Claude (Anthropic).
- Parallel LangGraph workflow: Google search and social media search run simultaneously; results feed a single analysis → report pipeline
- Platform coverage: Google (Tavily → SerpAPI → free fallback), GitHub, Reddit, Twitter/X (with dork fallback), YouTube, LinkedIn, Instagram (dork fallback), Facebook (dork fallback), SoundCloud (dork fallback)
- API enrichment: HIBP breach detection, Hunter.io email discovery
- Resilient execution: Automatic retry with exponential backoff, disk-based caching to avoid rate limits
- Input validation: Sanitization and validation of all user inputs
- Streamlit web UI: interactive interface with dual tabs (Investigate / Reports) for real-time log streaming and viewing saved reports inline
- CLI mode: scriptable via `python main.py`
- Advanced analysis (optional): timeline correlation, network analysis, deep content analysis
- Docker-first: a single image runs the app, unit tests, and UI tests
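The parallel fan-out/fan-in shape described above can be sketched with standard-library threads (a toy analogue with placeholder search functions; the real pipeline is a LangGraph graph built in `graph/workflow.py`):

```python
from concurrent.futures import ThreadPoolExecutor

def google_search(target: str) -> dict:
    return {"google": f"results for {target}"}   # placeholder for the real searcher

def social_search(target: str) -> dict:
    return {"social": f"profiles for {target}"}  # placeholder for the real searcher

def analyze(results: dict) -> dict:
    # Fan-in: both result sets are correlated in a single analysis step
    return {"findings": sorted(results)}

def report(analysis: dict) -> str:
    return f"Report: {analysis['findings']}"

def run_pipeline(target: str) -> str:
    # Fan-out: Google and social search run at the same time
    with ThreadPoolExecutor() as pool:
        google = pool.submit(google_search, target)
        social = pool.submit(social_search, target)
        merged = {**google.result(), **social.result()}
    # Then a single analysis -> report chain
    return report(analyze(merged))
```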
- Python 3.11+ (or Docker)
- Required: Anthropic API key
- Optional: additional API keys for richer results (see Environment)
```bash
# Copy and fill in your API keys
cp .env.example .env   # then edit .env and set ANTHROPIC_API_KEY

# Build and start the web UI
docker compose up --build
```

Open http://localhost:8501 in your browser.
CLI via Docker:

```bash
docker compose run --rm osint-tool python main.py "John Doe"
```

Local (non-Docker) setup:

```bash
# Install dependencies
pip install -r requirements.txt

# Copy and configure environment
cp .env.example .env   # then edit .env

# Start the web UI
streamlit run app.py

# Or use the CLI
python main.py "John Doe"
```

Copy `.env.example` to `.env` and configure:
```bash
# Required
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-sonnet-4-6   # default; change to use a different Claude model

# Optional — the tool works without these, but results improve significantly
TAVILY_API_KEY=...            # Best Google results (free tier available)
SERPAPI_KEY=...               # Google results fallback ($50/mo, 5000 searches)
GITHUB_TOKEN=...              # Higher rate limits (free personal access token)
TWITTER_BEARER_TOKEN=...      # Twitter timeline access (free tier available)
YOUTUBE_API_KEY=...           # YouTube channel data (free, 10 000 units/day)
HUNTER_API_KEY=...            # Email discovery (free tier: 25 searches/month)
HIBP_API_KEY=...              # Breach detection ($3.50/month)

# Logging
LOG_LEVEL=INFO                # DEBUG, INFO, WARNING, ERROR
```

Getting API keys:
| Key | Where to get it | Cost |
|---|---|---|
| ANTHROPIC_API_KEY | console.anthropic.com | Pay-per-use |
| GITHUB_TOKEN | GitHub → Settings → Developer settings → Personal access tokens | Free |
| YOUTUBE_API_KEY | console.developers.google.com → YouTube Data API v3 | Free (quota) |
| TWITTER_BEARER_TOKEN | developer.twitter.com | Free tier |
| HUNTER_API_KEY | hunter.io/api | Free tier (25/mo) |
| HIBP_API_KEY | haveibeenpwned.com/API/Key | $3.50/mo |
| TAVILY_API_KEY | tavily.com | Free tier available |
| SERPAPI_KEY | serpapi.com | $50/mo |
Quick start recommendation: enable `TAVILY_API_KEY`, `GITHUB_TOKEN`, and `YOUTUBE_API_KEY` first — all have free tiers and require no approval.
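At startup, required vs. optional keys can be checked along these lines (an illustrative sketch, not the tool's actual loader):

```python
import os

REQUIRED_KEYS = ["ANTHROPIC_API_KEY"]
OPTIONAL_KEYS = ["TAVILY_API_KEY", "SERPAPI_KEY", "GITHUB_TOKEN",
                 "TWITTER_BEARER_TOKEN", "YOUTUBE_API_KEY",
                 "HUNTER_API_KEY", "HIBP_API_KEY"]

def load_config(env=None):
    """Fail fast on missing required keys; collect whatever optional keys are set."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required keys: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED_KEYS + OPTIONAL_KEYS if env.get(k)}
```

The optional keys simply go unused when absent, which mirrors the "works without these, but results improve" behavior described above.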
DigitalFootprintInvestigator/
├── graph/
│ ├── nodes/
│ │ ├── _timing.py # Shared log_start / log_done helpers
│ │ ├── search.py # Google and social search nodes (run in parallel)
│ │ ├── analysis.py # Data correlation and pattern extraction
│ │ ├── advanced.py # Optional timeline / network / content analysis
│ │ └── report.py # Claude-powered report generation
│ ├── state.py # LangGraph state TypedDict
│ └── workflow.py # Graph construction and MemorySaver checkpointing
├── tools/
│ ├── search_tools.py # Google search and platform scrapers
│ └── api_tools.py # HIBP, Hunter.io, YouTube, Twitter wrappers
├── utils/
│ ├── llm.py # Shared ChatAnthropic factory
│ ├── logger.py # Logging setup
│ ├── cache.py # Disk-based caching for API calls
│ ├── retry.py # Exponential backoff retry logic
│ ├── validation.py # Input validation and sanitization
│ └── models.py # Pydantic data models
├── tests/
│ ├── conftest.py # Playwright session/page fixtures
│ ├── healer.py # Self-healing Playwright page wrapper
│ ├── unit/ # 215 unit tests (no browser or live API required)
│ └── ui/ # 25 Playwright browser tests
├── app.py # Streamlit web UI
├── main.py # CLI entry point
├── config.yaml # Platform and analysis settings
├── pytest.ini # Test markers and paths
├── .env.example # Environment variable template
├── requirements.txt # Python dependencies
├── Dockerfile # Single image used by all three services
├── docker-compose.yml # Services: osint-tool, unit-tests, tests
├── pyproject.toml # Bandit security scan config
├── .pre-commit-config.yaml # Pre-commit hooks
└── .dockerignore # Docker build exclusions
```bash
# Unit tests (no browser, no API key needed)
python -m pytest tests/unit/ -v

# UI tests (starts Streamlit automatically; requires a Playwright browser)
playwright install chromium   # first time only
python -m pytest tests/ui/ -m "not integration" -v

# Integration tests (require ANTHROPIC_API_KEY and a full workflow run)
python -m pytest tests/ui/ -m integration -v
```

In Docker:

```bash
# Unit tests — no running app needed
docker compose run --rm unit-tests

# UI tests — automatically starts the app and waits for it to be healthy
docker compose run --rm tests
```

Integration tests are excluded by default in Docker (`-m "not integration"`). To run them:

```bash
docker compose run --rm tests pytest tests/ui/ -v
```
config.yaml controls platforms and advanced analysis defaults. The Streamlit sidebar and CLI flags override these values per run.
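Conceptually, that per-run override is just a shallow merge (hypothetical helper shown for illustration; key names match `config.yaml`):

```python
def effective_settings(config_defaults: dict, overrides: dict) -> dict:
    """Merge per-run CLI/sidebar overrides over config.yaml defaults.

    A value of None in overrides means "not set for this run",
    so the config.yaml default is kept.
    """
    merged = dict(config_defaults)
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged
```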
Advanced analysis (each option adds an extra LLM pass; all off by default):
```yaml
advanced_analysis:
  timeline_correlation: false    # Build a chronological activity timeline
  network_analysis: false        # Map relationships between accounts
  deep_content_analysis: false   # Sentiment, topics, behavioral patterns
```

CLI flags:

```bash
python main.py "Jane Smith" --timeline --network --deep
```

Usage examples:

```bash
# Name
python main.py "Jane Smith"

# Email
python main.py "jane.smith@example.com"

# Username
python main.py "@janesmith"

# With advanced analysis
python main.py "Jane Smith" --timeline --network --deep
```

Reports are saved to `reports/` with a timestamp, e.g. `reports/Jane_Smith_20260227_143022.md`.
Pre-commit hooks run automatically on git commit:
```bash
pip install pre-commit
pre-commit install
```

Hooks: detect-secrets, ruff (lint + format), merge-conflict detection, large-file guard, YAML/JSON/TOML validation, debug-statement blocking, and bandit security scanning. Bandit config lives in pyproject.toml.
To run all hooks manually: pre-commit run --all-files
- Add a search function to `tools/search_tools.py` following the `_search_github` / `_search_reddit` pattern.
- Register it in the `_search_platform` dispatch table in the same file.
- Optionally add platform config to `config.yaml`.
- Create `graph/nodes/my_node.py`:

```python
from graph.state import OSINTState
from graph.nodes._timing import log_start, log_done

def my_node(state: OSINTState) -> dict:
    start = log_start("My Node")
    # ... process state ...
    log_done("My Node", start)
    return {"my_key": result}
```

- Register it in `graph/workflow.py`:

```python
from .nodes.my_node import my_node

workflow.add_node("my_node", my_node)
workflow.add_edge("analysis", "my_node")
```

"No Anthropic API key found" — create `.env` from `.env.example` and set `ANTHROPIC_API_KEY`.
Google searches return no results — the free googlesearch-python library is rate-limited and unreliable. Add TAVILY_API_KEY or SERPAPI_KEY to .env for consistent results (Tavily is tried first, then SerpAPI, then the free fallback).
Page refresh hangs in Docker on Windows — Streamlit’s file watcher conflicts with Docker volume mounts. fileWatcherType = "none" is set in .streamlit/config.toml; restore it if it gets removed.
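For reference, the setting in `.streamlit/config.toml` should look like this:

```toml
[server]
fileWatcherType = "none"
```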
Module not found — run pip install -r requirements.txt and confirm Python 3.11+.
Console warning: missing ScriptRunContext — harmless startup warning, handled in app.py.
Console warning: file_cache is only supported with oauth2client<4.0.0 — harmless warning from the Google API client library.
Docker Desktop won’t start (WSL error) — the Ubuntu WSL distro may have auto-shut down. Run wsl -d Ubuntu in a terminal first, wait a few seconds, then retry Docker Desktop.
EDUCATIONAL AND LEGITIMATE PURPOSES ONLY
Appropriate uses: security research, due diligence, investigative journalism, personal privacy audits, OSINT methodology research, identity verification with consent.
Prohibited uses: stalking or harassment, unauthorized surveillance, identity theft, doxxing, any illegal activity.
All searches use publicly available information only. Users are responsible for compliance with:
- Local privacy laws (GDPR, CCPA, etc.)
- Platform Terms of Service (including Twitter/X, Reddit, YouTube, and others)
- The Computer Fraud and Abuse Act (CFAA) and equivalent local laws
The consent checkbox in the UI is not a legal shield — you remain fully responsible for how you use this tool.
By using this tool, you agree to use it responsibly and ethically.
The tool includes built-in optimizations:
- Caching: API responses cached for 1-24 hours to reduce quota usage and improve speed. Repeated searches are significantly faster.
- Retry logic: Failed API calls automatically retry up to 3 times with exponential backoff for resilience against network issues.
- Input validation: All inputs sanitized and validated before processing to prevent errors and improve security.
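The retry behavior can be sketched as a decorator (parameter names are illustrative; the project's actual implementation lives in `utils/retry.py`):

```python
import functools
import time

def retry(max_attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a function with exponential backoff (1s, 2s, 4s, ...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise          # out of attempts: re-raise the last error
                    sleep(delay)
                    delay *= backoff
        return wrapper
    return decorator
```

The injectable `sleep` parameter keeps the decorator unit-testable without real delays.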
Cache files are stored in .cache/ and automatically expire based on TTL. To clear cache: rm -rf .cache/ (Unix) or rmdir /s .cache (Windows).
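The TTL expiry rule can be illustrated with an in-memory analogue (the real cache in `utils/cache.py` is disk-based; this sketch only shows the expiry logic):

```python
import time

class TTLCache:
    """Cache entries expire once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # expired: evict and miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock())
```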
This project is for educational purposes. Use responsibly and in accordance with applicable laws.