Search, discover, and download scientific papers from 21 sources — with a single command. Ask questions, find gaps, and compare methods with fully local AI. Or send results to Google NotebookLM for AI-powered summaries, podcasts, and more.
# Search 21 sources at once, deduplicate, download OA PDFs
mosaic search "attention is all you need" --oa-only --download
# Discover related literature from any DOI or arXiv ID — no query needed
mosaic similar 10.48550/arXiv.1706.03762 --sort citations
# Bulk-download your entire Zotero / JabRef library
mosaic get --from refs.bib --oa-only
# Ask your paper library questions with local AI — no data leaves your machine
mosaic index
mosaic ask "What are the main approaches to high-order Maxwell solvers?" --show-sources
mosaic chat # interactive multi-turn session
# Re-rank results by relevance to a query — BM25, instant, no model needed
mosaic search "graph neural networks" --cached --sort relevance
# Turn results into an AI-powered notebook: podcast, slides, quiz, mind map…
mosaic notebook create "Transformers" --query "transformer architecture" --oa-only --podcast
# Structured JSON output — pipe to jq, feed AI agents, use in CI pipelines
mosaic search "diffusion models" --max 50 --oa-only --json | jq '.papers[].doi'
mosaic similar 10.48550/arXiv.1706.03762 --json | jq '.count'
# Install the bundled Claude Code skill — then ask Claude to build your bibliography
mosaic skill install # → /mosaic is available in Claude Code
Stefano Zaghi · stefano.zaghi@gmail.com
Chief Yak Shaver & Accidental Package Maintainer — Fortran programmer who needed one paper, opened 21 browser tabs, and six months later found himself maintaining a Python library
Grand Pixel Overlord & Architect of the Sacred Button — world-class web UI designer, responsible for making MOSAIC actually look good
Claude (Anthropic)
Omniscient Code Oracle & Tireless Rubber Duck — AI pair programmer, responsible for writing the boring parts so humans don't have to
Launch with mosaic ui (requires [ui] extra — see Web UI docs).
Index your paper library once and ask questions in natural language. Get structured, cited answers synthesised from your own corpus. Four analysis modes: synthesis (state of the art), gaps (open problems), compare (side-by-side methods), extract (structured per-paper data). Runs entirely on your machine via Ollama or any OpenAI-compatible server — no data leaves your machine unless you configure a cloud provider.
# 1. Index all cached papers (incremental — already-indexed papers are skipped)
mosaic index
# 2. Single-shot analysis
mosaic ask "What FDTD schemes achieve high-order accuracy in time?" --show-sources
mosaic ask "What open problems remain in discontinuous Galerkin methods?" --mode gaps
mosaic ask "Compare DDPM, DDIM, and score SDE approaches" --mode compare --output report.md
# 3. Interactive multi-turn session
mosaic chat
Requires sqlite-vec and an embedding model. See the RAG guide for Ollama setup, model selection, and all CLI options.
Re-rank any result set by semantic similarity to the query. The default scorer is BM25 — no model, no network, instant. Configure an LLM for higher-quality scores.
# Re-rank live results from any source
mosaic search "transformer architecture" --sort relevance
# Load a bibliography once, re-rank locally forever — no network needed
mosaic get --from refs.bib
mosaic search "attention mechanism" --cached --sort relevance
See the Relevance ranking guide.
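For intuition about what a BM25 re-rank does, here is a minimal, self-contained sketch of Okapi BM25 scoring over a few titles. It illustrates the general technique only and is not MOSAIC's implementation: the `bm25_rank` function, the naive tokenizer, and the `k1`/`b` defaults are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (index, score) pairs for docs, best match first (Okapi BM25)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "Graph neural networks for molecular property prediction",
    "Attention is all you need: the transformer architecture",
    "A survey of finite-difference time-domain methods",
]
ranking = bm25_rank("transformer attention", docs)
print(ranking[0][0])  # index of the best-matching document
```

Because the score depends only on term counts, it needs no model and no network, which is consistent with the "instant" behaviour described above.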
Turn any search into a Google NotebookLM notebook with a single command. Queue podcast, video overview, slides, quiz, flashcards, FAQ, timeline, briefing doc, and mind map all at once.
mosaic notebook create "Transformers" --query "attention mechanism" --oa-only --podcast --briefing
Requires the [notebooklm] extra and a one-time Google sign-in. See the NotebookLM guide.
MOSAIC ships a bundled Claude Code skill. Install it once and the /mosaic slash command gives any Claude Code session expert knowledge of every command, source, filter, export format, and scripting pattern — so you can describe your bibliography goal in plain English and let the AI build and run the right commands for you.
# Install into the current project's .claude/skills/ directory
mosaic skill install
# Or globally for all your projects
mosaic skill install --global
# Inspect the bundled skill content
mosaic skill show
All search and similar commands support --json: a clean {status, query, count, papers[], errors[]} envelope to stdout, designed for piping, scripting, and CI:
# Pipe to jq
mosaic search "FDTD high-order" --max 30 --json | jq -r '.papers[] | "\(.year) \(.doi)"'
# Combine file export with stdout JSON in one run
mosaic search "neural ODEs" --json --output refs.bib
# Full Python agent pipeline
python3 - <<'EOF'
import json, subprocess

def mosaic(args):
    r = subprocess.run(["mosaic"] + args, capture_output=True, text=True)
    return json.loads(r.stdout)

papers = mosaic(["search", "transformer attention", "--max", "20", "--oa-only", "--json"])["papers"]
top = max(papers, key=lambda p: p["citation_count"] or 0)
related = mosaic(["similar", top["doi"], "--max", "10", "--json"])
print(f"Seed: {top['title']}\nFound {related['count']} related papers")
EOF
See the Agent Workflows guide.
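The pipeline above assumes every call returns a complete envelope. A slightly more defensive variant might validate the envelope keys before using them, as in the sketch below. The helper names `parse_envelope` and `dois_of` are hypothetical, the sample JSON is made up, and the `"ok"` status value is a placeholder, since the actual status strings are not listed here.

```python
import json

REQUIRED_KEYS = {"status", "query", "count", "papers", "errors"}


def parse_envelope(raw):
    """Parse --json output and fail loudly if the envelope is incomplete."""
    envelope = json.loads(raw)
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"envelope is missing keys: {sorted(missing)}")
    return envelope


def dois_of(envelope):
    """Collect DOIs, skipping papers that have none."""
    return [p["doi"] for p in envelope["papers"] if p.get("doi")]


# Made-up sample standing in for real CLI output.
sample = json.dumps({
    "status": "ok",  # placeholder; actual status strings are an assumption
    "query": "transformer attention",
    "count": 2,
    "papers": [
        {"title": "Paper A", "doi": "10.1000/a", "citation_count": 12},
        {"title": "Paper B", "doi": None, "citation_count": None},
    ],
    "errors": [],
})
envelope = parse_envelope(sample)
print(dois_of(envelope))
```

Filtering out papers without a DOI before calling `mosaic similar` avoids passing `None` as a command-line argument.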
| Feature | Description |
|---|---|
| 🌐 21 sources, one command | arXiv · Semantic Scholar · OpenAlex · PubMed · PubMed Central · Europe PMC · DOAJ · Crossref · Springer · IEEE · NASA ADS · Zenodo · BASE · CORE · DBLP · HAL · ScienceDirect · bioRxiv/medRxiv · and more, all with mosaic search "query". Sources guide |
| 🔭 Find similar papers | mosaic similar <doi>: discover related literature from any DOI or arXiv ID via OpenAlex graph + Semantic Scholar ML, no query needed. Find similar guide |
| ✨ Smart deduplication | Results merged by DOI: best citation count, richest abstract, earliest PDF URL wins. Usage guide |
| 📥 OA PDF downloads | Direct links · Unpaywall fallback · browser-session authenticated access · bulk download from .bib/.csv. Authenticated access guide |
| 🎛️ Sort & filter | Year · author · journal · open-access · citation count; composable, applied at API level where supported. Usage guide |
| 📤 Export anywhere | Markdown · CSV · JSON · BibTeX · Zotero (local & web API). Usage guide |
| 🧠 Local RAG | Ask questions, find gaps, compare methods, fully local with sqlite-vec + any Ollama model. mosaic index → mosaic ask → mosaic chat. No data leaves your machine. RAG guide |
| 📊 Relevance ranking | Re-rank results by semantic similarity with --sort relevance. BM25 by default: instant, no model needed. Relevance ranking guide |
| 🤖 NotebookLM integration | Podcast · video · slides · quiz · mind map · flashcards · briefing, queued in one command with mosaic notebook create. NotebookLM guide |
| ⚡ Offline-first cache | SQLite: repeated queries are instant, no re-fetching. mosaic search "query" --cached for instant offline search. Usage guide |
| 🧩 Custom sources | Wire any JSON REST API as a new source with a few lines of TOML, no Python needed. Custom sources guide |
| 🗒️ Obsidian integration | Write paper notes directly into an Obsidian vault: YAML frontmatter, >[!abstract] callout, metadata table, and [[wikilinks]] to related papers. Obsidian integration guide |
| 📚 Zotero integration | Push results directly into your Zotero library via the local API (Zotero running on your machine) or the web API (api.zotero.org). Organise into collections, link downloaded PDFs as attachments, and sync across devices, all with a single --zotero flag. Zotero integration guide |
| 🦾 Claude Code Skill & AI agent mode | Install the bundled Claude Code skill with mosaic skill install; it gives Claude Code expert knowledge of every command. --json on search/similar emits a structured JSON envelope to stdout for piping, scripting, and CI. Agent workflows guide |
| Source | Shorthand | Coverage | Auth | OA PDF |
|---|---|---|---|---|
| arXiv | arxiv | Physics, CS, Math, Biology… | None | Always |
| Semantic Scholar | ss | 214 M papers, all disciplines | Optional key | When indexed |
| ScienceDirect | sd | Elsevier journals & books | API key or browser session | OA articles |
| Springer Nature | sp | Springer, Nature & affiliated journals (browser) | None ([browser] extra) | Via Unpaywall |
| Springer Nature API | springer | OA articles from Springer, Nature & affiliated journals | Free API key | Direct PDF link |
| DOAJ | doaj | 8 M+ fully open-access articles | None | Always |
| Europe PMC | epmc | 45 M biomedical papers | None | PMC articles |
| OpenAlex | oa | 250 M+ works, all disciplines | None | When available |
| BASE | base | 300 M+ docs from 10 000+ repos | None | When OA + PDF format |
| CORE | core | 200 M+ OA full-text from repos | Free API key | downloadUrl field |
| NASA ADS | ads | 15 M+ astronomy & astrophysics records | Free API token | OA articles |
| IEEE Xplore | ieee | 5 M+ IEEE journals, transactions & conference proceedings | Free API key | OA articles |
| Zenodo | zenodo | 3 M+ OA research outputs (papers, datasets, software) | None (token optional) | Attached PDF files |
| Crossref | crossref | 150 M+ scholarly works (DOI registry) | None (email optional) | When deposited by publisher |
| DBLP | dblp | 6 M+ CS publications (journals, conferences) | None | Via ee field (arXiv/OA links) |
| HAL | hal | 1.5 M+ OA documents, strong for French academic output | None | Direct PDF when deposited |
| PubMed | pubmed | 35 M+ biomedical citations (NCBI) | None (API key optional) | PMC PDF for OA articles |
| PubMed Central | pmc | 5 M+ free full-text biomedical articles | None (API key optional) | Always (all PMC articles are OA) |
| bioRxiv / medRxiv | rxiv | Life-science and medical preprints | None | Always (all preprints are OA) |
| PEDro | pedro | Physiotherapy evidence database | None (fair-use ack) | No (abstracts only) |
| Scopus | scopus | 90 M+ abstracts from Elsevier's citation database | API key or browser session | Via Unpaywall |
| Unpaywall | — | PDF resolver for any DOI | Email only | Legal OA copy |
The easiest way to get started on Windows, macOS, or Linux: download the pre-built standalone app from the GitHub Releases page. Unzip and run — no Python, no pip, no setup. See the Web UI guide for a step-by-step walkthrough.
Requires Python 3.11+.
pipx install mosaic-search # recommended — isolated, globally available
uv tool install mosaic-search # fastest alternative
pip install mosaic-search # inside a virtualenv
The core install covers all 21 search sources and the full CLI. Extra dependencies are only needed for specific opt-in features:
| Feature | Extra | Install |
|---|---|---|
| Web UI (mosaic ui) | [ui] | pipx inject mosaic-search "flask>=3.0" "waitress>=3.0" |
| Local RAG (mosaic index/ask/chat) | [rag] | pipx inject mosaic-search sqlite-vec |
| Browser sessions (mosaic auth login) | [browser] | pipx inject mosaic-search "playwright>=1.40" + playwright install chromium |
| NotebookLM (mosaic notebook) | [notebooklm] | pipx inject --include-apps mosaic-search "notebooklm-py[browser]" + playwright install chromium |
For uv: replace pipx inject with uv tool inject. For pip/venv: pip install 'mosaic-search[extra]'.
Full setup instructions for each feature → Installation guide.
# 1. Set your email (enables Unpaywall PDF fallback)
mosaic config --unpaywall-email you@example.com
# 2. Optional: add an Elsevier API key to unlock ScienceDirect
mosaic config --elsevier-key YOUR_KEY
# 3. Search and download
mosaic search "transformer architecture" --oa-only --download
# Search all enabled sources (10 results per source by default)
mosaic search "protein folding"
# More results, open-access only
mosaic search "deep learning" -n 25 --oa-only
# Single source
mosaic search "RNA velocity" --source epmc
# Search only the local cache — instant, no network
mosaic search "attention mechanism" --cached
# Re-rank a local bibliography by relevance — no network, no API keys
mosaic get --from refs.bib # load .bib into cache once
mosaic search "transformer attention" --cached --sort relevance
Source shorthands: arxiv · ss · sd · doaj · epmc · oa · base · core · sp · springer · ads · ieee · zenodo · crossref · dblp · hal · pubmed · pmc · rxiv · pedro · scopus
Custom sources defined in config.toml are also queried and addressable by their name.
# By year — single, range, or list
mosaic search "BERT" --year 2019
mosaic search "diffusion models" -y 2020-2023
mosaic search "GPT" -y 2020,2022,2024
# By author (repeatable, OR logic, case-insensitive substring)
mosaic search "attention" -a Vaswani -a Shazeer
# By journal (case-insensitive substring)
mosaic search "CRISPR" --journal "Nature"
# Combine freely
mosaic search "graph neural" -y 2021-2023 -a Kipf -j "ICLR" --oa-only --download
# Discover related literature from any DOI or arXiv ID
mosaic similar 10.48550/arXiv.1706.03762
# Sort by citation count, open-access only
mosaic similar arxiv:1706.03762 -n 20 --sort citations --oa-only
# Save to BibTeX
mosaic similar 10.1038/s41586-021-03819-2 --output related.bib
Uses OpenAlex related_works (always) and Semantic Scholar recommendations (when ss-key is configured). Results are deduplicated and merged; the higher citation count and richer metadata always win.
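The merge rule can be pictured with a short sketch. This is one plausible reading of "higher citation count and richer metadata win", not MOSAIC's actual code: the `merge_by_doi` function is hypothetical, the `abstract` field name is an assumption, and "richer" is approximated here as "longer".

```python
def merge_by_doi(papers):
    """Merge records sharing a DOI, keeping the best value of each field."""
    merged = {}
    for paper in papers:
        doi = (paper.get("doi") or "").lower()  # DOIs are case-insensitive
        if not doi:
            # No DOI to match on: keep the record as-is under a unique key.
            merged[f"no-doi-{len(merged)}"] = dict(paper)
            continue
        if doi not in merged:
            merged[doi] = dict(paper)
            continue
        kept = merged[doi]
        # Higher citation count wins; None counts as 0.
        kept["citation_count"] = max(kept.get("citation_count") or 0,
                                     paper.get("citation_count") or 0)
        # "Richer" abstract is approximated here as the longer one.
        if len(paper.get("abstract") or "") > len(kept.get("abstract") or ""):
            kept["abstract"] = paper["abstract"]
        # Keep the first PDF URL seen; fill it in only if still missing.
        if not kept.get("pdf_url") and paper.get("pdf_url"):
            kept["pdf_url"] = paper["pdf_url"]
    return list(merged.values())


# Two made-up records for the same paper, as two sources might report it.
a = {"doi": "10.1000/X", "title": "T", "citation_count": 5,
     "abstract": "short", "pdf_url": None}
b = {"doi": "10.1000/x", "title": "T", "citation_count": 9,
     "abstract": "a much longer abstract", "pdf_url": "https://example.org/t.pdf"}
result = merge_by_doi([a, b])
print(len(result), result[0]["citation_count"])
```

Merging field by field, rather than picking one record wholesale, is what lets a result combine one source's citation count with another source's PDF link.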
mosaic get 10.48550/arXiv.1706.03762
Checks the local cache first: if the paper was seen in a previous search and a PDF URL is already known, downloads immediately without hitting Unpaywall.
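The cache-first lookup can be modelled as a chain of resolvers tried in order. The sketch below is illustrative only: `resolve_pdf_url` and the stub resolvers are hypothetical names, no network is involved, and writing a resolved URL back to the cache is an assumption of this example.

```python
def resolve_pdf_url(doi, cache, resolvers):
    """Return (url, where) from the cache or the first resolver that succeeds."""
    cached = cache.get(doi)
    if cached:
        return cached, "cache"
    for name, resolver in resolvers:
        url = resolver(doi)
        if url:
            cache[doi] = url  # assumption: remember the answer for next time
            return url, name
    return None, None


# Stub resolvers standing in for the real lookups (no network involved).
def direct_link(doi):
    return None  # pretend the source gave no PDF URL


def unpaywall(doi):
    return f"https://example.org/oa/{doi}.pdf"


cache = {}
chain = [("direct", direct_link), ("unpaywall", unpaywall)]
first = resolve_pdf_url("10.1000/demo", cache, chain)
second = resolve_pdf_url("10.1000/demo", cache, chain)
print(first[1], second[1])  # prints "unpaywall cache"
```

The second call never reaches the resolvers, which mirrors the behaviour of skipping Unpaywall when a URL is already known.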
# Export your Zotero/JabRef/Mendeley library and download everything
mosaic get --from refs.bib
# CSV with a 'doi' column works too
mosaic get --from references.csv --oa-only
Extracts all DOIs from the file, deduplicates, and downloads with the same fallback chain (direct PDF → Unpaywall → browser session).
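As a rough picture of the extract-and-deduplicate step, the sketch below pulls DOIs out of BibTeX text and a CSV with a `doi` column, then removes duplicates. The function names and the regular expression are assumptions for illustration; a real BibTeX parser handles more edge cases (nested braces, string macros) than this pattern does.

```python
import csv
import io
import re

DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]\s*([^}"]+?)\s*[}"]', re.IGNORECASE)


def dois_from_bibtex(text):
    """Pull the value of every doi = {...} or doi = "..." field."""
    return [m.group(1) for m in DOI_FIELD.finditer(text)]


def dois_from_csv(text):
    """Read the 'doi' column of a CSV document, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["doi"].strip() for row in reader if (row.get("doi") or "").strip()]


def unique_dois(dois):
    """Deduplicate case-insensitively, dropping doi.org URL prefixes, keeping order."""
    seen, result = set(), []
    for doi in dois:
        bare = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip(), flags=re.IGNORECASE)
        if bare.lower() not in seen:
            seen.add(bare.lower())
            result.append(bare)
    return result


# Made-up inputs: three BibTeX entries and a two-row CSV.
bib = """@article{a, title={One}, doi = {10.1000/ABC}}
@article{b, title={Two}, doi = "https://doi.org/10.1000/abc"}
@article{c, title={Three}, DOI={10.2000/xyz}}"""
table = "title,doi\nOne,10.1000/abc\nFour,\n"
found = dois_from_bibtex(bib) + dois_from_csv(table)
print(unique_dois(found))
```

Normalising case and URL prefixes before comparing matters because reference managers export the same DOI in several spellings.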
# Push to your Zotero library (Zotero must be running — no config needed)
mosaic search "CRISPR" --oa-only --zotero
# Push to a named collection (created automatically if missing)
mosaic search "transformers" --zotero --zotero-collection "Deep Learning"
# Download PDFs and link them as Zotero attachments (local mode)
mosaic search "diffusion models" --download --zotero --zotero-collection "Generative AI"
For the web API (Zotero not running locally), configure once:
mosaic config --zotero-key YOUR_API_KEY
mosaic config --show # print current config
mosaic config --unpaywall-email me@uni.edu
mosaic config --elsevier-key abc123
mosaic config --ss-key xyz789
mosaic config --download-dir ~/papers
Config is stored at ~/.config/mosaic/config.toml. Downloaded PDFs go to ~/mosaic-papers/ by default.
Any number of JSON REST APIs can be added as new sources directly in config.toml — one [[custom_sources]] block per source, no Python required:
[[custom_sources]]
name = "My Institution Repo"
enabled = true
url = "https://repo.myuni.edu/api/search"
method = "GET"
query_param = "q"
results_path = "results"
[custom_sources.fields]
title = "title"
doi = "doi"
year = "year"
authors = "authors" # flat string array
journal = "source.title"
See the Custom Sources guide for the full reference.
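To make the field mapping concrete, here is a sketch of how a `results_path` and dotted field paths such as `source.title` could be resolved against a JSON response. The helpers `dig` and `map_results` are hypothetical and the response is made up; the Custom Sources guide remains the authority on the actual resolution rules.

```python
def dig(obj, dotted_path):
    """Follow a dotted path such as 'source.title' through nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def map_results(response, results_path, fields):
    """Turn a raw JSON response into flat paper records."""
    items = dig(response, results_path) or []
    return [{name: dig(item, path) for name, path in fields.items()} for item in items]


# Same mapping as the [custom_sources.fields] table above.
fields = {"title": "title", "doi": "doi", "year": "year",
          "authors": "authors", "journal": "source.title"}
# Made-up response in the shape such an API might return.
response = {"results": [{
    "title": "A made-up paper", "doi": "10.1000/demo", "year": 2024,
    "authors": ["A. Author", "B. Author"],
    "source": {"title": "Journal of Examples"},
}]}
papers = map_results(response, "results", fields)
print(papers[0]["journal"])  # prints "Journal of Examples"
```

Returning `None` for a missing key, instead of raising, keeps one incomplete record from discarding the whole result page.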
Send search results directly to a Google NotebookLM notebook:
# 1. Inject into MOSAIC (--include-apps exposes the notebooklm CLI)
pipx inject --include-apps mosaic-search "notebooklm-py[browser]"
# 2. Install Chromium — playwright lives inside the pipx venv, call it directly
~/.local/share/pipx/venvs/mosaic-search/bin/playwright install chromium
# 3. Authenticate once
notebooklm login
# 4. Search, download, and create a notebook in one command
mosaic notebook create "Transformers" --query "transformer architecture" --oa-only --podcast
# Or import PDFs you already have
mosaic notebook create "My Papers" --from-dir ~/mosaic-papers/
MOSAIC uploads local PDFs when available, falls back to URLs otherwise, and respects NotebookLM's 50-source limit. With --podcast, an Audio Overview is queued automatically.
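The source-selection rule just described (local PDF first, URL otherwise, at most 50 sources) can be sketched as below. The `pick_sources` function and the `local_path` field name are hypothetical; only the limit of 50 and the preference order come from the text.

```python
import os

MAX_SOURCES = 50  # NotebookLM's per-notebook source limit, as stated above


def pick_sources(papers, limit=MAX_SOURCES):
    """Prefer a local PDF, fall back to a URL, and stop at the limit."""
    sources = []
    for paper in papers:
        if len(sources) >= limit:
            break
        local = paper.get("local_path")
        if local and os.path.isfile(local):
            sources.append(("file", local))
        elif paper.get("pdf_url"):
            sources.append(("url", paper["pdf_url"]))
        # Papers with neither a file nor a URL are skipped.
    return sources


# Sixty made-up papers that only have URLs: ten are dropped by the limit.
papers = [{"pdf_url": f"https://example.org/{i}.pdf"} for i in range(60)]
selected = pick_sources(papers)
print(len(selected))  # prints 50
```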
flowchart LR
CLI -->|query + filters| Search
Search --> arXiv & SS[Semantic Scholar] & SD[ScienceDirect] & SP[Springer browser] & SPN[Springer API] & DOAJ & EPMC[Europe PMC] & OA[OpenAlex] & BASE & CORE & ADS[NASA ADS] & IEEE[IEEE Xplore] & ZEN[Zenodo] & CR[Crossref] & DBLP[DBLP] & HAL[HAL] & PM[PubMed] & PMC[PubMed Central] & RXV[bioRxiv/medRxiv]
arXiv & SS & SD & SP & SPN & DOAJ & EPMC & OA & BASE & CORE & ADS & IEEE & ZEN & CR & DBLP & HAL & PM & PMC & RXV -->|Paper list| Dedup{Deduplicate\nby DOI}
Dedup --> Cache[(SQLite\ncache)]
Dedup --> Table[Rich table]
Table -->|--download| DL[Downloader]
DL -->|no pdf_url| UPW[Unpaywall]
UPW --> DL
DL -->|no OA copy| AUTH[Browser session]
AUTH --> DL
DL --> Disk[(~/mosaic-papers/)]
DL -->|mosaic notebook create| NLM[NotebookLM]
| 📢 Announcements | t.me/mosaic_search — releases, CI events, weekly digest |
| 💬 Support group | t.me/mosaic_search_support — questions, bug reports, discussion |
The support group has a bot that responds instantly to /help, /install, /version, /docs, /sources, /changelog, /bug, and /roadmap, and auto-replies to common questions.
pip install -e ".[dev]"
# with NotebookLM integration (includes Playwright for auth)
pip install -e ".[dev,notebooklm]"
playwright install chromium
# run tests + coverage
pytest
# live docs
cd docs && npm install && npm run docs:dev
Coverage report and badge JSON are written to docs/public/ after every test run.
If you run MOSAIC via pipx install mosaic-search and want to test local changes
without affecting your stable install, use make dev:
make dev # reinstalls from source into the pipx venv (no-deps, instant)
New dependency added to pyproject.toml? make dev skips dependency installation.
Inject the new package into the pipx venv once:
pipx inject mosaic-search <new-package>
MOSAIC is available under your choice of license:
| License | SPDX | File |
|---|---|---|
| GNU General Public License v3 | GPL-3.0-or-later | LICENSE.gpl3.md |
| BSD 2-Clause | BSD-2-Clause | LICENSE.bsd-2.md |
| BSD 3-Clause | BSD-3-Clause | LICENSE.bsd-3.md |
| MIT | MIT | LICENSE.mit.md |



