langchain-bigquery is a LangChain integration package for Google Cloud BigQuery. It enables you to build retrieval-augmented generation (RAG) and graph-based AI applications directly on top of BigQuery, leveraging its serverless scale, native vector search, full-text search, and property graph capabilities — without managing a separate vector database or graph database.
Important
What this package adds beyond langchain-google-community
The official langchain-google-community[featurestore] package ships only the basic BigQueryVectorStore (vector search via BigQuery Feature Store). It does not provide property-graph storage, Graph RAG retrievers, or hybrid (vector + full-text) search.
langchain-bigquery fills that gap by adding:
- Property Graph support: BigQueryGraphStore and Graph RAG retrievers (BigQueryGraphVectorContextRetriever, BigQueryGraphTextToGQLRetriever) built on BigQuery's native property graph and GQL features.
- Hybrid Search: BigQueryHybridSearchVectorStore, which combines VECTOR_SEARCH() and SEARCH() (with Reciprocal Rank Fusion) in a single retrieval step.
Use this package when you need graph-aware retrieval or hybrid keyword + semantic search on BigQuery — capabilities not available in langchain-google-community.
- BigQueryGraphStore: Store and query property graphs in BigQuery using the GQL (Graph Query Language) standard. Automatically manages node/edge tables and the underlying property graph schema.
- BigQueryGraphVectorContextRetriever: Perform vector similarity search over graph nodes with optional multi-hop neighborhood expansion, returning rich graph context for RAG.
- BigQueryGraphTextToGQLRetriever: Translate natural language questions into GQL queries with an LLM, with optional few-shot examples for improved accuracy.
- BigQueryHybridSearchVectorStore: Combine BigQuery's VECTOR_SEARCH() (semantic similarity) and SEARCH() (full-text keyword matching) into a single hybrid retrieval step, with both pre-filter and Reciprocal Rank Fusion (RRF) modes.
- You already store your data in BigQuery and want to add semantic, hybrid, or graph-based retrieval without exporting it.
- You need a serverless, fully managed backend for vector and graph workloads at BigQuery scale.
- You are building Agentic RAG or Graph RAG applications (e.g., with the Agent Development Kit (ADK)) and want a single source of truth in BigQuery.
In order to use this library, you first need to go through the following steps:
- Select or create a Cloud Platform project.
- Enable billing for your project.
- Enable the Google Cloud BigQuery API.
- Set up authentication.
Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.
With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.
Supported Python versions: Python >= 3.10

Mac/Linux:

pip install virtualenv
virtualenv <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install langchain-bigquery

Windows:

pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install langchain-bigquery

Prerequisites:

- A Google Cloud project with billing enabled
- The following APIs enabled:
  - BigQuery API
  - Vertex AI API (for embedding models)
gcloud services enable bigquery.googleapis.com
gcloud services enable aiplatform.googleapis.com

gcloud auth application-default login

To use Vertex AI with Application Default Credentials (no API key required):

export GOOGLE_GENAI_USE_VERTEXAI=true

The authenticated account needs the following roles (or equivalent permissions):
| Role | Purpose |
|---|---|
| roles/bigquery.dataEditor | Create/delete tables, insert data |
| roles/bigquery.jobUser | Run queries (VECTOR_SEARCH, SEARCH) |
| roles/bigquery.dataViewer | Read table data and metadata |
| roles/aiplatform.user | Access Vertex AI embedding models |
PROJECT_ID=your-gcp-project-id
ACCOUNT=$(gcloud config get account)

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$ACCOUNT" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$ACCOUNT" \
  --role="roles/bigquery.jobUser"

A BigQuery dataset must be created before using BigQueryGraphStore or BigQueryHybridSearchVectorStore. Tables and property graphs are created automatically, but the dataset is not.

bq mk --dataset --location=us-central1 YOUR_PROJECT_ID:YOUR_DATASET

Alternatively, grant bigquery.datasets.create permission to let the library create the dataset automatically.
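For readers who would rather create the dataset from Python than with bq, a minimal sketch using the google-cloud-bigquery client might look like the following. It assumes that package is installed and that Application Default Credentials are configured; the `ensure_dataset` helper name and its default location are illustrative and not part of langchain-bigquery.

```python
def ensure_dataset(project_id: str, dataset_id: str, location: str = "us-central1"):
    """Create the dataset if it is missing and return the Dataset object."""
    # Imported lazily so this module loads even without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(f"{project_id}.{dataset_id}")
    dataset.location = location
    # exists_ok=True returns the existing dataset instead of raising a conflict.
    return client.create_dataset(dataset, exists_ok=True)
```

Calling `ensure_dataset("YOUR_PROJECT_ID", "YOUR_DATASET")` should have the same effect as the bq mk command, provided the caller holds bigquery.datasets.create.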
Use BigQueryGraphStore for storing and querying property graphs in BigQuery.
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document

from langchain_bigquery import BigQueryGraphStore

store = BigQueryGraphStore(
    project_id="my-project",
    dataset_id="my_dataset",
    graph_name="knowledge_graph",
    location="us-central1",
)

# Define nodes and relationships
alice = Node(id="alice", type="Person", properties={"name": "Alice", "age": 30})
bob = Node(id="bob", type="Person", properties={"name": "Bob", "age": 25})
acme = Node(id="acme", type="Company", properties={"name": "Acme Corp"})

works_at = Relationship(source=alice, target=acme, type="WORKS_AT")
knows = Relationship(source=alice, target=bob, type="KNOWS")

doc = GraphDocument(
    nodes=[alice, bob, acme],
    relationships=[works_at, knows],
    source=Document(page_content="Alice works at Acme Corp and knows Bob."),
)

# This creates tables, the property graph, and inserts data
store.add_graph_documents([doc])

# Query with GQL
results = store.query(
    "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person) RETURN p.name AS name"
)
print(results)
# [{'name': 'Alice'}, {'name': 'Bob'}]

Use BigQueryGraphVectorContextRetriever to perform vector similarity search on graph nodes with optional multi-hop neighborhood expansion.
from langchain_google_vertexai import VertexAIEmbeddings

from langchain_bigquery import BigQueryGraphVectorContextRetriever

embeddings = VertexAIEmbeddings(model="gemini-embedding-001")

# Return specific properties from matching nodes
retriever = BigQueryGraphVectorContextRetriever.from_params(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    embeddings_column="embedding",
    return_properties_list=["name", "age"],
    top_k=5,
)
docs = retriever.invoke("Who works at Acme?")

With multi-hop expansion:
# Expand results by traversing 2 hops from matching nodes
retriever = BigQueryGraphVectorContextRetriever.from_params(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    embeddings_column="embedding",
    expand_by_hops=2,
    top_k=5,
)
docs = retriever.invoke("Tell me about Alice")

Use BigQueryGraphTextToGQLRetriever to translate natural language questions into GQL queries using an LLM.
from langchain_google_vertexai import ChatVertexAI

from langchain_bigquery import BigQueryGraphTextToGQLRetriever

llm = ChatVertexAI(model="gemini-2.5-flash")

retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    graph_store=store,
    k=10,
)
docs = retriever.invoke("Find all people who work at Acme Corp")

With few-shot examples for better GQL generation:
from langchain_google_vertexai import VertexAIEmbeddings

retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    embedding_service=VertexAIEmbeddings(model="gemini-embedding-001"),
    graph_store=store,
)
retriever.add_example(
    question="Who works at Acme?",
    gql="GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'}) RETURN p.name AS name",
)
docs = retriever.invoke("Which people are employed by Acme Corp?")

See the full Graph RAG tutorial.
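As a mental model for the multi-hop neighborhood expansion that BigQueryGraphVectorContextRetriever offers, the sketch below runs a breadth-first expansion over a small in-memory edge list. It is only an illustration: the real retriever works against BigQuery's property graph, and the `expand_by_hops` function here, the toy edges, and the choice to follow edges in both directions are assumptions made for the example.

```python
from collections import deque

# Toy edges mirroring the Alice/Bob/Acme example: (source, relationship, target).
EDGES = [
    ("alice", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, rel, target) triples."""
    adjacency = {}
    for source, _rel, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def expand_by_hops(seeds, edges, hops):
    """Return {node: hop distance} for every node within `hops` of a seed."""
    adjacency = build_adjacency(edges)
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # hop budget exhausted on this path
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


print(expand_by_hops(["bob"], EDGES, hops=1))  # {'bob': 0, 'alice': 1}
print(expand_by_hops(["bob"], EDGES, hops=2))  # {'bob': 0, 'alice': 1, 'acme': 2}
```

With one hop, a match on Bob pulls in Alice only; with two hops, Acme Corp is reached through Alice. This is why a larger hop count tends to return richer, but also larger, context.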
Use BigQueryHybridSearchVectorStore for hybrid (vector + full-text) search. It combines BigQuery's VECTOR_SEARCH() (semantic similarity) with SEARCH() (full-text keyword matching) in a single retrieval step.
from langchain_bigquery import BigQueryHybridSearchVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

store = BigQueryHybridSearchVectorStore(
    project_id="my-project",
    dataset_name="my_dataset",
    table_name="documents",
    location="US",
    embedding=VertexAIEmbeddings(model="gemini-embedding-001"),
    distance_type="COSINE",
    search_analyzer="LOG_ANALYZER",
)

# Pre-filter mode (default): keyword filter -> vector ranking
results = store.hybrid_search(
    query="How to optimize BigQuery performance?",
    text_query="BigQuery optimization",
    k=10,
)

# RRF mode: independent keyword + vector search -> merged ranking
results = store.hybrid_search_with_score(
    query="How to optimize BigQuery performance?",
    text_query="BigQuery optimization",
    k=10,
    fetch_k=50,
    hybrid_search_mode="rrf",
)

See the full Hybrid Search tutorial.
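To make the RRF mode more concrete: Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every result list in which it appears, then sorts by that score. The sketch below is a plain-Python illustration of the formula, not the package's implementation. The function name, the constant k = 60 (a commonly used default), and the document IDs are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores = {}
    for ranking in rankings:
        # Ranks are 1-based: the best hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists, best hit first.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic similarity ranking
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # full-text ranking

fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_n=3)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Here doc_b ranks first because it places highly in both lists, while doc_d and doc_c, which appear in only one list each, fall behind. Documents that both retrieval signals agree on are rewarded without having to normalize the raw similarity and text-match scores against each other.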
- langchain-bigquery-graph -- Standalone GraphStore and Graph RAG retrievers for BigQuery (the upstream of this package's graph features)
- langchain-bigquery-hybridsearch -- Standalone hybrid (vector + full-text) search vector store for BigQuery (the upstream of this package's hybrid search feature)
- graph-rag-with-bigquery -- Sample Agentic Graph RAG built with the Agent Development Kit (ADK), using BigQuery property graphs and this package's graph retrievers
- rag-with-bigquery-hybridsearch -- Sample Agentic RAG built with the Agent Development Kit (ADK), using this package's BigQueryHybridSearchVectorStore with Reciprocal Rank Fusion (RRF) to combine VECTOR_SEARCH() and SEARCH()
MIT License. See LICENSE for details.