ksmin23/langchain-bigquery-python

BigQuery for LangChain

Overview

langchain-bigquery is a LangChain integration package for Google Cloud BigQuery. It enables you to build retrieval-augmented generation (RAG) and graph-based AI applications directly on top of BigQuery, leveraging its serverless scale, native vector search, full-text search, and property graph capabilities — without managing a separate vector database or graph database.

Important

What this package adds beyond langchain-google-community

The official langchain-google-community[featurestore] package ships only the basic BigQueryVectorStore (vector search via BigQuery Feature Store). It does not provide property-graph storage, Graph RAG retrievers, or hybrid (vector + full-text) search.

langchain-bigquery fills that gap by adding:

  • Property Graph support: BigQueryGraphStore and Graph RAG retrievers (BigQueryGraphVectorContextRetriever, BigQueryGraphTextToGQLRetriever) built on BigQuery's native property graph and GQL features.
  • Hybrid Search: BigQueryHybridSearchVectorStore, which combines VECTOR_SEARCH() and SEARCH() (with Reciprocal Rank Fusion) in a single retrieval step.

Use this package when you need graph-aware retrieval or hybrid keyword + semantic search on BigQuery — capabilities not available in langchain-google-community.

Features

  • BigQueryGraphStore — Store and query property graphs in BigQuery using the GQL (Graph Query Language) standard. Automatically manages node/edge tables and the underlying property graph schema.
  • BigQueryGraphVectorContextRetriever — Perform vector similarity search over graph nodes with optional multi-hop neighborhood expansion, returning rich graph context for RAG.
  • BigQueryGraphTextToGQLRetriever — Translate natural language questions into GQL queries with an LLM, with optional few-shot examples for improved accuracy.
  • BigQueryHybridSearchVectorStore — Combine BigQuery's VECTOR_SEARCH() (semantic similarity) and SEARCH() (full-text keyword matching) into a single hybrid retrieval step, with both pre-filter and Reciprocal Rank Fusion (RRF) modes.

When to Use

  • You already store your data in BigQuery and want to add semantic, hybrid, or graph-based retrieval without exporting it.
  • You need a serverless, fully managed backend for vector and graph workloads at BigQuery scale.
  • You are building Agentic RAG or Graph RAG applications (e.g., with the Agent Development Kit (ADK)) and want a single source of truth in BigQuery.

Quick Start

Before using this library, complete the following steps:

  1. Select or create a Cloud Platform project.
  2. Enable billing for your project.
  3. Enable the Google Cloud BigQuery API.
  4. Set up authentication.

Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.

Supported Python Versions

Python >= 3.10

Mac/Linux

pip install virtualenv
virtualenv <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install langchain-bigquery

Windows

pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install langchain-bigquery

Prerequisites

1. Google Cloud Project Setup

gcloud services enable bigquery.googleapis.com
gcloud services enable aiplatform.googleapis.com

2. Authentication

gcloud auth application-default login

To use Vertex AI with Application Default Credentials (no API key required):

export GOOGLE_GENAI_USE_VERTEXAI=true

3. IAM Permissions

The authenticated account needs the following roles (or equivalent permissions):

Role                        Purpose
roles/bigquery.dataEditor   Create/delete tables, insert data
roles/bigquery.jobUser      Run queries (VECTOR_SEARCH, SEARCH)
roles/bigquery.dataViewer   Read table data and metadata
roles/aiplatform.user       Access Vertex AI embedding models

PROJECT_ID=your-gcp-project-id
ACCOUNT=$(gcloud config get account)

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$ACCOUNT" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:$ACCOUNT" \
  --role="roles/bigquery.jobUser"
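
The two commands above grant only the first two roles from the table. As a convenience, the following minimal sketch generates one binding command per listed role. The helper name `iam_binding_commands` and the example account are hypothetical; the script only prints the commands and does not run them.

```python
import shlex

# Roles copied from the table in this section.
REQUIRED_ROLES = [
    "roles/bigquery.dataEditor",
    "roles/bigquery.jobUser",
    "roles/bigquery.dataViewer",
    "roles/aiplatform.user",
]


def iam_binding_commands(project_id: str, account: str) -> list[str]:
    """Return one `gcloud projects add-iam-policy-binding` command per role."""
    commands = []
    for role in REQUIRED_ROLES:
        args = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{account}",
            f"--role={role}",
        ]
        # shlex.join quotes any argument that needs shell quoting.
        commands.append(shlex.join(args))
    return commands


# Hypothetical project and account, for illustration only.
for command in iam_binding_commands("your-gcp-project-id", "you@example.com"):
    print(command)
```

Review the printed commands before pasting them into a shell, and drop any role the account already holds.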

4. BigQuery Dataset

A BigQuery dataset must be created before using BigQueryGraphStore or BigQueryHybridSearchVectorStore. Tables and property graphs are created automatically, but the dataset is not.

bq mk --dataset --location=us-central1 YOUR_PROJECT_ID:YOUR_DATASET

Alternatively, grant the bigquery.datasets.create permission so that the library can create the dataset automatically.
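
The dataset could also be created from Python instead of the bq command. The sketch below assumes the google-cloud-bigquery client library is installed and that Application Default Credentials are configured; the helper names `qualified_dataset_id` and `ensure_dataset` are hypothetical and are not part of langchain-bigquery.

```python
def qualified_dataset_id(project_id: str, dataset_id: str) -> str:
    """Build the `project.dataset` form accepted by google-cloud-bigquery."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id must be non-empty")
    return f"{project_id}.{dataset_id}"


def ensure_dataset(project_id: str, dataset_id: str, location: str = "us-central1"):
    """Create the dataset if it is missing and return it."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(qualified_dataset_id(project_id, dataset_id))
    dataset.location = location
    # exists_ok=True suppresses the "already exists" error on repeated runs.
    return client.create_dataset(dataset, exists_ok=True)
```

Calling `ensure_dataset("YOUR_PROJECT_ID", "YOUR_DATASET")` once before constructing a store mirrors the bq command above. Pass the same location to the store afterwards.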

BigQuery Graph Store Usage

Use BigQueryGraphStore for storing and querying property graphs in BigQuery.

from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_bigquery import BigQueryGraphStore

store = BigQueryGraphStore(
    project_id="my-project",
    dataset_id="my_dataset",
    graph_name="knowledge_graph",
    location="us-central1",
)

# Define nodes and relationships
alice = Node(id="alice", type="Person", properties={"name": "Alice", "age": 30})
bob = Node(id="bob", type="Person", properties={"name": "Bob", "age": 25})
acme = Node(id="acme", type="Company", properties={"name": "Acme Corp"})

works_at = Relationship(source=alice, target=acme, type="WORKS_AT")
knows = Relationship(source=alice, target=bob, type="KNOWS")

doc = GraphDocument(
    nodes=[alice, bob, acme],
    relationships=[works_at, knows],
    source=Document(page_content="Alice works at Acme Corp and knows Bob."),
)

# This creates tables, the property graph, and inserts data
store.add_graph_documents([doc])

# Query with GQL
results = store.query(
    "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person) RETURN p.name AS name"
)
print(results)
# [{'name': 'Alice'}, {'name': 'Bob'}]
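
Every GQL query in this README starts with the same backtick-quoted graph reference. The following sketch builds that prefix from the dataset and graph names so that it is not repeated by hand. The helpers `graph_clause` and `person_names_query` are hypothetical, and the identifier check (letters, digits, underscores) is an assumption rather than BigQuery's full naming rule.

```python
import re

# Assumption: plain identifiers made of letters, digits, and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def graph_clause(dataset_id: str, graph_name: str) -> str:
    """Return the GRAPH prefix, rejecting names that are not plain identifiers."""
    for name in (dataset_id, graph_name):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"GRAPH `{dataset_id}`.`{graph_name}`"


def person_names_query(dataset_id: str, graph_name: str) -> str:
    """Build the same query as the example above from its parts."""
    return f"{graph_clause(dataset_id, graph_name)} MATCH (p:Person) RETURN p.name AS name"


print(person_names_query("my_dataset", "knowledge_graph"))
```

The resulting string can be passed to store.query() exactly like the literal query above.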

BigQuery Graph Vector Context Retriever Usage

Use BigQueryGraphVectorContextRetriever to perform vector similarity search on graph nodes with optional multi-hop neighborhood expansion.

from langchain_google_vertexai import VertexAIEmbeddings
from langchain_bigquery import BigQueryGraphVectorContextRetriever

embeddings = VertexAIEmbeddings(model="gemini-embedding-001")

# Return specific properties from matching nodes
retriever = BigQueryGraphVectorContextRetriever.from_params(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    embeddings_column="embedding",
    return_properties_list=["name", "age"],
    top_k=5,
)
docs = retriever.invoke("Who works at Acme?")

With multi-hop expansion:

# Expand results by traversing 2 hops from matching nodes
retriever = BigQueryGraphVectorContextRetriever.from_params(
    graph_store=store,
    embedding_service=embeddings,
    label_expr="Person",
    embeddings_column="embedding",
    expand_by_hops=2,
    top_k=5,
)
docs = retriever.invoke("Tell me about Alice")
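
To illustrate what expanding by hops means, here is a conceptual sketch in plain Python over the Alice/Bob/Acme graph from the Graph Store example. It is not the retriever's implementation, which runs inside BigQuery. The function below is a hypothetical stand-in, and treating edges as undirected is an assumption made for the illustration.

```python
from collections import deque

# Edges from the Graph Store example: (source, relationship type, target).
EDGES = [
    ("alice", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
]


def expand_by_hops(seeds, edges, hops):
    """Return {node: distance} for every node within `hops` edges of a seed."""
    # Build an undirected adjacency map (assumption for this sketch).
    neighbors = {}
    for source, _type, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    # Breadth-first search that stops expanding at the hop limit.
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue
        for neighbor in neighbors.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


print(expand_by_hops(["bob"], EDGES, hops=1))
print(expand_by_hops(["bob"], EDGES, hops=2))
```

With one hop from "bob" only "alice" is added; with two hops "acme" joins the context as well, which is why a larger hop count returns richer but broader results.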

BigQuery Graph Text-to-GQL Retriever Usage

Use BigQueryGraphTextToGQLRetriever to translate natural language questions into GQL queries using an LLM.

from langchain_google_vertexai import ChatVertexAI
from langchain_bigquery import BigQueryGraphTextToGQLRetriever

llm = ChatVertexAI(model="gemini-2.5-flash")

retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    graph_store=store,
    k=10,
)

docs = retriever.invoke("Find all people who work at Acme Corp")

With few-shot examples for better GQL generation:

from langchain_google_vertexai import VertexAIEmbeddings

retriever = BigQueryGraphTextToGQLRetriever.from_params(
    llm=llm,
    embedding_service=VertexAIEmbeddings(model="gemini-embedding-001"),
    graph_store=store,
)

retriever.add_example(
    question="Who works at Acme?",
    gql="GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'}) RETURN p.name AS name",
)

docs = retriever.invoke("Which people are employed by Acme Corp?")
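
This README does not describe how the retriever chooses among stored examples. One common approach, sketched here purely as an assumption, is to pick the examples whose questions are most similar to the incoming question and place them in the prompt. All names below are hypothetical, the two-dimensional vectors stand in for real embeddings, and the second example query is made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(question_vector, examples, n=1):
    """Return the n examples whose question vectors are closest to the query."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(question_vector, ex["vector"]),
        reverse=True,
    )
    return ranked[:n]


def build_prompt(question, selected):
    """Assemble a few-shot prompt: selected examples first, then the question."""
    lines = ["Translate the question into a GQL query."]
    for ex in selected:
        lines.append(f"Question: {ex['question']}")
        lines.append(f"GQL: {ex['gql']}")
    lines.append(f"Question: {question}")
    lines.append("GQL:")
    return "\n".join(lines)


# Toy vectors; a real setup would embed each question with an embedding model.
EXAMPLES = [
    {
        "question": "Who works at Acme?",
        "vector": [1.0, 0.0],
        "gql": "GRAPH `my_dataset`.`knowledge_graph` MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'}) RETURN p.name AS name",
    },
    {
        "question": "Who does Alice know?",  # hypothetical second example
        "vector": [0.0, 1.0],
        "gql": "GRAPH `my_dataset`.`knowledge_graph` MATCH (:Person {name: 'Alice'})-[:KNOWS]->(p:Person) RETURN p.name AS name",
    },
]

selected = select_examples([0.9, 0.1], EXAMPLES, n=1)
print(build_prompt("Which people are employed by Acme Corp?", selected))
```

Examples that share the graph's labels and relationship types with the expected questions tend to be the most useful ones to register.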

See the full Graph RAG tutorial.

BigQuery Hybrid Search Usage

Use BigQueryHybridSearchVectorStore for hybrid (vector + full-text) search. It combines BigQuery's VECTOR_SEARCH() (semantic similarity) with SEARCH() (full-text keyword matching) in a single retrieval step.

from langchain_bigquery import BigQueryHybridSearchVectorStore
from langchain_google_vertexai import VertexAIEmbeddings

store = BigQueryHybridSearchVectorStore(
    project_id="my-project",
    dataset_name="my_dataset",
    table_name="documents",
    location="US",  # must match the location of the dataset
    embedding=VertexAIEmbeddings(model="gemini-embedding-001"),
    distance_type="COSINE",
    search_analyzer="LOG_ANALYZER",
)

# Pre-filter mode (default): keyword filter -> vector ranking
results = store.hybrid_search(
    query="How to optimize BigQuery performance?",
    text_query="BigQuery optimization",
    k=10,
)

# RRF mode: independent keyword + vector search -> merged ranking
results = store.hybrid_search_with_score(
    query="How to optimize BigQuery performance?",
    text_query="BigQuery optimization",
    k=10,
    fetch_k=50,
    hybrid_search_mode="rrf",
)
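
For intuition about the RRF mode, the sketch below fuses two ranked lists of document IDs with Reciprocal Rank Fusion in plain Python. It illustrates the general technique, not this package's implementation: the function name, the document IDs, and the smoothing constant k=60 (a commonly used default) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, best match first.
keyword_hits = ["doc_b", "doc_a", "doc_d"]  # e.g. from a full-text search
vector_hits = ["doc_a", "doc_c", "doc_b"]   # e.g. from a vector search

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear in both lists (doc_a, doc_b) rise above those found by only one search. Pre-filter mode differs in that the keyword match narrows the candidates first and vector similarity alone decides the order.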

See the full Hybrid Search tutorial.

Related Projects

Upstream Libraries

  • langchain-bigquery-graph -- Standalone GraphStore and Graph RAG retrievers for BigQuery (the upstream of this package's graph features)
  • langchain-bigquery-hybridsearch -- Standalone hybrid (vector + full-text) search vector store for BigQuery (the upstream of this package's hybrid search feature)

Sample Applications

  • graph-rag-with-bigquery -- Sample Agentic Graph RAG built with the Agent Development Kit (ADK), using BigQuery property graphs and this package's graph retrievers
  • rag-with-bigquery-hybridsearch -- Sample Agentic RAG built with the Agent Development Kit (ADK), using this package's BigQueryHybridSearchVectorStore with Reciprocal Rank Fusion (RRF) to combine VECTOR_SEARCH() and SEARCH()

License

MIT License. See LICENSE for details.
