Transform raw, cryptic bank transaction strings into clean, structured data -- merchant names, categories, locations, and more.
Before -- a raw bank statement line:

```
SQ *VERVE ROASTERS gosq.com CA
```
After -- structured, enriched data:
```json
{
  "category": {
    "primary": "Food & Beverages",
    "secondary": "Cafes",
    "confidence": 95
  },
  "channel": "in_store",
  "entities": [
    {
      "type": "merchant",
      "name": "Verve Coffee Roasters",
      "website": "vervecoffee.com"
    },
    {
      "type": "location",
      "city": "Santa Cruz",
      "state": "CA",
      "country": "US"
    },
    { "type": "intermediary", "role": "processor", "name": "Square" }
  ]
}
```

Bank transaction descriptions are messy. They're truncated, filled with reference codes, and inconsistent across banks and countries. This tool takes those raw strings and returns structured data you can actually use for dashboards, expense tracking, financial analytics, or PFM apps.
Powered by the Triqai API, which handles merchant identification, categorization, and enrichment.
- Merchant Identification: Resolve raw strings to clean merchant names, logos, and websites
- Smart Categorization: Hierarchical categories (primary/secondary/tertiary) with MCC, SIC, and NAICS codes
- Location Extraction: Structured addresses, coordinates, timezones, ratings, and price ranges
- Intermediary Detection: Identify payment processors (Stripe, Square, Adyen), delivery platforms (DoorDash, Uber Eats), wallets (Apple Pay, Google Pay), and P2P services (Venmo, Zelle, PIX, Tikkie)
- Person Recognition: Detect P2P transfer recipients with display names
- Subscription Detection: Flag recurring payments and classify subscription types
- Confidence Scores: Per-entity confidence values with explanatory reason tags
- Async & Concurrent: Process hundreds of transactions in parallel with built-in rate limiting
- Rich CLI Output: Progress bars, colored tables, and summary statistics
```bash
git clone https://github.com/triqai/bank-transaction-enricher.git
cd bank-transaction-enricher
pip install -r requirements.txt
```

Sign up for a free API key at triqai.com; the free tier includes enough requests to test with the included sample dataset.

```bash
cp .env.example .env
# Edit .env and add your API key
```

```bash
python main.py
```

That's it. The sample dataset of 40 real-world transactions will be enriched and the results saved to output/.
```bash
# Enrich the included sample dataset
python main.py

# Use your own CSV file
python main.py --input your_transactions.csv

# Save as JSON Lines (one object per line, better for streaming)
python main.py --format jsonl

# Increase concurrency for large datasets
python main.py --max-concurrent 10

# Preview without making API calls
python main.py --dry-run

# Verbose logging for debugging
python main.py --verbose

# See all options
python main.py --help
```

```python
import asyncio

from src import TriqaiClient, Transaction


async def main():
    client = TriqaiClient(api_key="your_api_key")

    transaction = Transaction(
        title="AMAZON MKTPLACE PMTS AMZN.COM/BILL WA",
        country="US",
        type="expense",
    )

    result = await client.enrich(transaction)

    if result.success:
        data = result.data

        # Merchant info (from entities array)
        merchant = data.merchant
        if merchant:
            print(f"Merchant: {merchant.get_name()}")
            print(f"Website: {merchant.data.get('website')}")
            print(f"Confidence: {merchant.confidence.value} ({merchant.confidence.reasons})")

        # Category
        print(f"Category: {data.transaction.get_primary_category_name()}")

        # Location (from entities array)
        location = data.location
        if location:
            structured = location.data.get("structured", {})
            print(f"Location: {structured.get('city')}, {structured.get('state')}")

        # Intermediary (processors, platforms, wallets, P2P services)
        intermediary = data.intermediary
        if intermediary:
            print(f"Intermediary: {intermediary.get_name()} (role: {intermediary.role})")

        # Person (P2P recipient)
        person = data.person
        if person:
            print(f"Recipient: {person.get_name()}")


asyncio.run(main())
```

```python
import asyncio

from src import TriqaiClient, TransactionEnricher


async def main():
    client = TriqaiClient(api_key="your_api_key", max_concurrent=10)
    enricher = TransactionEnricher(client=client, output_dir="results")

    # Load from CSV
    transactions = enricher.load_transactions_from_csv("data/transactions.csv")

    # Enrich all (with automatic rate limiting and retries)
    results = await enricher.enrich_transactions(transactions)

    # Save results
    enricher.save_results(results, output_format="json")
    enricher.save_summary(results)

    # Print report
    print(enricher.generate_report(results))


asyncio.run(main())
```

Prepare a CSV with these columns. The delimiter is auto-detected (`,` or `;`), so no quoting is required. The `comment` column is entirely optional -- you can omit it from the file.
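The delimiter auto-detection just described can be sketched with the standard library's `csv.Sniffer`; this is an illustration only, and the project's actual loader may work differently:

```python
import csv
import io


def load_rows(text: str) -> list[dict]:
    """Parse CSV text whose delimiter may be ',' or ';' (illustrative sketch)."""
    dialect = csv.Sniffer().sniff(text, delimiters=",;")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


rows = load_rows("country;type;title\nUS;expense;SQ *VERVE ROASTERS gosq.com CA\n")
print(rows[0]["title"])  # SQ *VERVE ROASTERS gosq.com CA
```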
```csv
country;type;title
US;expense;SQ *VERVE ROASTERS gosq.com CA
GB;expense;CARD PAYMENT - FALLOW LONDON
BR;expense;PIX ENVIADO - JOAO SILVA
NL;income;SALARIS DECEMBER 25 #890
```

Commas work too:

```csv
country,type,title
US,expense,SQ *VERVE ROASTERS gosq.com CA
GB,expense,CARD PAYMENT - FALLOW LONDON
```

| Column | Required | Description |
|---|---|---|
| `country` | Yes | ISO 3166-1 alpha-2 code (US, GB, BR) |
| `type` | Yes | `expense` or `income` |
| `title` | Yes | Raw transaction string from the bank |
| `comment` | No | Optional note (column may be omitted entirely; not sent to the API) |
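As a hedged illustration of what the required columns imply, a row validator might look like the following; `validate_row` is a hypothetical helper, not part of the project:

```python
REQUIRED = ("country", "type", "title")


def validate_row(row: dict) -> list[str]:
    """Return a list of problems with one CSV row (sketch, not the project's code)."""
    problems = [f"missing {col}" for col in REQUIRED if not row.get(col)]
    if row.get("type") not in ("expense", "income"):
        problems.append("type must be 'expense' or 'income'")
    if len(row.get("country", "")) != 2:
        problems.append("country must be an ISO 3166-1 alpha-2 code")
    return problems


print(validate_row({"country": "US", "type": "expense", "title": "SQ *VERVE ROASTERS"}))  # []
```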
The included data/transactions.csv covers diverse real-world patterns across 18 countries:
| Pattern | Examples |
|---|---|
| Retail | Amazon (DE long-form SEPA), Walmart, Target |
| Food & Drink | Restaurants, coffee shops, delivery services |
| Subscriptions | Adobe Creative Cloud, Apple One |
| P2P Transfers | PIX (BR), Venmo (US), Tikkie (NL), VIPPS (NO) |
| Payroll | Salary deposits (US, NL, KR, JP) |
| Freelance | Upwork payouts, Adobe Stock, Twitch affiliates |
| International | Japanese (ファミリーマート), Korean (삼성전자), special characters |
| Complex | Multi-processor chains, long SEPA references |
Results are saved to the output/ directory:
- `enrichments_<timestamp>.json` -- Full enrichment data for each transaction
- `summary_<timestamp>.json` -- Aggregate statistics and category distribution
The API uses an entities array pattern. Only identified entities are included. Each entity has a type, role, confidence (with reason tags), and type-specific data.
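Because only identified entities appear in the array, lookups by type have to tolerate absence. A minimal lookup sketch (`find_entity` is a hypothetical helper; the bundled client exposes the same idea through properties like `data.merchant`):

```python
def find_entity(response: dict, entity_type: str):
    """Return the first entity of the given type, or None if it was not identified."""
    return next((e for e in response.get("entities", []) if e["type"] == entity_type), None)


response = {
    "entities": [
        {"type": "merchant", "role": "organization", "data": {"name": "Verve Coffee Roasters"}},
        {"type": "intermediary", "role": "processor", "data": {"name": "Square"}},
    ]
}

merchant = find_entity(response, "merchant")
print(merchant["data"]["name"])           # Verve Coffee Roasters
print(find_entity(response, "location"))  # None -- no location entity was identified
```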
```json
{
  "input": {
    "title": "SQ *VERVE ROASTERS gosq.com CA",
    "country": "US",
    "type": "expense"
  },
  "success": true,
  "data": {
    "transaction": {
      "category": {
        "primary": {
          "name": "Food & Beverages",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "secondary": {
          "name": "Cafes",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "confidence": 95
      },
      "channel": "in_store",
      "subscription": { "recurring": false },
      "confidence": { "value": 92, "reasons": [] }
    },
    "entities": [
      {
        "type": "merchant",
        "role": "organization",
        "confidence": {
          "value": 98,
          "reasons": ["name_closely_matched", "results_consensus"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "alias": [],
          "website": "vervecoffee.com",
          "icon": "https://..."
        }
      },
      {
        "type": "location",
        "role": "store_location",
        "confidence": {
          "value": 85,
          "reasons": ["city_match", "address_closely_matched"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "formatted": "1540 Pacific Ave, Santa Cruz, CA 95060, US",
          "structured": {
            "street": "1540 Pacific Ave",
            "city": "Santa Cruz",
            "state": "CA",
            "postalCode": "95060",
            "country": "US",
            "countryName": "United States",
            "coordinates": { "latitude": 36.9741, "longitude": -122.0308 },
            "timezone": "America/Los_Angeles"
          }
        }
      },
      {
        "type": "intermediary",
        "role": "processor",
        "confidence": { "value": 99, "reasons": ["known_processor_match"] },
        "data": { "id": "...", "name": "Square", "website": "squareup.com" }
      }
    ]
  }
}
```

| Environment Variable | Default | Description |
|---|---|---|
| `TRIQAI_API_KEY` | -- | Your Triqai API key (required) |
| `MAX_CONCURRENT_REQUESTS` | `2` | Max in-flight requests at once (free plan limit: 2) |
| `REQUEST_DELAY` | `1.0` | Min seconds between dispatching requests (free plan: 1.0 = 1 RPS) |
Defaults are tuned for the free plan. On a paid plan you can raise both values. All options can also be passed as CLI arguments; run `python main.py --help` for details.
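For illustration, the configuration above can be resolved from an environment mapping like this (a sketch; the variable names come from the table, and the fallbacks are the free-plan defaults):

```python
import os


def load_settings(env: dict) -> dict:
    """Resolve the documented settings from an environment mapping (sketch)."""
    return {
        "api_key": env.get("TRIQAI_API_KEY"),                            # required
        "max_concurrent": int(env.get("MAX_CONCURRENT_REQUESTS", "2")),  # free plan limit: 2
        "request_delay": float(env.get("REQUEST_DELAY", "1.0")),         # free plan: 1 RPS
    }


settings = load_settings(dict(os.environ))
```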
The API enforces two independent limits:
- RPS (token bucket) -- sustained requests per second; tracked by the `X-RateLimit-*` headers
- Concurrency cap -- max parallel in-flight requests; tracked by the `X-RateLimit-Concurrency-*` headers
The client enforces both automatically with exponential backoff and retries. You don't need to manage this yourself. When a 429 is returned, the Retry-After header (in seconds) is honoured before the next attempt. 503 Service Unavailable is also retried.
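The retry behaviour described above can be sketched roughly as follows. This is an illustrative policy, not the client's actual code; `do_request` is a placeholder callable that returns a status code, response headers, and a body:

```python
import asyncio
import random


async def send_with_retry(do_request, max_attempts: int = 5):
    """Retry on 429/503, honouring Retry-After when present, else backing off exponentially."""
    for attempt in range(max_attempts):
        status, headers, body = await do_request()
        if status not in (429, 503):
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)                       # server-specified wait, in seconds
        else:
            delay = min(2 ** attempt, 30) + random.random()  # jittered exponential backoff
        await asyncio.sleep(delay)
    raise RuntimeError("retries exhausted")
```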
Current rate limit status is displayed after each run and can be inspected via:
```python
info = client.rate_limit_info
# RateLimitInfo(
#     limit=1, remaining=0, reset='2026-02-16T10:30:01Z', scope='rps',
#     concurrency_limit=2, concurrency_remaining=1,
#     retry_after_seconds=1
# )
```

Response headers tracked:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Requests per second (sustained rate) |
| `X-RateLimit-Remaining` | RPS tokens available right now |
| `X-RateLimit-Reset` | ISO timestamp when the RPS bucket refills |
| `X-RateLimit-Scope` | Active limit dimension: `rps` or `concurrency` |
| `X-RateLimit-Concurrency-Limit` | Max concurrent in-flight requests for your org |
| `X-RateLimit-Concurrency-Remaining` | Remaining concurrency slots |
| `Retry-After` | Seconds to wait before retrying (429 and 503) |
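As an illustration only (the shipped `RateLimitInfo` may be built differently), the headers above map naturally onto a small value object:

```python
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    """Hypothetical mirror of the client's RateLimitInfo, built from response headers."""
    limit: int
    remaining: int
    reset: str
    scope: str
    concurrency_limit: int
    concurrency_remaining: int


def parse_rate_limit(headers: dict) -> RateLimitStatus:
    return RateLimitStatus(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset=headers["X-RateLimit-Reset"],
        scope=headers.get("X-RateLimit-Scope", "rps"),
        concurrency_limit=int(headers["X-RateLimit-Concurrency-Limit"]),
        concurrency_remaining=int(headers["X-RateLimit-Concurrency-Remaining"]),
    )


status = parse_rate_limit({
    "X-RateLimit-Limit": "1",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "2026-02-16T10:30:01Z",
    "X-RateLimit-Scope": "rps",
    "X-RateLimit-Concurrency-Limit": "2",
    "X-RateLimit-Concurrency-Remaining": "1",
})
print(status.scope)  # rps
```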
```
bank-transaction-enricher/
├── main.py               # CLI entry point
├── src/
│   ├── __init__.py       # Package exports
│   ├── client.py         # Async API client with rate limiting & retries
│   ├── enricher.py       # High-level enrichment orchestrator
│   └── models.py         # Pydantic models for API request/response
├── data/
│   └── transactions.csv  # Sample dataset (40 transactions, 18 countries)
├── output/               # Generated results (git-ignored)
├── pyproject.toml        # Project metadata and tool config
├── requirements.txt      # Python dependencies
└── .env.example          # Environment variable template
```
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run linter
ruff check .

# Run type checker
mypy src/

# Run tests
pytest
```

MIT License; see LICENSE for details.