Bank Transaction Enricher

Transform raw, cryptic bank transaction strings into clean, structured data -- merchant names, categories, locations, and more.

Before: a raw bank statement line

SQ *VERVE ROASTERS gosq.com CA

After: structured, enriched data

{
  "category": {
    "primary": "Food & Beverages",
    "secondary": "Cafes",
    "confidence": 95
  },
  "channel": "in_store",
  "entities": [
    {
      "type": "merchant",
      "name": "Verve Coffee Roasters",
      "website": "vervecoffee.com"
    },
    {
      "type": "location",
      "city": "Santa Cruz",
      "state": "CA",
      "country": "US"
    },
    { "type": "intermediary", "role": "processor", "name": "Square" }
  ]
}

Why?

Bank transaction descriptions are messy. They're truncated, filled with reference codes, and inconsistent across banks and countries. This tool takes those raw strings and returns structured data you can actually use for dashboards, expense tracking, financial analytics, or PFM apps.

Powered by the Triqai API, which handles merchant identification, categorization, and enrichment.

Features

Merchant Identification: Resolve raw strings to clean merchant names, logos, and websites
Smart Categorization: Hierarchical categories (primary/secondary/tertiary) with MCC, SIC, and NAICS codes
Location Extraction: Structured addresses, coordinates, timezones, ratings, and price ranges
Intermediary Detection: Identify payment processors (Stripe, Square, Adyen), delivery platforms (DoorDash, Uber Eats), wallets (Apple Pay, Google Pay), and P2P services (Venmo, Zelle, PIX, Tikkie)
Person Recognition: Detect P2P transfer recipients with display names
Subscription Detection: Flag recurring payments and classify subscription types
Confidence Scores: Per-entity confidence values with explanatory reason tags
Async & Concurrent: Process hundreds of transactions in parallel with built-in rate limiting
Rich CLI Output: Progress bars, colored tables, and summary statistics

Quick Start

1. Install dependencies

git clone https://github.com/triqai/bank-transaction-enricher.git
cd bank-transaction-enricher
pip install -r requirements.txt

2. Get your API key

Sign up for a free API key at triqai.com. The free tier includes enough requests to test with the included sample dataset.

cp .env.example .env
# Edit .env and add your API key

3. Run

python main.py

That's it. The sample dataset of 40 real-world transactions will be enriched and results saved to output/.

Usage

Command Line

# Enrich the included sample dataset
python main.py

# Use your own CSV file
python main.py --input your_transactions.csv

# Save as JSON Lines (one object per line, better for streaming)
python main.py --format jsonl

# Increase concurrency for large datasets
python main.py --max-concurrent 10

# Preview without making API calls
python main.py --dry-run

# Verbose logging for debugging
python main.py --verbose

# See all options
python main.py --help

Python API

import asyncio
from src import TriqaiClient, Transaction

async def main():
    client = TriqaiClient(api_key="your_api_key")

    transaction = Transaction(
        title="AMAZON MKTPLACE PMTS AMZN.COM/BILL WA",
        country="US",
        type="expense",
    )

    result = await client.enrich(transaction)

    if result.success:
        data = result.data

        # Merchant info (from entities array)
        merchant = data.merchant
        if merchant:
            print(f"Merchant: {merchant.get_name()}")
            print(f"Website:  {merchant.data.get('website')}")
            print(f"Confidence: {merchant.confidence.value} ({merchant.confidence.reasons})")

        # Category
        print(f"Category: {data.transaction.get_primary_category_name()}")

        # Location (from entities array)
        location = data.location
        if location:
            structured = location.data.get("structured", {})
            print(f"Location: {structured.get('city')}, {structured.get('state')}")

        # Intermediary (processors, platforms, wallets, P2P services)
        intermediary = data.intermediary
        if intermediary:
            print(f"Intermediary: {intermediary.get_name()} (role: {intermediary.role})")

        # Person (P2P recipient)
        person = data.person
        if person:
            print(f"Recipient: {person.get_name()}")

asyncio.run(main())

Batch Processing

import asyncio
from src import TriqaiClient, TransactionEnricher

async def main():
    client = TriqaiClient(api_key="your_api_key", max_concurrent=10)
    enricher = TransactionEnricher(client=client, output_dir="results")

    # Load from CSV
    transactions = enricher.load_transactions_from_csv("data/transactions.csv")

    # Enrich all (with automatic rate limiting and retries)
    results = await enricher.enrich_transactions(transactions)

    # Save results
    enricher.save_results(results, output_format="json")
    enricher.save_summary(results)

    # Print report
    print(enricher.generate_report(results))

asyncio.run(main())

Input Format

Prepare a CSV with the columns listed in the table below. The delimiter is auto-detected (, or ;), so no quoting is required. The comment column is optional and can be omitted from the file entirely.

country;type;title
US;expense;SQ *VERVE ROASTERS gosq.com CA
GB;expense;CARD PAYMENT - FALLOW LONDON
BR;expense;PIX ENVIADO - JOAO SILVA
NL;income;SALARIS DECEMBER 25 #890

Commas work too:

country,type,title
US,expense,SQ *VERVE ROASTERS gosq.com CA
GB,expense,CARD PAYMENT - FALLOW LONDON

Column	Required	Description
`country`	Yes	ISO 3166-1 alpha-2 code (`US`, `GB`, `BR`)
`type`	Yes	`expense` or `income`
`title`	Yes	Raw transaction string from the bank
`comment`	No	Optional note (column may be omitted entirely; not sent to API)
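
The loading rules above can be illustrated with a short standard-library sketch. This is not the project's loader (that is enricher.load_transactions_from_csv); the function name read_transactions and the header-counting heuristic for picking the delimiter are assumptions made for illustration only.

```python
import csv
import io

REQUIRED_COLUMNS = ("country", "type", "title")
ALLOWED_TYPES = {"expense", "income"}


def read_transactions(text: str) -> list[dict[str, str]]:
    """Parse CSV text whose delimiter is either ',' or ';' (hypothetical helper)."""
    lines = text.splitlines()
    if not lines:
        return []

    # Assumed heuristic: pick whichever candidate appears more often in the header.
    header = lines[0]
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)

    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    rows = []
    for line_no, row in enumerate(reader, start=2):
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"line {line_no}: type must be 'expense' or 'income'")
        rows.append({
            "country": (row["country"] or "").strip().upper(),
            "type": row["type"],
            "title": row["title"] or "",
            # The comment column is optional and is kept local, not sent to the API.
            "comment": row.get("comment") or "",
        })
    return rows


sample = "country;type;title\nUS;expense;SQ *VERVE ROASTERS gosq.com CA\n"
print(read_transactions(sample)[0]["title"])
```

Validating the type column up front keeps malformed rows from consuming API requests.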

Sample Dataset

The included data/transactions.csv covers diverse real-world patterns across 18 countries:

Pattern	Examples
Retail	Amazon (DE long-form SEPA), Walmart, Target
Food & Drink	Restaurants, coffee shops, delivery services
Subscriptions	Adobe Creative Cloud, Apple One
P2P Transfers	PIX (BR), Venmo (US), Tikkie (NL), VIPPS (NO)
Payroll	Salary deposits (US, NL, KR, JP)
Freelance	Upwork payouts, Adobe Stock, Twitch affiliates
International	Japanese (ファミリーマート), Korean (삼성전자), special characters
Complex	Multi-processor chains, long SEPA references

Output

Results are saved to the output/ directory:

enrichments_<timestamp>.json -- Full enrichment data for each transaction
summary_<timestamp>.json -- Aggregate statistics and category distribution

Example Enrichment Response

The API uses an entities array pattern. Only identified entities are included. Each entity has a type, role, confidence (with reason tags), and type-specific data.

{
  "input": {
    "title": "SQ *VERVE ROASTERS gosq.com CA",
    "country": "US",
    "type": "expense"
  },
  "success": true,
  "data": {
    "transaction": {
      "category": {
        "primary": {
          "name": "Food & Beverages",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "secondary": {
          "name": "Cafes",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "confidence": 95
      },
      "channel": "in_store",
      "subscription": { "recurring": false },
      "confidence": { "value": 92, "reasons": [] }
    },
    "entities": [
      {
        "type": "merchant",
        "role": "organization",
        "confidence": {
          "value": 98,
          "reasons": ["name_closely_matched", "results_consensus"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "alias": [],
          "website": "vervecoffee.com",
          "icon": "https://..."
        }
      },
      {
        "type": "location",
        "role": "store_location",
        "confidence": {
          "value": 85,
          "reasons": ["city_match", "address_closely_matched"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "formatted": "1540 Pacific Ave, Santa Cruz, CA 95060, US",
          "structured": {
            "street": "1540 Pacific Ave",
            "city": "Santa Cruz",
            "state": "CA",
            "postalCode": "95060",
            "country": "US",
            "countryName": "United States",
            "coordinates": { "latitude": 36.9741, "longitude": -122.0308 },
            "timezone": "America/Los_Angeles"
          }
        }
      },
      {
        "type": "intermediary",
        "role": "processor",
        "confidence": { "value": 99, "reasons": ["known_processor_match"] },
        "data": { "id": "...", "name": "Square", "website": "squareup.com" }
      }
    ]
  }
}
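
Because only identified entities appear in the array, code that reads a saved response should look entities up by type and expect some to be missing. The sketch below works on the raw JSON with the standard library only; first_entity is a hypothetical helper, not part of the project's src package, and the embedded response is a trimmed copy of the example above.

```python
import json
from typing import Any, Optional

# Trimmed copy of the example response shown above.
RESPONSE = json.loads("""
{
  "success": true,
  "data": {
    "transaction": {"category": {"primary": {"name": "Food & Beverages"}}},
    "entities": [
      {"type": "merchant", "role": "organization",
       "confidence": {"value": 98, "reasons": ["name_closely_matched", "results_consensus"]},
       "data": {"name": "Verve Coffee Roasters", "website": "vervecoffee.com"}},
      {"type": "intermediary", "role": "processor",
       "confidence": {"value": 99, "reasons": ["known_processor_match"]},
       "data": {"name": "Square", "website": "squareup.com"}}
    ]
  }
}
""")


def first_entity(response: dict[str, Any], entity_type: str) -> Optional[dict[str, Any]]:
    """Return the first entity of the given type, or None if it was not identified."""
    entities = (response.get("data") or {}).get("entities", [])
    for entity in entities:
        if entity.get("type") == entity_type:
            return entity
    return None


merchant = first_entity(RESPONSE, "merchant")
person = first_entity(RESPONSE, "person")  # not present in this response

if merchant:
    print(merchant["data"]["name"], merchant["confidence"]["value"])  # Verve Coffee Roasters 98
print(person)  # None
```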

Configuration

Environment Variable	Default	Description
`TRIQAI_API_KEY`	--	Your Triqai API key (required)
`MAX_CONCURRENT_REQUESTS`	`2`	Max in-flight requests at once (free plan limit: 2)
`REQUEST_DELAY`	`1.0`	Min seconds between dispatching requests (free plan: 1.0 = 1 RPS)

Defaults are tuned for the free plan. On a paid plan you can raise both values. All options can also be passed as CLI arguments; run python main.py --help for details.
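
As a rough illustration of how these variables and defaults fit together, the sketch below reads them from the environment. The Settings class and load_settings function are hypothetical names for this example; the project's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the table above."""
    api_key: str
    max_concurrent: int = 2      # free plan limit
    request_delay: float = 1.0   # seconds between dispatches (1 RPS)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("TRIQAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TRIQAI_API_KEY is required")
    return Settings(
        api_key=api_key,
        max_concurrent=int(env.get("MAX_CONCURRENT_REQUESTS", "2")),
        request_delay=float(env.get("REQUEST_DELAY", "1.0")),
    )


# Passing a plain dict makes the function easy to exercise without touching os.environ.
settings = load_settings({"TRIQAI_API_KEY": "your_api_key", "MAX_CONCURRENT_REQUESTS": "10"})
print(settings.max_concurrent, settings.request_delay)  # 10 1.0
```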

Rate Limiting

The API enforces two independent limits:

RPS (token bucket) - sustained requests per second; tracked by X-RateLimit-* headers
Concurrency cap - max parallel in-flight requests; tracked by X-RateLimit-Concurrency-* headers

The client enforces both automatically with exponential backoff and retries. You don't need to manage this yourself. When a 429 is returned, the Retry-After header (in seconds) is honoured before the next attempt. 503 Service Unavailable is also retried.
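
For readers curious what this retry policy looks like, here is a minimal sketch of the delay calculation: honour Retry-After when present, otherwise back off exponentially up to a cap. The function names, the 1-second base, and the 30-second cap are assumptions for illustration and are not taken from the client's source.

```python
from typing import Optional

RETRYABLE_STATUSES = {429, 503}


def should_retry(status_code: int) -> bool:
    """Only rate-limit (429) and service-unavailable (503) responses are retried."""
    return status_code in RETRYABLE_STATUSES


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A parseable Retry-After header (in seconds) takes precedence over the
    computed exponential delay.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Unparseable header: fall back to exponential backoff.
    return min(cap, base * (2 ** attempt))


print(backoff_delay(0))        # 1.0
print(backoff_delay(3))        # 8.0
print(backoff_delay(2, "5"))   # 5.0
```

A production client would typically add random jitter to the computed delay so that parallel workers do not retry in lockstep.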

Current rate limit status is displayed after each run and can be inspected via:

info = client.rate_limit_info
# RateLimitInfo(
#   limit=1, remaining=0, reset='2026-02-16T10:30:01Z', scope='rps',
#   concurrency_limit=2, concurrency_remaining=1,
#   retry_after_seconds=1
# )

Response headers tracked:

Header	Description
`X-RateLimit-Limit`	Requests per second (sustained rate)
`X-RateLimit-Remaining`	RPS tokens available right now
`X-RateLimit-Reset`	ISO timestamp when the RPS bucket refills
`X-RateLimit-Scope`	Active limit dimension: `rps` or `concurrency`
`X-RateLimit-Concurrency-Limit`	Max concurrent in-flight requests for your org
`X-RateLimit-Concurrency-Remaining`	Remaining concurrency slots
`Retry-After`	Seconds to wait before retrying (429 and 503)
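
If you call the API with your own HTTP client, the headers in the table can be collected into one structure. The sketch below is a hypothetical stand-alone parser, not the project's RateLimitInfo implementation; it assumes the numeric headers carry integer values and leaves the reset timestamp as a string.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int, returning None when absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Extract the rate limit headers listed above into a plain dict."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": _to_int(h.get("x-ratelimit-limit")),
        "remaining": _to_int(h.get("x-ratelimit-remaining")),
        "reset": h.get("x-ratelimit-reset"),
        "scope": h.get("x-ratelimit-scope"),
        "concurrency_limit": _to_int(h.get("x-ratelimit-concurrency-limit")),
        "concurrency_remaining": _to_int(h.get("x-ratelimit-concurrency-remaining")),
        "retry_after_seconds": _to_int(h.get("retry-after")),
    }


info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "2026-02-16T10:30:01Z",
    "X-RateLimit-Scope": "rps",
    "X-RateLimit-Concurrency-Limit": "2",
    "X-RateLimit-Concurrency-Remaining": "1",
    "Retry-After": "1",
})
print(info["scope"], info["remaining"], info["retry_after_seconds"])  # rps 0 1
```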

Project Structure

bank-transaction-enricher/
├── main.py              # CLI entry point
├── src/
│   ├── __init__.py      # Package exports
│   ├── client.py        # Async API client with rate limiting & retries
│   ├── enricher.py      # High-level enrichment orchestrator
│   └── models.py        # Pydantic models for API request/response
├── data/
│   └── transactions.csv # Sample dataset (40 transactions, 18 countries)
├── output/              # Generated results (git-ignored)
├── pyproject.toml       # Project metadata and tool config
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variable template

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run linter
ruff check .

# Run type checker
mypy src/

# Run tests
pytest

License

MIT License. See LICENSE for details.
