triqai/raw-bank-transaction-enricher

Bank Transaction Enricher

Transform raw, cryptic bank transaction strings into clean, structured data -- merchant names, categories, locations, and more.

Python 3.10+ | License: MIT | Code style: ruff


Before: a raw bank statement line

SQ *VERVE ROASTERS gosq.com CA

After: structured, enriched data

{
  "category": {
    "primary": "Food & Beverages",
    "secondary": "Cafes",
    "confidence": 95
  },
  "channel": "in_store",
  "entities": [
    {
      "type": "merchant",
      "name": "Verve Coffee Roasters",
      "website": "vervecoffee.com"
    },
    {
      "type": "location",
      "city": "Santa Cruz",
      "state": "CA",
      "country": "US"
    },
    { "type": "intermediary", "role": "processor", "name": "Square" }
  ]
}

Why?

Bank transaction descriptions are messy. They're truncated, filled with reference codes, and inconsistent across banks and countries. This tool takes those raw strings and returns structured data you can actually use for dashboards, expense tracking, financial analytics, or PFM apps.

Powered by the Triqai API, which handles merchant identification, categorization, and enrichment.

Features

  • Merchant Identification: Resolve raw strings to clean merchant names, logos, and websites
  • Smart Categorization: Hierarchical categories (primary/secondary/tertiary) with MCC, SIC, and NAICS codes
  • Location Extraction: Structured addresses, coordinates, timezones, ratings, and price ranges
  • Intermediary Detection: Identify payment processors (Stripe, Square, Adyen), delivery platforms (DoorDash, Uber Eats), wallets (Apple Pay, Google Pay), and P2P services (Venmo, Zelle, PIX, Tikkie)
  • Person Recognition: Detect P2P transfer recipients with display names
  • Subscription Detection: Flag recurring payments and classify subscription types
  • Confidence Scores: Per-entity confidence values with explanatory reason tags
  • Async & Concurrent: Process hundreds of transactions in parallel with built-in rate limiting
  • Rich CLI Output: Progress bars, colored tables, and summary statistics

Quick Start

1. Install dependencies

git clone https://github.com/triqai/bank-transaction-enricher.git
cd bank-transaction-enricher
pip install -r requirements.txt

2. Get your API key

Sign up for a free API key at triqai.com. The free tier includes enough requests to test with the included sample dataset.

cp .env.example .env
# Edit .env and add your API key

3. Run

python main.py

That's it. The sample dataset of 40 real-world transactions will be enriched and results saved to output/.

Usage

Command Line

# Enrich the included sample dataset
python main.py

# Use your own CSV file
python main.py --input your_transactions.csv

# Save as JSON Lines (one object per line, better for streaming)
python main.py --format jsonl

# Increase concurrency for large datasets
python main.py --max-concurrent 10

# Preview without making API calls
python main.py --dry-run

# Verbose logging for debugging
python main.py --verbose

# See all options
python main.py --help

Python API

import asyncio
from src import TriqaiClient, Transaction

async def main():
    client = TriqaiClient(api_key="your_api_key")

    transaction = Transaction(
        title="AMAZON MKTPLACE PMTS AMZN.COM/BILL WA",
        country="US",
        type="expense",
    )

    result = await client.enrich(transaction)

    if result.success:
        data = result.data

        # Merchant info (from entities array)
        merchant = data.merchant
        if merchant:
            print(f"Merchant: {merchant.get_name()}")
            print(f"Website:  {merchant.data.get('website')}")
            print(f"Confidence: {merchant.confidence.value} ({merchant.confidence.reasons})")

        # Category
        print(f"Category: {data.transaction.get_primary_category_name()}")

        # Location (from entities array)
        location = data.location
        if location:
            structured = location.data.get("structured", {})
            print(f"Location: {structured.get('city')}, {structured.get('state')}")

        # Intermediary (processors, platforms, wallets, P2P services)
        intermediary = data.intermediary
        if intermediary:
            print(f"Intermediary: {intermediary.get_name()} (role: {intermediary.role})")

        # Person (P2P recipient)
        person = data.person
        if person:
            print(f"Recipient: {person.get_name()}")

asyncio.run(main())

Batch Processing

import asyncio
from src import TriqaiClient, TransactionEnricher

async def main():
    client = TriqaiClient(api_key="your_api_key", max_concurrent=10)
    enricher = TransactionEnricher(client=client, output_dir="results")

    # Load from CSV
    transactions = enricher.load_transactions_from_csv("data/transactions.csv")

    # Enrich all (with automatic rate limiting and retries)
    results = await enricher.enrich_transactions(transactions)

    # Save results
    enricher.save_results(results, output_format="json")
    enricher.save_summary(results)

    # Print report
    print(enricher.generate_report(results))

asyncio.run(main())
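
For readers curious how bounded concurrency with a minimum dispatch interval can work, here is a minimal standard-library sketch. It is not the project's client code: `run_bounded`, `fake_enrich`, and the `min_interval` parameter are hypothetical names used only for illustration, and the real client may schedule requests differently.

```python
import asyncio


async def run_bounded(items, worker, max_concurrent=2, min_interval=0.0):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    dispatch_lock = asyncio.Lock()

    async def guarded(item):
        async with semaphore:
            # Only one task at a time passes through this lock, and each one
            # sleeps for min_interval, so dispatches are spaced at least
            # min_interval seconds apart.
            async with dispatch_lock:
                await asyncio.sleep(min_interval)
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


# Stand-in for an API call; tracks how many calls overlap.
state = {"in_flight": 0, "peak": 0}


async def fake_enrich(title):
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)
    state["in_flight"] -= 1
    return title.lower()


results = asyncio.run(
    run_bounded(["A", "B", "C", "D"], fake_enrich, max_concurrent=2)
)
print(results, state["peak"])
```

The semaphore caps parallel work while the lock spaces out dispatches, which mirrors the two settings described under Configuration (a concurrency cap and a minimum delay between requests).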

Input Format

Prepare a CSV with these columns. The delimiter is auto-detected (, or ;) so no quoting is required. The comment column is entirely optional -- you can omit it from the file.

country;type;title
US;expense;SQ *VERVE ROASTERS gosq.com CA
GB;expense;CARD PAYMENT - FALLOW LONDON
BR;expense;PIX ENVIADO - JOAO SILVA
NL;income;SALARIS DECEMBER 25 #890

Commas work too:

country,type,title
US,expense,SQ *VERVE ROASTERS gosq.com CA
GB,expense,CARD PAYMENT - FALLOW LONDON

| Column | Required | Description |
|---|---|---|
| country | Yes | ISO 3166-1 alpha-2 code (US, GB, BR) |
| type | Yes | expense or income |
| title | Yes | Raw transaction string from the bank |
| comment | No | Optional note (column may be omitted entirely; not sent to API) |
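
As a rough illustration of the input rules above, the following sketch parses either delimiter using only the standard library. It is not the project's loader: `parse_transactions` is a hypothetical helper, and choosing the delimiter by counting characters in the header row is an assumption about how auto-detection might be done.

```python
import csv
import io

REQUIRED_COLUMNS = ("country", "type", "title")


def parse_transactions(text: str) -> list[dict[str, str]]:
    """Parse CSV text that uses either ';' or ',' as its delimiter."""
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = lines[0]
    # Assumption: whichever candidate appears more often in the header wins.
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "country": (row.get("country") or "").strip().upper(),
            "type": (row.get("type") or "").strip().lower(),
            "title": (row.get("title") or "").strip(),
            # The comment column is optional and may be absent from the file.
            "comment": (row.get("comment") or "").strip(),
        })
    return rows


sample = "country;type;title\nUS;expense;SQ *VERVE ROASTERS gosq.com CA\n"
rows = parse_transactions(sample)
print(rows[0]["title"])
```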

Sample Dataset

The included data/transactions.csv covers diverse real-world patterns across 18 countries:

| Pattern | Examples |
|---|---|
| Retail | Amazon (DE long-form SEPA), Walmart, Target |
| Food & Drink | Restaurants, coffee shops, delivery services |
| Subscriptions | Adobe Creative Cloud, Apple One |
| P2P Transfers | PIX (BR), Venmo (US), Tikkie (NL), VIPPS (NO) |
| Payroll | Salary deposits (US, NL, KR, JP) |
| Freelance | Upwork payouts, Adobe Stock, Twitch affiliates |
| International | Japanese (ファミリーマート), Korean (삼성전자), special characters |
| Complex | Multi-processor chains, long SEPA references |

Output

Results are saved to the output/ directory:

  • enrichments_<timestamp>.json -- Full enrichment data for each transaction
  • summary_<timestamp>.json -- Aggregate statistics and category distribution

Example Enrichment Response

The API uses an entities array pattern. Only identified entities are included. Each entity has a type, role, confidence (with reason tags), and type-specific data.

{
  "input": {
    "title": "SQ *VERVE ROASTERS gosq.com CA",
    "country": "US",
    "type": "expense"
  },
  "success": true,
  "data": {
    "transaction": {
      "category": {
        "primary": {
          "name": "Food & Beverages",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "secondary": {
          "name": "Cafes",
          "code": { "mcc": 5814, "sic": 5812, "naics": 722515 }
        },
        "confidence": 95
      },
      "channel": "in_store",
      "subscription": { "recurring": false },
      "confidence": { "value": 92, "reasons": [] }
    },
    "entities": [
      {
        "type": "merchant",
        "role": "organization",
        "confidence": {
          "value": 98,
          "reasons": ["name_closely_matched", "results_consensus"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "alias": [],
          "website": "vervecoffee.com",
          "icon": "https://..."
        }
      },
      {
        "type": "location",
        "role": "store_location",
        "confidence": {
          "value": 85,
          "reasons": ["city_match", "address_closely_matched"]
        },
        "data": {
          "id": "...",
          "name": "Verve Coffee Roasters",
          "formatted": "1540 Pacific Ave, Santa Cruz, CA 95060, US",
          "structured": {
            "street": "1540 Pacific Ave",
            "city": "Santa Cruz",
            "state": "CA",
            "postalCode": "95060",
            "country": "US",
            "countryName": "United States",
            "coordinates": { "latitude": 36.9741, "longitude": -122.0308 },
            "timezone": "America/Los_Angeles"
          }
        }
      },
      {
        "type": "intermediary",
        "role": "processor",
        "confidence": { "value": 99, "reasons": ["known_processor_match"] },
        "data": { "id": "...", "name": "Square", "website": "squareup.com" }
      }
    ]
  }
}
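
Because only identified entities appear in the array, code that reads a raw response has to cope with missing types. The sketch below works on a plain dict shaped like the example above; `first_entity` is a hypothetical helper rather than part of the package, and the "person" type string is assumed from the feature list.

```python
from typing import Any, Optional


def first_entity(response: dict[str, Any], entity_type: str) -> Optional[dict[str, Any]]:
    """Return the first entity of the given type, or None if it is absent."""
    entities = (response.get("data") or {}).get("entities") or []
    for entity in entities:
        if entity.get("type") == entity_type:
            return entity
    return None


# Trimmed copy of the example response, keeping only the fields used here.
response = {
    "success": True,
    "data": {
        "entities": [
            {
                "type": "merchant",
                "role": "organization",
                "confidence": {"value": 98, "reasons": ["name_closely_matched"]},
                "data": {"name": "Verve Coffee Roasters", "website": "vervecoffee.com"},
            },
            {
                "type": "intermediary",
                "role": "processor",
                "confidence": {"value": 99, "reasons": ["known_processor_match"]},
                "data": {"name": "Square", "website": "squareup.com"},
            },
        ]
    },
}

merchant = first_entity(response, "merchant")
person = first_entity(response, "person")  # not identified in this response

print(merchant["data"]["name"] if merchant else "no merchant")
print(person)
```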

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| TRIQAI_API_KEY | -- | Your Triqai API key (required) |
| MAX_CONCURRENT_REQUESTS | 2 | Max in-flight requests at once (free plan limit: 2) |
| REQUEST_DELAY | 1.0 | Min seconds between dispatching requests (free plan: 1.0 = 1 RPS) |

Defaults are tuned for the free plan. On a paid plan you can raise MAX_CONCURRENT_REQUESTS and lower REQUEST_DELAY. All options can also be passed as CLI arguments; run python main.py --help for details.
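
If you embed the client in your own script, the same three variables can be read with a few lines of standard-library code. This is only a sketch: `Settings` and `load_settings` are hypothetical names, not the project's own configuration loader, and the defaults simply mirror the values documented above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    max_concurrent: int = 2      # documented default (free plan limit)
    request_delay: float = 1.0   # documented default (1 request per second)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    api_key = env.get("TRIQAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("TRIQAI_API_KEY is required")
    return Settings(
        api_key=api_key,
        max_concurrent=int(env.get("MAX_CONCURRENT_REQUESTS", "2")),
        request_delay=float(env.get("REQUEST_DELAY", "1.0")),
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings(
    {"TRIQAI_API_KEY": "example-key", "MAX_CONCURRENT_REQUESTS": "10"}
)
print(settings.max_concurrent, settings.request_delay)
```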

Rate Limiting

The API enforces two independent limits:

  • RPS (token bucket) - sustained requests per second; tracked by X-RateLimit-* headers
  • Concurrency cap - max parallel in-flight requests; tracked by X-RateLimit-Concurrency-* headers

The client enforces both automatically with exponential backoff and retries. You don't need to manage this yourself. When a 429 is returned, the Retry-After header (in seconds) is honoured before the next attempt. 503 Service Unavailable is also retried.
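
To make the retry behaviour concrete, here is one way a wait time could be computed. It is an illustrative sketch, not the client's actual algorithm: `retry_delay`, `should_retry`, and the `base` and `cap` values are assumptions chosen for the example.

```python
from typing import Optional

# Status codes the text above describes as retried.
RETRYABLE_STATUSES = {429, 503}


def should_retry(status: int) -> bool:
    """True for responses that are worth retrying."""
    return status in RETRYABLE_STATUSES


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,   # assumed first backoff step, in seconds
    cap: float = 30.0,   # assumed upper bound, in seconds
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # A parseable Retry-After value takes priority over the backoff.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # malformed header: fall back to exponential backoff
    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    return min(cap, base * (2 ** attempt))


print(retry_delay(0), retry_delay(3), retry_delay(2, retry_after="5"))
```

Production clients often add random jitter to the backoff so that many callers do not retry at the same instant; that is left out here to keep the example deterministic.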

Current rate limit status is displayed after each run and can be inspected via:

info = client.rate_limit_info
# RateLimitInfo(
#   limit=1, remaining=0, reset='2026-02-16T10:30:01Z', scope='rps',
#   concurrency_limit=2, concurrency_remaining=1,
#   retry_after_seconds=1
# )

Response headers tracked:

| Header | Description |
|---|---|
| X-RateLimit-Limit | Requests per second (sustained rate) |
| X-RateLimit-Remaining | RPS tokens available right now |
| X-RateLimit-Reset | ISO timestamp when the RPS bucket refills |
| X-RateLimit-Scope | Active limit dimension: rps or concurrency |
| X-RateLimit-Concurrency-Limit | Max concurrent in-flight requests for your org |
| X-RateLimit-Concurrency-Remaining | Remaining concurrency slots |
| Retry-After | Seconds to wait before retrying (429 and 503) |
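
If you call the API with your own HTTP client, these headers can be collected into a small structure. The sketch below is an assumption about how that might look: `RateLimitSnapshot` and `parse_rate_limit_headers` are hypothetical names and are not the package's RateLimitInfo class.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitSnapshot:
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[str]
    scope: Optional[str]
    concurrency_limit: Optional[int]
    concurrency_remaining: Optional[int]
    retry_after_seconds: Optional[int]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; None when absent or malformed."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitSnapshot(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset=lowered.get("x-ratelimit-reset"),
        scope=lowered.get("x-ratelimit-scope"),
        concurrency_limit=_to_int(lowered.get("x-ratelimit-concurrency-limit")),
        concurrency_remaining=_to_int(lowered.get("x-ratelimit-concurrency-remaining")),
        retry_after_seconds=_to_int(lowered.get("retry-after")),
    )


# Header values taken from the RateLimitInfo example shown earlier.
snapshot = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "2026-02-16T10:30:01Z",
    "X-RateLimit-Scope": "rps",
    "X-RateLimit-Concurrency-Limit": "2",
    "X-RateLimit-Concurrency-Remaining": "1",
    "Retry-After": "1",
})
print(snapshot)
```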

Project Structure

bank-transaction-enricher/
├── main.py              # CLI entry point
├── src/
│   ├── __init__.py      # Package exports
│   ├── client.py        # Async API client with rate limiting & retries
│   ├── enricher.py      # High-level enrichment orchestrator
│   └── models.py        # Pydantic models for API request/response
├── data/
│   └── transactions.csv # Sample dataset (40 transactions, 18 countries)
├── output/              # Generated results (git-ignored)
├── pyproject.toml       # Project metadata and tool config
├── requirements.txt     # Python dependencies
└── .env.example         # Environment variable template

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run linter
ruff check .

# Run type checker
mypy src/

# Run tests
pytest

License

MIT License. See LICENSE for details.
