Real-time fraud detection system for financial transactions, built on a hybrid Kappa Architecture.
It simulates a high-performance data engineering pipeline in which banking transactions are ingested, analyzed, and stored in real time. The project demonstrates declarative stream processing (ksqlDB) for temporal window rules and imperative processing (Python) for complex business logic and persistence.
The system uses a hybrid pattern to maximize performance:
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Producer   │────▶│    Kafka     │────▶│    ksqlDB    │
│   (Faker)    │     │    Broker    │     │    Server    │
└──────────────┘     └──────┬───────┘     └──────┬───────┘
                            │                    │
                            │ Topic: fraud_alerts│
                            ▼                    ▼
                     ┌──────────────┐     ┌──────────────┐
                     │   Consumer   │     │   Kafka-UI   │
                     │   (Python)   │     │  (Monitor)   │
                     └──────┬───────┘     └──────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │  PostgreSQL  │
                     │   Database   │
                     └──────────────┘
```
- Ingestion: The Producer Service generates synthetic transactions (JSON) and publishes them to the `transactions` topic (see the sketch after this list).
- Stream Processing (ksqlDB): ksqlDB consumes the raw stream and applies:
  - Immediate filters (e.g., High Amount)
  - Window aggregations (windowing) to detect bot behavior (Velocity)
  - Publishing of anomalies only to the `fraud_alerts` topic
- Business Logic (Python): The Consumer Service listens to the alerts topic, enriches the data if necessary, and decides the final action.
- Storage (PostgreSQL): Persistent storage for auditing and analytical dashboards.
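As a rough illustration of the ingestion step, the sketch below publishes synthetic JSON transactions with `kafka-python` and `faker`. It is a minimal sketch, not the repo's actual producer: the field names, the account-based key, and the broker address are assumptions.

```python
import json
import random
import time

from faker import Faker
from kafka import KafkaProducer

fake = Faker()

# JSON-serialize values; key messages by account_id so each account's
# events stay ordered within a single partition.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(10):
    transaction = {
        "transaction_id": fake.uuid4(),
        "account_id": f"ACC-{fake.random_int(min=1000, max=9999)}",
        # Amounts occasionally exceed R$ 3,000 to trip the High Amount rule.
        "amount": round(random.uniform(1.0, 5000.0), 2),
        "timestamp": time.time(),
    }
    producer.send("transactions", key=transaction["account_id"], value=transaction)

producer.flush()
```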
- Language: Python 3.10+ (static typing via `pydantic`; see the schema sketch after this list)
- Messaging: Apache Kafka 3.6 (KRaft mode, no ZooKeeper)
- Stream Processing: ksqlDB (Confluent)
- Database: PostgreSQL 15
- Infrastructure: Docker & Docker Compose
- Main Libraries:
  - `kafka-python`: low-latency Kafka client
  - `faker`: realistic data generation
  - `sqlalchemy`: ORM for persistence
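To make the `pydantic` point concrete, the data contract in `src/models/` could look roughly like the following. This is an illustrative sketch only: the field names are assumptions, and the `Field`/`model_validate_json` syntax assumes pydantic v2.

```python
from datetime import datetime

from pydantic import BaseModel, Field


class Transaction(BaseModel):
    """Data contract shared by producer and consumer (fields assumed)."""

    transaction_id: str
    account_id: str
    amount: float = Field(gt=0, description="Transaction amount in BRL")
    merchant: str
    created_at: datetime


# At the consumer boundary, validation fails fast on malformed messages:
# txn = Transaction.model_validate_json(raw_message_bytes)
```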
The logic was split to maximize performance and to demonstrate the proper use of each tool (a sketch of the velocity rule follows the table):
| Fraud Type | Business Rule | Responsible Technology | Why? |
|---|---|---|---|
| High Amount | Transactions > R$ 3,000.00 | ksqlDB | Simple filtering (WHERE) is instantaneous in SQL |
| High Velocity | > 3 transactions from same account in 1 minute | ksqlDB | Window Aggregation (WINDOW TUMBLING) is native and performant in ksqlDB, avoiding complex state management in Python |
| Blacklist / Context | Specific business logic | Python | Allows external queries and complex conditional logic before saving |
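For the declarative side, a velocity rule along the lines of the second table row can be expressed as a windowed ksqlDB aggregation. The repo applies its real statements from `ksqldb/queries.sql` at startup; the sketch below only shows the shape of such a rule and one way a statement can be submitted to ksqlDB's REST endpoint (default port 8088). The stream and table names here are assumptions.

```python
import requests

# Hypothetical statement: count transactions per account in 1-minute
# tumbling windows and keep only accounts exceeding 3 events (High Velocity).
VELOCITY_RULE = """
CREATE TABLE high_velocity_alerts AS
  SELECT account_id, COUNT(*) AS txn_count
  FROM transactions_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY account_id
  HAVING COUNT(*) > 3
  EMIT CHANGES;
"""

# ksqlDB accepts DDL/DML statements on its /ksql REST endpoint.
resp = requests.post(
    "http://localhost:8088/ksql",
    headers={"Accept": "application/vnd.ksql.v1+json"},
    json={"ksql": VELOCITY_RULE, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```

The High Amount rule needs no windowing at all: it is a plain `WHERE amount > 3000` filter in a `CREATE STREAM ... AS SELECT`, which is exactly why it stays on the declarative side.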
- Docker 20.10+ & Docker Compose
- Python 3.10+
- Git
- Clone the repository:

```bash
git clone https://github.com/your-username/stream-guard-kafka.git
cd stream-guard-kafka
```

- Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate    # Linux/Mac
# or: .\venv\Scripts\activate   # Windows
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Start the infrastructure and background pipeline:

```bash
# Starts Docker, configures ksqlDB, and runs the Python consumers/producers in the background.
# You will be prompted at the end to view live logs.
./start.sh
```

- Stop the infrastructure and clean data:

```bash
# Stops the Python background processes and the Docker containers.
# You will be prompted before PostgreSQL/Kafka data volumes are wiped.
./stop.sh
```

- Access the services:
  - Kafka-UI: http://localhost:8080 (topic monitoring)
  - PostgreSQL: `localhost:5432` (user: `streamguard` / password: `streamguard_2024`)
```
stream-guard-kafka/
├── src/
│   ├── consumer/          # Python Consumer logic
│   ├── producer/          # Data generator script (Faker)
│   ├── models/            # Pydantic schemas (data contract)
│   ├── database/          # Postgres connection
│   └── config/            # Centralized settings
├── ksqldb/
│   └── queries.sql        # Stream and Table creation scripts
├── docker/
│   └── init-db.sql        # Initial Postgres schema
├── tests/                 # Unit tests
├── docs/                  # Documentation
├── docker-compose.yaml    # Infrastructure (Kafka, ksqlDB, Postgres)
├── requirements.txt
└── README.md
```
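For orientation, the pieces under `src/consumer/` and `src/database/` could be wired together roughly as below. This is a hypothetical sketch, not the repo's actual code: the table model, the database name, and the blacklist rule are assumptions (the credentials match the defaults listed above, and `psycopg2` is assumed to be installed).

```python
import json

from kafka import KafkaConsumer
from sqlalchemy import Column, Float, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class FraudAlert(Base):
    """Audit record persisted for every alert (columns assumed)."""

    __tablename__ = "fraud_alerts"

    transaction_id = Column(String, primary_key=True)
    account_id = Column(String)
    amount = Column(Float)
    action = Column(String)


# Database name "streamguard" is an assumption; user/password are the
# docker-compose defaults shown in the setup section.
engine = create_engine(
    "postgresql+psycopg2://streamguard:streamguard_2024@localhost:5432/streamguard"
)
Base.metadata.create_all(engine)

BLACKLIST = {"ACC-0666"}  # illustrative context rule

consumer = KafkaConsumer(
    "fraud_alerts",
    bootstrap_servers="localhost:9092",
    group_id="fraud-decision-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

with Session(engine) as session:
    for message in consumer:
        alert = message.value
        # Imperative business rule: escalate blacklisted accounts.
        action = "BLOCK" if alert.get("account_id") in BLACKLIST else "REVIEW"
        session.merge(
            FraudAlert(
                transaction_id=alert["transaction_id"],
                account_id=alert.get("account_id"),
                amount=alert.get("amount"),
                action=action,
            )
        )
        session.commit()
```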
```bash
# Run the interactive generator (menu for Batch, Stream, Velocity)
python generate_transactions.py
```

Access http://localhost:8080 to view:
- Topics and messages
- Consumer groups
- Throughput metrics
```bash
# Access the ksqlDB CLI
docker exec -it ksqldb-cli ksql http://ksqldb-server:8088
```

```sql
-- Inside the CLI
SHOW TOPICS;
SELECT * FROM transactions EMIT CHANGES;
```

```bash
# List topics
docker exec -it stream-guard-kafka kafka-topics.sh \
  --bootstrap-server localhost:9092 --list

# Consume messages
docker exec -it stream-guard-kafka kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic transactions \
  --from-beginning
```

```sql
-- Total transactions
SELECT COUNT(*) FROM transactions;

-- Fraud rate
SELECT
    COUNT(*) AS total,
    SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END) AS frauds,
    ROUND(SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END)::numeric / COUNT(*)::numeric * 100, 2) AS fraud_rate
FROM transactions;

-- Suspicious accounts
SELECT * FROM account_risk_profile
WHERE fraud_rate > 50
ORDER BY fraud_rate DESC;
```

- Docker Infrastructure (Kafka KRaft + Postgres)
- Python Producer (Faker)
- Fake Transaction Generator with realistic distributions
- Kafka Producer with batch and streaming modes
- ksqlDB Queries implementation (Streams & Tables)
- Python Consumer for persistence
- Dashboard in Streamlit/PowerBI
- CI/CD with GitHub Actions
- Migration to Serverless (Upstash + Cloud Run)
- Fake Generator Guide - How to generate synthetic transactions
- Kafka Producer Guide - Batch vs Streaming modes explained
- Quick Start Guide - Infrastructure setup and testing
This project demonstrates:
- ✅ Kappa Architecture for real-time processing
- ✅ Hybrid Processing: declarative (ksqlDB) + imperative (Python)
- ✅ Stream Processing with windowing and aggregations
- ✅ Event-Driven Architecture with Kafka
- ✅ Data Modeling with Pydantic schemas
- ✅ Clean Code principles (SOLID, type hints)
- ✅ Infrastructure as Code with Docker Compose
MIT License