Skip to content

abhizaik/phishing-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

224 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Phishing Detection Engine (Open-Source, Real-Time URL Scanner)

An open-source, production-grade phishing detection engine for fast, explainable, and auditable URL analysis. Detect phishing and malicious URLs with transparent scoring, clear verdicts, and detailed reports β€” all in under a second. Designed for self-hosting, auditing and easy integration into web apps, APIs, and browser extensions.

Go Svelte License GitHub stars

⚑ Quick Start Β· πŸ› Architecture Β· πŸ“š Docs Β· 🀝 Contributing Β· 🌍 Community


Phishing Detection Demo

Paste a URL β†’ get a trust score, verdict, and detailed report in under a second.

Phishing Detection Demo


Why Use This Tool?

Most phishing detection solutions are either closed commercial APIs or academic ML demos:

  • Commercial tools: expensive, opaque, and impossible to audit
  • ML demos: slow, fragile, and not built for real-world deployment

Phishing remains a top cyber threat because defenders lack fast, explainable, and controllable detection systems. This engine fills that gap by providing:

  • Transparent, explainable analysis β€” every verdict is backed by concrete signals
  • Fast, real-time scanning β€” multiple analyzers run in parallel
  • Flexible integration β€” web UI, HTTP API, browser extension
  • Full open-source control β€” audit, modify, self-host, and scale

What This Tool Does

  • Scans URLs for phishing, malicious behavior, and unsafe redirects
  • Produces a trust score, clear verdict, and detailed report
  • Supports developers and non-technical users via UI, API, and extension
  • Uses multiple independent heuristic analyzers for accurate detection
  • Built with Go (backend) and Svelte (frontend) for production use

Who This Is For

Everyday users

  • Quickly check suspicious URLs in a website or browser extension

Developers

  • Integrate phishing detection into applications or backend services
  • Replace or supplement commercial phishing APIs

Security engineers & SOC teams

  • Build explainable phishing detection pipelines
  • Audit URLs with transparent, actionable signals

Students & researchers

  • Use a real-world, production-grade reference for academic or security projects. Academic or research use of this project must cite this repository (see CITATION.cff).

Use Cases

  • Detect phishing links before users click them
  • Scan URLs for malicious behavior
  • Build anti-phishing browser extensions
  • Integrate phishing detection into backend services
  • Replace or supplement commercial phishing APIs

Detection Techniques Used

The engine evaluates URLs using multiple independent analyzers, including:

  • Domain reputation & age checks
  • Suspicious URL patterns and homoglyphs
  • Redirect chain analysis
  • HTTPS / certificate anomalies
  • Known phishing indicators and heuristics
  • Content-based signals (HTML, scripts, forms)

Each analyzer contributes to a final trust score and verdict.

Architecture

High level repository layout:

server/               Go backend 
  cmd/safesurf        Backend entry point
  internal/           Analyzers, domaininfo, screenshot
web/website           SvelteKit UI
web/chrome-extension  Chrome extension
docker/               Dev & prod
docs/                 Setup, architecture, API, security, testing etc.
Makefile

Quick Start

Full setup: docs/setup.md

  1. Clone the repo
git clone https://github.com/abhizaik/phishing-detection.git
cd phishing-detection
  1. Start the application

Prerequisite: Docker must be installed and running.
Windows: Use WSL or install make.

make build
make up

Web UI: localhost:3000

API Example

The phishing detection engine exposes a simple HTTP API for real-time URL analysis. Scan a URL using the API:

curl -X GET http://localhost:8080/api/v1/analyze?url=https://example.com
Example API response

{
    "url": "http://google.com/abhi",
    "domain": "google.com",
    "features": {
        "rank": 1,
        "tld": {
            "tld": "com",
            "is_trusted_tld": false,
            "is_risky_tld": false,
            "is_icann": true
        },
        "url": {
            "url_shortener": false,
            "uses_ip": false,
            "contains_punycode": false,
            "too_long": false,
            "too_deep": false,
            "has_homoglyph": false,
            "subdomain_count": 0,
            "keywords": {
                "has_keywords": false,
                "found": [],
                "categories": {}
            }
        }
    },
    "infrastructure": {
        "ip_addresses": [
            "142.250.182.78",
            "2404:6800:4007:810::200e"
        ],
        "nameservers_valid": true,
        "ns_hosts": [
            "ns2.google.com."
        ],
        "mx_records_valid": true,
        "mx_hosts": [
            "smtp.google.com."
        ]
    },
    "domain_info": {
        "domain": "GOOGLE.COM",
        "registrar": "MarkMonitor Inc.",
        "created": "1997-09-15T04:00:00Z",
        "updated": "2019-09-09T15:39:04Z",
        "expiry": "2028-09-14T04:00:00Z",
        "nameservers": [
            "NS1.GOOGLE.COM"
        ],
        "status": [
            "client delete prohibited"
        ],
        "dnssec": false,
        "age_human": "28 years 4 months",
        "age_days": 10350,
        "raw": "{\"ldhName\":\"GOOGLE.COM\ etc."}",
        "source": "RDAP"
    },
    "analysis": {
        "redirection_result": {
            "is_redirected": false,
            "chain_length": 1,
            "chain": [
                "http://google.com/abhi"
            ],
            "final_url": "http://google.com/abhi",
            "final_url_domain": "google.com",
            "has_domain_jump": false
        },
        "http_status": {
            "code": 404,
            "text": "Not Found",
            "success": false,
            "is_redirect": false
        },
        "is_hsts_supported": false
    },
    "result": {
        "risk_score": 0,
        "trust_score": 100,
        "final_score": 100,
        "verdict": "Safe",
        "reasons": {
            "neutral_reasons": [
                "Standard, officially recognized domain extension.",
                "Standard DNS security (DNSSEC not enabled)."
            ],
            "good_reasons": [
                "Global Giant: Ranked #1 worldwide.",
                "Valid DNS configuration detected.",
                "Valid email server configuration (MX Records).",
                "Long-standing domain history (28 years 4 months).",
                "Registered with MarkMonitor Inc."
            ],
            "bad_reasons": null
        }
    },
    "incomplete": false,
    "errors": null
}

Real-Time Performance

  • Typical scan time: ~300–700 ms per URL
  • Designed to handle multiple concurrent scans efficiently
  • Optimized for real-time phishing detection at scale

Exact performance depends on enabled analyzers and network conditions.

Security & Privacy

  • No URL data is sent to third-party services by default
  • All analysis runs locally or in your own infrastructure
  • Designed for auditability, privacy, and controlled environments

Limitations

  • Heuristic false positives

Documentation

All documentation is under docs/. Start here docs/README.md

Contributing

Bug reports, feature requests, and pull requests are welcome.

Use GitHub Issues to report bugs or suggest features. For code contributions, see CONTRIBUTING.md.

Community

If you found this project helpful, consider giving it a star. It directly helps visibility and continued development.

Have bugs, ideas, or feature requests? Open an issue or start a discussion. Contributions and feedback are welcome.

Thanks for helping make the web safer.