A Node.js website crawler that finds broken links and performs SEO analysis on each page, generating comprehensive HTML and JSON reports.
- 🔍 Website Crawling: Automatically discovers and crawls all pages on a website
- 🔗 Broken Link Detection: Identifies broken links (4xx, 5xx status codes, network errors)
- 📊 SEO Analysis: Analyzes each page for:
  - Title tags (presence, length)
  - Meta descriptions (presence, length)
  - Heading structure (H1, H2, H3)
  - Image alt text
  - Word count
  - Open Graph tags
  - Language attributes
  - Canonical URLs
  - Viewport meta tags
  - And more...
- 📄 Report Generation: Creates detailed HTML and JSON reports
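The broken-link rule above (4xx/5xx status codes or network errors) can be sketched as a small classifier. This is a hypothetical helper, not the crawler's actual implementation; the function name and result shape are assumptions:

```javascript
// Hypothetical sketch: decide whether a fetched link counts as broken.
// A link is broken on a network error (DNS failure, timeout, refused
// connection, ...) or on any 4xx/5xx HTTP status code.
function classifyLink(result) {
  if (result.error) {
    return { broken: true, reason: result.error };
  }
  if (result.status >= 400 && result.status < 600) {
    return { broken: true, reason: `HTTP ${result.status}` };
  }
  return { broken: false, reason: `HTTP ${result.status}` };
}

console.log(classifyLink({ status: 404 }));
// → { broken: true, reason: 'HTTP 404' }
console.log(classifyLink({ status: 200 }));
// → { broken: false, reason: 'HTTP 200' }
console.log(classifyLink({ error: 'ETIMEDOUT' }));
// → { broken: true, reason: 'ETIMEDOUT' }
```

Note that redirects (3xx) are not treated as broken here; a real crawler typically follows them and classifies the final response.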
Install dependencies:

```bash
npm install
```

Run the crawler:

```bash
node index.js <url> [options]
```

For Linux users with a desktop environment, you can use the provided launcher script:

```bash
./crawler.sh
```

The launcher provides a graphical interface using zenity that:
- Prompts for the website URL
- Shows a progress dialog while crawling
- Saves reports to the `reports/` directory with timestamps
- Optionally opens the generated report in your default browser
Requirements for launcher:
- `zenity` (for GUI dialogs)
- `xdg-open` (for opening the report)
Note: You may need to adjust the `PROJECT_DIR` variable in `crawler.sh` to match your installation path, and make the script executable:
```bash
chmod +x crawler.sh
```

```bash
# Basic usage (https:// is added automatically if missing)
node index.js example.com
node index.js https://example.com

# Limit number of pages to crawl
node index.js example.com --max-pages 50

# Set custom timeout
node index.js example.com --timeout 15000

# Specify output path
node index.js example.com --output ./my-report.html

# Combine options
node index.js example.com --max-pages 200 --timeout 20000 --output ./reports/example-report.html
```

Options:

- `--max-pages <number>`: Maximum number of pages to crawl (default: 100)
- `--timeout <number>`: Request timeout in milliseconds (default: 10000)
- `--output <path>`: Output path for report files (default: `./report.html`)
The crawler generates two files:

- HTML Report (`report.html`): A beautiful, interactive HTML report with:
  - Summary statistics
  - List of broken links
  - SEO analysis for each page
  - Detailed issue listings
- JSON Report (`report.json`): Machine-readable JSON data with all crawl results
The summary statistics include:

- Total pages crawled
- Pages with errors
- Total and unique links found
- Broken links count
- SEO issues (critical and warnings)
For each broken link:

- URL
- HTTP status code
- Status text/error message
- Pages where the link was found
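The broken-link entries in `report.json` can be consumed programmatically. The field names below are assumptions inferred from the fields listed above, not the report's documented schema; check your generated report for the actual keys:

```javascript
// Assumed shape of one broken-link entry in report.json
// (field names are illustrative, not the documented schema).
const sampleEntry = {
  url: 'https://example.com/missing',
  status: 404,
  statusText: 'Not Found',
  foundOn: ['https://example.com/', 'https://example.com/about'],
};

// Example consumer: group broken link URLs by the page that references
// them, so each page can be fixed in one pass.
function pagesToFix(entries) {
  const byPage = new Map();
  for (const entry of entries) {
    for (const page of entry.foundOn) {
      if (!byPage.has(page)) byPage.set(page, []);
      byPage.get(page).push(entry.url);
    }
  }
  return byPage;
}

console.log(pagesToFix([sampleEntry]));
```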
For each page:
- Title tag analysis
- Meta description analysis
- Heading structure (H1, H2, H3)
- Image alt text compliance
- Word count
- Language attributes
- Open Graph tags
- Issues and recommendations
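Two of the per-page checks above can be sketched as small functions. The length thresholds (title 30–60 characters, description 120–160) are common SEO guidance, not necessarily the values this crawler uses, and the regexes are deliberately simplistic; a real crawler would use a proper HTML parser:

```javascript
// Hypothetical sketch of a title-tag check (thresholds are assumptions).
function checkTitle(html) {
  const m = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (!m) return { issue: 'Missing <title> tag' };
  const len = m[1].trim().length;
  if (len < 30) return { issue: `Title too short (${len} chars)` };
  if (len > 60) return { issue: `Title too long (${len} chars)` };
  return { issue: null };
}

// Hypothetical sketch of a meta-description check. The regex assumes the
// name attribute appears before content, which real HTML need not do.
function checkMetaDescription(html) {
  const m = html.match(/<meta[^>]+name=["']description["'][^>]*content=["']([^"']*)["']/i);
  if (!m) return { issue: 'Missing meta description' };
  const len = m[1].trim().length;
  if (len < 120) return { issue: `Description too short (${len} chars)` };
  if (len > 160) return { issue: `Description too long (${len} chars)` };
  return { issue: null };
}

console.log(checkTitle('<html><head></head></html>').issue);
// → Missing <title> tag
```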
Requirements:

- Node.js 18+ (ES modules support)
- Internet connection for crawling
For Linux launcher (crawler.sh):
- `zenity` (for GUI dialogs)
- `xdg-open` (for opening reports in browser)
License: MIT