
neoosu/chicom_crawl


Google Search Competitor Analysis Tool

A Python tool that runs Google searches through the Google Custom Search API to monitor competitor mentions across multiple websites and brands.

🚀 Features

  • Multi-pattern Search: Uses multiple search strategies for comprehensive coverage
  • Date Range Filtering: Search within specific date ranges
  • Batch Processing: Efficiently handles large CSV inputs with smart grouping
  • Excel Reporting: Generates detailed reports with multiple sheets
  • Rate Limiting: Built-in protection against API quota limits
  • Error Handling: Robust error handling and recovery mechanisms

📊 What It Does

This tool analyzes competitor mentions across Vietnamese news websites by:

  1. Reading keywords, brands, and target websites from a CSV file
  2. Performing Google searches with multiple search patterns
  3. Collecting and analyzing search results
  4. Generating comprehensive Excel reports with detailed analytics

🛠️ Prerequisites

  • Python 3.7+
  • Google Cloud Console account
  • Google Custom Search Engine setup

📦 Installation

  1. Clone the repository:

    git clone https://github.com/neoosu/chicom_crawl.git
    cd chicom_crawl
  2. Install dependencies:

    pip install -r requirements.txt
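  3. Set up Google API credentials:

    • In the Google Cloud Console, create or select a project
    • Enable the Custom Search API for that project
    • Create an API key under APIs & Services > Credentials
    • Create a Programmable Search Engine (formerly Custom Search Engine) and copy its Search Engine ID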
  4. Configure the script:

    • Open main.py or main_enhanced.py
    • Replace API_KEY with your Google API key
    • Replace SEARCH_ENGINE_ID with your Custom Search Engine ID
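For orientation, a single request to the Custom Search JSON API with the two credentials above looks roughly like the sketch below. It uses only the standard library and is not taken from main.py; the helper names `build_search_url` and `search` are hypothetical, and the project's scripts may use a different HTTP client.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Public endpoint of the Google Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, search_engine_id: str, query: str, num: int = 10) -> str:
    """Build a request URL; the API accepts num values from 1 to 10."""
    params = {"key": api_key, "cx": search_engine_id, "q": query, "num": num}
    return f"{CSE_ENDPOINT}?{urllib.parse.urlencode(params)}"


def search(api_key: str, search_engine_id: str, query: str) -> list[dict]:
    """Run one query and return title/link/snippet for each result item."""
    url = build_search_url(api_key, search_engine_id, query)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # The response has no "items" key when nothing matches.
    return [
        {"title": item.get("title"), "link": item.get("link"), "snippet": item.get("snippet")}
        for item in payload.get("items", [])
    ]
```

Each call to `search(API_KEY, SEARCH_ENGINE_ID, '"Dầu ăn" "Tường An"')` consumes one query from the daily quota.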

๐Ÿ“ Project Structure

chicom_crawl/
├── main.py                # Basic version of the scraper
├── main_enhanced.py       # Enhanced version with multiple search patterns
├── input.csv              # Input data (keywords, brands, websites)
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── LICENSE                # MIT License
├── .gitignore             # Git ignore file
├── config.py              # Configuration management
├── utils/                 # Utility functions
│   ├── __init__.py
│   ├── search_utils.py    # Search-related utilities
│   └── report_utils.py    # Report generation utilities
└── examples/              # Example files
    ├── sample_input.csv   # Sample input file
    └── sample_output.xlsx # Sample output file

🚀 Usage

Basic Usage

  1. Prepare your input CSV file (input.csv):

    keyword,brand,website
    "Dแบงu ฤƒn","Tฦฐแปng An",vnexpress.net
    "Dแบงu ฤƒn","Tฦฐแปng An",dantri.com.vn
    "Gia Vแป‹","Nam Ngฦฐ",vnexpress.net
  2. Run the basic version:

    python main.py
  3. Run the enhanced version (recommended):

    python main_enhanced.py
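Before running either script, it can help to check that input.csv parses the way the tool expects. The sketch below is a hypothetical validator (`load_rows` is not part of the repository) that assumes the three-column header shown in step 1, UTF-8 text, and no line breaks inside fields.

```python
from __future__ import annotations

import csv
import io

# Columns the input file is assumed to contain (from the sample above).
REQUIRED_COLUMNS = ("keyword", "brand", "website")


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into row dicts, rejecting missing columns or empty fields."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        cleaned = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        if not all(cleaned.values()):
            raise ValueError(f"empty field on line {line_no}: {row}")
        rows.append(cleaned)
    return rows
```

Opening the file with `encoding="utf-8-sig"` (for example `with open("input.csv", encoding="utf-8-sig") as f: rows = load_rows(f.read())`) also strips the byte-order mark that Excel writes when saving CSV as UTF-8, which would otherwise turn the first header into `\ufeffkeyword`.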

Configuration

Edit the configuration variables in the script:

# API Configuration
API_KEY = "your_api_key_here"
SEARCH_ENGINE_ID = "your_search_engine_id_here"

# Date Range
START_DATE = "2025-07-01"
END_DATE = "2025-07-31"

# File Names
INPUT_CSV_FILE = "input.csv"
OUTPUT_REPORT_FILE = "competitor_report.xlsx"
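The README does not show how START_DATE and END_DATE reach the API. One option Programmable Search supports is a date-range restriction of the form `date:r:YYYYMMDD:YYYYMMDD`, passed as the `sort` request parameter. The sketch below assumes that mechanism (the scripts may instead use `dateRestrict` or filter results after fetching), and `date_range_sort` is a hypothetical helper.

```python
from datetime import date


def date_range_sort(start: str, end: str) -> str:
    """Turn ISO dates like START_DATE/END_DATE into a 'date:r:...' sort value."""
    start_d = date.fromisoformat(start)  # raises ValueError on malformed dates
    end_d = date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("START_DATE must not be later than END_DATE")
    return f"date:r:{start_d:%Y%m%d}:{end_d:%Y%m%d}"
```

With the values above, `date_range_sort(START_DATE, END_DATE)` yields `date:r:20250701:20250731`. This kind of restriction depends on the date Google associates with each page, so pages without recognizable date information may be missed.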

📊 Output

The tool generates Excel reports with multiple sheets:

Basic Version (main.py)

  • Summary: Results count by brand and keyword
  • Raw Data: All search results with details

Enhanced Version (main_enhanced.py)

  • Enhanced Summary: Results breakdown by search pattern
  • Pattern Analysis: Effectiveness of each search strategy
  • Raw Data: Complete results with pattern information
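The summary sheets are essentially counts over the raw results. The sketch below shows that aggregation with the standard library; it assumes each raw result is a dict with hypothetical `brand`, `keyword`, and `pattern` fields, and is not the scripts' actual report code.

```python
from __future__ import annotations

from collections import Counter


def summarize(results: list[dict]) -> list[tuple[str, str, int]]:
    """Count results per (brand, keyword), largest counts first."""
    counts = Counter((r["brand"], r["keyword"]) for r in results)
    return sorted(
        ((brand, keyword, n) for (brand, keyword), n in counts.items()),
        key=lambda t: (-t[2], t[0], t[1]),
    )


def pattern_effectiveness(results: list[dict]) -> dict[str, int]:
    """Count results per search pattern, as a Pattern Analysis sheet would."""
    return dict(Counter(r["pattern"] for r in results))
```

With pandas, these lists can be turned into DataFrames and written as separate sheets through `pd.ExcelWriter` and `DataFrame.to_excel(writer, sheet_name=...)`, which needs an Excel engine such as openpyxl installed.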

⚠️ Important Notes

API Limitations

  • Daily Quota: 100 free queries per day; with billing enabled, additional queries are charged and capped at 10,000 per day
  • Rate Limiting: 1-second delay between requests
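  • API Discontinuation: Google is retiring its Custom Search APIs in favor of Vertex AI Search (January 8, 2025 was the announced shutdown date for the Site Restricted JSON API); check Google's deprecation notice for the date that applies to your API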
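The 1-second delay and a daily query limit can both be enforced with a small throttle. This is a sketch, not the scripts' actual implementation: `Throttle` is a hypothetical class, and its `max_queries` default of 100 is a placeholder budget that you should set to your own daily allowance.

```python
import time


class Throttle:
    """Enforce a minimum gap between API calls and a per-run query budget."""

    def __init__(self, min_interval_s: float = 1.0, max_queries: int = 100):
        self.min_interval_s = min_interval_s
        self.max_queries = max_queries  # hypothetical budget; set to your allowance
        self.used = 0
        self._last = None  # time.monotonic() of the previous call, if any

    def wait(self) -> None:
        """Sleep until the next call is allowed; raise once the budget is spent."""
        if self.used >= self.max_queries:
            raise RuntimeError(f"query budget of {self.max_queries} exhausted")
        if self._last is not None:
            remaining = self.min_interval_s - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
        self.used += 1
```

Calling `throttle.wait()` immediately before each request keeps the spacing even when a request itself is slow. The budget is per process, so a run that resumes on another day would need its count persisted separately.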

Migration Required

The current Google Custom Search API will be discontinued. Plan to migrate to Google Vertex AI Search before the deadline.

🔧 Advanced Features

Enhanced Search Patterns

The enhanced version uses multiple search strategies:

  1. Full Combination: "keyword" "brand"
  2. Brand Only: "brand" (captures brand mentions)
  3. Brand with Alternatives: "brand" (related_terms)
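The three patterns can be written as query strings. This sketch makes an assumption about how they combine with the target website: it appends Google's `site:` operator to each query, whereas the scripts might instead pass the site through the API's `siteSearch` parameter. `build_queries` and the `related_terms` list are hypothetical.

```python
from __future__ import annotations


def build_queries(
    keyword: str,
    brand: str,
    website: str,
    related_terms: list[str] | None = None,
) -> dict[str, str]:
    """Return one query string per search pattern, restricted to one website."""
    site = f"site:{website}"  # assumption: sites are restricted with the site: operator
    queries = {
        # 1. Full Combination: both exact phrases must appear
        "full_combination": f'"{keyword}" "{brand}" {site}',
        # 2. Brand Only: any mention of the brand
        "brand_only": f'"{brand}" {site}',
    }
    if related_terms:
        # 3. Brand with Alternatives: the brand plus any one of the related terms
        alternatives = " OR ".join(f'"{term}"' for term in related_terms)
        queries["brand_with_alternatives"] = f'"{brand}" ({alternatives}) {site}'
    return queries
```

Each entry in the returned dict is one API call, so enabling all three patterns triples the query cost per row.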

Scalability

  • Smart Grouping: Reduces API calls by grouping similar searches
  • Efficient Processing: Handles 2000+ rows efficiently
  • Memory Optimized: Minimal memory usage for large datasets
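One way to get fewer API calls than input rows is to merge rows that share a keyword and brand into a single query spanning several sites with `OR`. The sketch below is hypothetical (the repository's actual grouping rule is not documented here), and `max_sites_per_query` is an assumed cap that keeps individual queries short.

```python
from __future__ import annotations


def group_queries(rows: list[dict], max_sites_per_query: int = 10) -> list[str]:
    """Merge rows that share (keyword, brand) into multi-site queries."""
    # dict preserves insertion order (Python 3.7+), so output follows the CSV order.
    sites_by_pair = {}
    for row in rows:
        sites = sites_by_pair.setdefault((row["keyword"], row["brand"]), [])
        if row["website"] not in sites:  # skip duplicate rows
            sites.append(row["website"])

    queries = []
    for (keyword, brand), sites in sites_by_pair.items():
        # Split long site lists so no single query grows unbounded.
        for i in range(0, len(sites), max_sites_per_query):
            chunk = sites[i:i + max_sites_per_query]
            site_filter = " OR ".join(f"site:{s}" for s in chunk)
            queries.append(f'"{keyword}" "{brand}" ({site_filter})')
    return queries
```

The three sample rows from the Usage section become two queries. Because one query now covers several sites, each result's `link` has to be mapped back to its website (for example via `urllib.parse.urlparse(link).hostname`) before per-site counts can be reported.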

📈 Performance

Based on analysis with 2000+ row inputs:

  • API Calls: ~144 calls for 2000 rows (not 2000 calls!)
  • Execution Time: ~2.4 minutes for 2000 rows (144 calls × 1-second delay ≈ 144 seconds)
  • Memory Usage: Minimal, handled efficiently by pandas
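The execution time follows from the call count and the enforced delay. The hypothetical helper below reproduces that arithmetic and adds quota use; it ignores network latency, and its `free_per_day` default of 100 assumes Google's free tier for the Custom Search JSON API, so adjust it for your account.

```python
import math


def estimate_run(calls: int, delay_s: float = 1.0, free_per_day: int = 100) -> dict:
    """Estimate run time and quota use; network latency is ignored."""
    return {
        # Approximate wall-clock time, counting one enforced delay per call.
        "minutes": calls * delay_s / 60,
        # Days needed if the run must stay within the free daily allowance.
        "days_within_free_tier": math.ceil(calls / free_per_day),
        # Calls that would be billed if the whole run happens in one day.
        "calls_beyond_free_tier": max(0, calls - free_per_day),
    }
```

For 144 calls this gives 2.4 minutes, and with a 100-query daily allowance the run either spans two days or incurs 44 billable queries.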

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Disclaimer

This tool is for educational and research purposes. Please ensure compliance with:

  • Google's Terms of Service
  • Target websites' robots.txt and terms of use
  • Applicable data protection regulations

🆘 Support

If you encounter issues:

  1. Check the Issues page
  2. Ensure your API credentials are correct
  3. Verify your input CSV format
  4. Check your daily API quota
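When requests fail intermittently, a retry wrapper with exponential backoff helps separate transient errors from quota problems. This sketch assumes transient failures surface as `urllib.error.HTTPError` with status 429 or 5xx and does not retry other codes; `with_retries` is a hypothetical helper. If a 429 persists after all attempts, an exhausted daily quota is the likely cause.

```python
import random
import time
import urllib.error

# Assumption: these status codes indicate transient failures worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def with_retries(call, max_attempts: int = 4, base_delay_s: float = 1.0, sleep=time.sleep):
    """Return call()'s result, retrying retryable HTTPErrors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors or once attempts are used up.
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            # Wait 1 s, 2 s, 4 s, ... plus up to 10% random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay * (1 + 0.1 * random.random()))
```

Pass the function that performs one request as a zero-argument callable (for example a `lambda` around it). The injectable `sleep` argument makes the backoff easy to test without real waiting.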

🔮 Future Enhancements

  • Migrate to Google Vertex AI Search
  • Add progress tracking for large jobs
  • Implement checkpoint/resume functionality
  • Add result pagination for more complete data
  • Create web interface
  • Add automated scheduling
  • Implement cost tracking and optimization
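Result pagination would rely on the API's `start` parameter (1-based index of the first result) together with `num` (at most 10 per request). Google documents that the JSON API returns no more than 100 results per query, so at most ten pages exist. The hypothetical helper below computes the page offsets; each page is a separate request that counts against the daily quota.

```python
from __future__ import annotations


def page_starts(max_results: int = 100, page_size: int = 10) -> list[int]:
    """Return the 1-based 'start' offsets needed to page through results."""
    if not 1 <= page_size <= 10:
        raise ValueError("page_size (the API's num parameter) must be 1-10")
    limit = min(max_results, 100)  # the API serves at most 100 results per query
    return list(range(1, limit + 1, page_size))
```

Stopping as soon as a page returns fewer than `num` items avoids spending queries on empty pages.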
