A Python-based web scraping tool that performs comprehensive Google search analysis for competitor monitoring across multiple websites and brands.
- Multi-pattern Search: Uses multiple search strategies for comprehensive coverage
- Date Range Filtering: Search within specific date ranges
- Batch Processing: Efficiently handles large CSV inputs with smart grouping
- Excel Reporting: Generates detailed reports with multiple sheets
- Rate Limiting: Built-in protection against API quota limits
- Error Handling: Robust error handling and recovery mechanisms
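
As an illustration of the date range filtering feature, the sketch below converts ISO dates into the `date:r:YYYYMMDD:YYYYMMDD` range form that the Custom Search JSON API accepts in its `sort` parameter. The helper name `date_range_sort` is hypothetical, and whether the scripts use `sort` this way is an assumption, not something this README confirms.

```python
from datetime import date


def date_range_sort(start: str, end: str) -> str:
    """Convert two ISO dates (YYYY-MM-DD) into a `sort` date-range value."""
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start date must not be after end date")
    return f"date:r:{start_d:%Y%m%d}:{end_d:%Y%m%d}"


# Hypothetical request parameters for a single search.
params = {
    "q": '"Tường An"',
    "sort": date_range_sort("2025-07-01", "2025-07-31"),
}
```

Validating the dates before building the request keeps a malformed range from spending an API call.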
This tool analyzes competitor mentions across Vietnamese news websites by:
- Reading keywords, brands, and target websites from a CSV file
- Performing Google searches with multiple search patterns
- Collecting and analyzing search results
- Generating comprehensive Excel reports with detailed analytics

Prerequisites:
- Python 3.7+
- Google Cloud Console account
- Google Custom Search Engine setup

1. Clone the repository:

       git clone https://github.com/yourusername/google-search-competitor-analysis.git
       cd google-search-competitor-analysis

2. Install dependencies:

       pip install -r requirements.txt

3. Set up Google API credentials:
   - Go to Google Cloud Console
   - Create a new project or select an existing one
   - Enable the Custom Search API
   - Create API credentials (API Key)
   - Set up a Custom Search Engine at Programmable Search Engine
4. Configure the script:
   - Open `main.py` or `main_enhanced.py`
   - Replace `API_KEY` with your Google API key
   - Replace `SEARCH_ENGINE_ID` with your Custom Search Engine ID
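
Instead of pasting credentials into the script, they could be read from the environment. This is a minimal sketch, not how `main.py` currently works. The variable names `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` and the helper `load_credentials` are assumptions chosen for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return (api_key, search_engine_id) read from environment variables."""
    api_key = env.get("GOOGLE_API_KEY", "")      # hypothetical variable name
    engine_id = env.get("GOOGLE_CSE_ID", "")     # hypothetical variable name
    missing = [
        name
        for name, value in (("GOOGLE_API_KEY", api_key), ("GOOGLE_CSE_ID", engine_id))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, engine_id
```

In the script, `API_KEY, SEARCH_ENGINE_ID = load_credentials()` would then replace the hard-coded values, which keeps keys out of version control.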
chicom_crawl/
├── main.py              # Basic version of the scraper
├── main_enhanced.py     # Enhanced version with multiple search patterns
├── input.csv            # Input data (keywords, brands, websites)
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── LICENSE              # MIT License
├── .gitignore           # Git ignore file
├── config.py            # Configuration management
├── utils/               # Utility functions
│   ├── __init__.py
│   ├── search_utils.py  # Search-related utilities
│   └── report_utils.py  # Report generation utilities
└── examples/            # Example files
    ├── sample_input.csv   # Sample input file
    └── sample_output.xlsx # Sample output file

1. Prepare your input CSV file (`input.csv`):

       keyword,brand,website
       "Dầu ăn","Tường An",vnexpress.net
       "Dầu ăn","Tường An",dantri.com.vn
       "Gia Vị","Nam Ngư",vnexpress.net

2. Run the basic version:

       python main.py

3. Run the enhanced version (recommended):

       python main_enhanced.py
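
Before spending API calls, it can help to check that the input file has the three expected columns and no incomplete rows. The sketch below uses only the standard library. The function name `read_rows` is hypothetical, and skipping incomplete rows (rather than raising an error) is an assumption about the desired behavior.

```python
import csv
import io

REQUIRED_COLUMNS = ("keyword", "brand", "website")


def read_rows(handle):
    """Yield (keyword, brand, website) tuples, skipping incomplete rows."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input CSV is missing columns: {missing}")
    for row in reader:
        values = tuple((row[c] or "").strip() for c in REQUIRED_COLUMNS)
        if all(values):  # drop rows with any empty field
            yield values


# In-memory sample; the second data row has no website and is skipped.
sample = io.StringIO(
    "keyword,brand,website\n"
    '"Dầu ăn","Tường An",vnexpress.net\n'
    '"Gia Vị","Nam Ngư",\n'
)
rows = list(read_rows(sample))
```

For a real file, pass `open("input.csv", newline="", encoding="utf-8")` so that Vietnamese text is decoded correctly.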
Edit the configuration variables in the script:
# API Configuration
API_KEY = "your_api_key_here"
SEARCH_ENGINE_ID = "your_search_engine_id_here"
# Date Range
START_DATE = "2025-07-01"
END_DATE = "2025-07-31"
# File Names
INPUT_CSV_FILE = "input.csv"
OUTPUT_REPORT_FILE = "competitor_report.xlsx"

The tool generates Excel reports with multiple sheets.

Basic version (`main.py`):
- Summary: Results count by brand and keyword
- Raw Data: All search results with details

Enhanced version (`main_enhanced.py`):
- Enhanced Summary: Results breakdown by search pattern
- Pattern Analysis: Effectiveness of each search strategy
- Raw Data: Complete results with pattern information
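
The Summary sheet counts results by brand and keyword. A minimal sketch of that aggregation using only the standard library follows. The result dictionaries and their field names (`brand`, `keyword`, `link`) are assumptions for illustration, not the scripts' actual data model.

```python
from collections import Counter


def summarize(results):
    """Count results per (brand, keyword) pair, most frequent first."""
    counts = Counter((r["brand"], r["keyword"]) for r in results)
    return counts.most_common()


# Hypothetical search results collected from earlier steps.
results = [
    {"brand": "Tường An", "keyword": "Dầu ăn", "link": "https://example.com/a"},
    {"brand": "Tường An", "keyword": "Dầu ăn", "link": "https://example.com/b"},
    {"brand": "Nam Ngư", "keyword": "Gia Vị", "link": "https://example.com/c"},
]
summary = summarize(results)
```

Each `(brand, keyword)` count would become one row of the Summary sheet, while the unaggregated `results` list corresponds to the Raw Data sheet.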
- Daily Quota: 100 free queries per day; additional queries are billed at $5 per 1,000, up to 10,000 queries per day
- Rate Limiting: 1-second delay between requests
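- API Discontinuation: Google scheduled the Custom Search Site Restricted JSON API to stop serving traffic on January 8, 2025

If your setup depends on a discontinued endpoint, plan to migrate to Google Vertex AI Search, and check Google's current documentation to confirm the status of the API you use.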
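
The 1-second delay between requests can be enforced with a small limiter that sleeps only for the time still owed since the previous call. This is a sketch under that assumption. The class name `MinIntervalLimiter` is hypothetical, and the scripts may simply call `time.sleep(1)` instead.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until the minimum interval since the last call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each API request spaces requests out without adding a full second when the previous response was itself slow.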
The enhanced version uses multiple search strategies:
- Full Combination: `"keyword" "brand"`
- Brand Only: `"brand"` (captures brand mentions)
- Brand with Alternatives: `"brand" (related_terms)`
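
The three patterns can be sketched as plain query strings. The function `build_queries` and its dictionary keys are hypothetical, and joining the related terms with `OR` is an assumption about what `(related_terms)` expands to.

```python
def build_queries(keyword, brand, related_terms=()):
    """Return one query string per search pattern for a keyword/brand pair."""
    queries = {
        "full_combination": f'"{keyword}" "{brand}"',
        "brand_only": f'"{brand}"',
    }
    if related_terms:
        # Assumed expansion: quoted alternatives joined with OR.
        alternatives = " OR ".join(f'"{term}"' for term in related_terms)
        queries["brand_with_alternatives"] = f'"{brand}" ({alternatives})'
    return queries


queries = build_queries("Dầu ăn", "Tường An", related_terms=("price", "promotion"))
```

Keeping the pattern name next to each query makes it straightforward to record which pattern produced a result, which the pattern analysis sheet needs.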
- Smart Grouping: Reduces API calls by grouping similar searches
- Efficient Processing: Handles 2000+ rows efficiently
- Memory Optimized: Minimal memory usage for large datasets
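
This README does not spell out how smart grouping works. One plausible reading, shown here purely as an assumption, is that rows sharing the same keyword and brand are collapsed so each query runs once per group instead of once per row. The function name `group_rows` is hypothetical.

```python
from collections import defaultdict


def group_rows(rows):
    """Map each (keyword, brand) pair to its de-duplicated list of websites."""
    groups = defaultdict(list)
    for keyword, brand, website in rows:
        sites = groups[(keyword, brand)]
        if website not in sites:  # keep first-seen order, drop repeats
            sites.append(website)
    return dict(groups)


rows = [
    ("Dầu ăn", "Tường An", "vnexpress.net"),
    ("Dầu ăn", "Tường An", "dantri.com.vn"),
    ("Gia Vị", "Nam Ngư", "vnexpress.net"),
]
groups = group_rows(rows)
```

Here three input rows produce two groups. Under this reading, results for a group would be matched back to its websites after the search, for example by comparing each result's host name.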
Based on analysis with 2000+ row inputs:
- API Calls: ~144 calls for 2000 rows (not 2000 calls!)
- Execution Time: ~2.4 minutes for 2000 rows
- Memory Usage: Minimal, handled efficiently by pandas
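
The reported execution time is consistent with the 1-second delay between requests: the delays alone account for the full 2.4 minutes. A short check of that arithmetic, using only the figures quoted above:

```python
api_calls = 144        # reported calls for 2000 rows
delay_seconds = 1.0    # delay between requests
rows = 2000

# Time spent in the inter-request delays alone, in minutes.
delay_minutes = api_calls * delay_seconds / 60

# Average number of input rows covered by each API call.
rows_per_call = rows / api_calls
```

`delay_minutes` comes out at 2.4 and `rows_per_call` at roughly 13.9, so network latency is not included in the reported figure and real runs may take somewhat longer.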
To contribute:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational and research purposes. Please ensure compliance with:
- Google's Terms of Service
- Target websites' robots.txt and terms of use
- Applicable data protection regulations
If you encounter issues:
- Check the Issues page
- Ensure your API credentials are correct
- Verify your input CSV format
- Check your daily API quota

Planned improvements:
- Migrate to Google Vertex AI Search
- Add progress tracking for large jobs
- Implement checkpoint/resume functionality
- Add result pagination for more complete data
- Create web interface
- Add automated scheduling
- Implement cost tracking and optimization