A Next.js application that scrapes demographic and economic data for U.S. cities from city-data.com and the Bureau of Labor Statistics. This version is designed for serverless deployments and uses MongoDB for data storage. The scraping code is based on haveitjoewei/market-research.

## Features
- Select states for analysis
- Set minimum population threshold for cities to include
- Scrape data from city-data.com including population, income, housing, and more
- Fetch job data from BLS
- Generate Excel reports for easy analysis
- Download generated files
## Prerequisites

- Node.js 18 or later
- A MongoDB Atlas account (free tier works fine)
- A Google Maps API key with Geocoding API enabled
- Internet access to scrape data from city-data.com and BLS
## Installation

- Clone this repository:

  ```bash
  git clone <repository-url>
  cd market-research-app
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Set up MongoDB:
  - Copy the `.env.template` file to `.env.local`
  - Follow the instructions in the MongoDB Setup Guide
  - Add your MongoDB connection string to the `.env.local` file:

    ```
    MONGODB_URI=mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/market-research?retryWrites=true&w=majority
    ```

- Start the development server:

  ```bash
  npm run dev
  ```

- Open http://localhost:3000 in your browser
## Deployment

- Push your code to a GitHub repository
- Connect your repository to Vercel
- Add the `MONGODB_URI` environment variable in the Vercel dashboard
- Deploy your application
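Since a missing `MONGODB_URI` only surfaces at the first database call, startup code can fail fast instead. A minimal sketch, assuming a helper like this (the function name and error message are illustrative, not from the app's source):

```typescript
// Hypothetical helper: validate the connection string before the app
// attempts to connect. Call it with process.env at startup.
function getMongoUri(env: Record<string, string | undefined>): string {
  const uri = env.MONGODB_URI;
  if (!uri) {
    throw new Error(
      "MONGODB_URI is not set. Add it to .env.local (local dev) or the Vercel dashboard (production)."
    );
  }
  return uri;
}
```

Failing at startup makes a misconfigured deployment obvious in the logs rather than producing a confusing error mid-scrape.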
## Usage

- Enter your Google Maps API key
- Set the minimum population threshold for cities (default: 50,000)
- Select one or more states to analyze
- Click "Generate Market Research"
- Wait for the scraping to complete
- Download the generated files from the "Generated Files" section
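The population-threshold step above can be sketched in TypeScript; the `City` shape and function name are assumptions for illustration, not the app's actual types:

```typescript
// Illustrative shape of a scraped city record (field names are assumptions).
interface City {
  name: string;
  state: string;
  population: number;
}

// Keep only cities at or above the minimum population threshold,
// mirroring the UI setting (default: 50,000).
function filterByPopulation(cities: City[], minPopulation = 50_000): City[] {
  return cities.filter((c) => c.population >= minPopulation);
}
```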
## Tech Stack

This application is built with:
- Next.js: React framework with API routes
- MongoDB: Database for storing scraped data and generated files
- Cheerio: For HTML parsing and web scraping
- ExcelJS: For generating Excel reports
- Node-Geocoder: For geocoding cities with Google Maps API
## How It Works

- User Input: The user selects states and sets parameters
- API Routes: Serverless functions handle the scraping requests
- Data Scraping: The app scrapes city-data.com and BLS websites
- MongoDB Storage: All data is stored in MongoDB collections
- File Generation: Excel reports are generated and stored in MongoDB
- Download: Users can download files directly from the app
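The API-route step can be sketched as a request handler. This is a simplified stand-in: the request/response interfaces below approximate `NextApiRequest`/`NextApiResponse`, and the body fields and status codes are assumptions, not the app's actual API contract:

```typescript
// Simplified stand-ins for Next.js API route types (assumptions for illustration).
interface ScrapeRequest {
  method: string;
  body: { states?: string[]; minPopulation?: number };
}
interface ScrapeResponse {
  statusCode?: number;
  payload?: any;
  status(code: number): ScrapeResponse;
  json(data: unknown): void;
}

// Validate the request, then hand off to the scraping pipeline.
function handler(req: ScrapeRequest, res: ScrapeResponse): void {
  if (req.method !== "POST") {
    res.status(405).json({ error: "Method not allowed" });
    return;
  }
  const { states = [], minPopulation = 50_000 } = req.body;
  if (states.length === 0) {
    res.status(400).json({ error: "Select at least one state" });
    return;
  }
  // In the real app this would scrape city-data.com and BLS,
  // then store results and generated Excel files in MongoDB.
  res.status(202).json({ accepted: true, states, minPopulation });
}
```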
## Serverless Limitations

Serverless platforms impose constraints that affect long-running scrapes:

- Execution timeout (usually 10-60 seconds, but can be adjusted depending on the provider)
- Memory limits depending on the provider
- If you're analyzing many states, the process might time out
To handle larger workloads:
- Process one state at a time
- Implement a queue system for background processing
- Consider using dedicated servers for intensive scraping
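The one-state-at-a-time idea can be sketched as a minimal sequential queue (an illustration under stated assumptions, not the app's actual implementation):

```typescript
// Hypothetical sequential task queue: items (e.g. states) are processed
// one at a time, so no single request fans out into a large parallel burst.
async function processQueue<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const item of items) {
    // Await each task before starting the next one.
    results.push(await worker(item));
  }
  return results;
}
```

A production queue would also persist pending jobs (e.g. in MongoDB) so work survives a function timeout, but the sequential loop is the core idea.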
## Rate Limits

Be aware of rate limits on:
- city-data.com
- Bureau of Labor Statistics
- Google Maps Geocoding API
Add delays between requests to avoid being blocked.
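Adding delays between requests can be sketched as follows (the helper names and the one-second default are assumptions for illustration):

```typescript
// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Fetch a list of URLs sequentially, pausing between each request
// so the scraped site is not hammered.
async function politeFetchAll(
  urls: string[],
  fetchOne: (url: string) => Promise<string>,
  delayMs = 1000
): Promise<string[]> {
  const pages: string[] = [];
  for (const url of urls) {
    pages.push(await fetchOne(url));
    await sleep(delayMs); // throttle: at most one request per delayMs
  }
  return pages;
}
```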
## Notes

- Web scraping is subject to changes in website structure. If city-data.com or BLS changes its layout, the scraper may need to be updated.
- Be respectful of the websites being scraped. Add delays between requests and don't overload their servers.
- This tool is for educational and research purposes only.
## License

MIT License