This project analyzes 9,000+ Data Science job postings across India collected from platforms like LinkedIn and Indeed using a custom Python-based web scraper. The goal was to understand what skills, locations, industries, and job roles dominate the Indian job market, and what qualifications employers expect in 2025.
Using Python, Excel Pivot Tables, Power Query, and Power BI, the dataset was cleaned, transformed, modeled, and visualized into a single-page interactive dashboard that provides valuable insights for job seekers, career changers, and workforce planners.
- Total Jobs: 9K+
- Unique Companies Hiring: 3K+
- Unique Skills Identified: 229
- Top Skills: Communication, Python, SQL, AWS, Leadership
- Top Cities Hiring: Bengaluru, Hyderabad, Pune
- Top Industries Hiring: Technology, Consulting, Finance
- Most Required Education: Master's & Bachelor's
- Experience Expectation: 3–7 years for most roles
The Data Science job landscape evolves quickly: new tools emerge, demand for cloud skills grows, and hiring hotspots shift. Candidates often struggle with questions such as:
- Which skills should I learn first?
- What cities/states have the most job opportunities?
- What job roles dominate the Indian market?
- What experience and education do companies expect?
- Which industries hire the most Data professionals?
"What does the Data Science job market in India look like in 2025, and how can job seekers align themselves with market demand?"
A custom scraper (job_scraper.py) was developed using the JobSpy Docker API to extract thousands of job postings.
- Job title
- Company
- Description
- City & State
- Skills
- Education & Experience requirements
- Seniority & job type
- Posted date
- Industry mapping
Raw output stored as:
data/jobs_raw.csv
Performed in EDA.ipynb:
- Removed duplicates
- Standardized job titles, cities, states
- Extracted skills from text
- Cleaned education & experience columns
- Derived new fields (role category, skill count, etc.)
Exported cleaned dataset:
data/jobs_cleaned.csv
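The notebook itself is not reproduced here, so the following is only a minimal sketch of how the de-duplication, city standardization, and regex-based skill extraction steps could look. The skill vocabulary, the city alias map, the column names (`title`, `company`, `city`, `description`), and the function names are all assumptions made for illustration, not the project's actual code.

```python
import re

# Hypothetical skill vocabulary; the project reports 229 unique skills,
# but the real list is not shown in this README.
SKILLS = ["python", "sql", "aws", "machine learning", "communication", "power bi"]

# Hypothetical alias map used to standardize city names.
CITY_ALIASES = {"bangalore": "Bengaluru", "bengaluru": "Bengaluru", "gurgaon": "Gurugram"}

# One alternation pattern, longest skills first. The lookarounds stop "sql"
# from matching inside "mysql" and similar substrings.
_SKILL_RE = re.compile(
    r"(?<![A-Za-z0-9+#])("
    + "|".join(re.escape(s) for s in sorted(SKILLS, key=len, reverse=True))
    + r")(?![A-Za-z0-9+#])",
    re.IGNORECASE,
)


def extract_skills(text):
    """Return the sorted, de-duplicated skills mentioned in a description."""
    return sorted({m.group(1).lower() for m in _SKILL_RE.finditer(text or "")})


def standardize_city(city):
    """Map known aliases to one spelling; title-case anything else."""
    key = (city or "").strip().lower()
    return CITY_ALIASES.get(key, (city or "").strip().title())


def clean_jobs(rows):
    """Drop duplicate postings and add derived fields (skills, skill_count)."""
    seen = set()
    cleaned = []
    for row in rows:
        city = standardize_city(row["city"])
        # Assumed duplicate key: same title, company, and city.
        key = (row["title"].strip().lower(), row["company"].strip().lower(), city)
        if key in seen:
            continue
        seen.add(key)
        skills = extract_skills(row.get("description", ""))
        cleaned.append({**row, "city": city, "skills": skills, "skill_count": len(skills)})
    return cleaned
```

In practice the same logic would probably be written with pandas inside the notebook; plain Python is used here only to keep the sketch dependency-free.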
Before building the Power BI dashboard, Excel Pivot Tables were used for validation, exploration, and generating dimension tables.
| Pivot Table | Purpose |
|---|---|
| Top States Hiring | Count jobs per state |
| Top Cities Hiring | Identify major hiring hotspots |
| Top Companies | Companies with the highest hiring volume |
| Top Skills | Skill frequency across all postings |
| Job Roles Distribution | Count of Data Analyst, ML Engineer, etc. |
| Education Requirement | Masters vs Bachelors vs PhD |
| Experience Required | 0–12 years distribution |
- Quick exploratory analysis
- Fast validation before BI modeling
- Easy export of Top 10 datasets
- Served as dimension tables in Power BI
All pivot tables were saved inside:
ds-jobs-analysis.xlsx
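The pivot tables above are frequency counts, so they can be cross-checked outside Excel. The sketch below shows one way to reproduce a "Top N" count in Python. The row layout (dicts with a `city` field and a `skills` list) and the function names are assumptions for illustration.

```python
from collections import Counter


def top_n(rows, field, n=10):
    """Count rows per value of `field` and return the n most frequent."""
    counts = Counter(row[field] for row in rows if row.get(field))
    return counts.most_common(n)


def top_skills(rows, n=10):
    """Count each skill once per posting, then return the n most frequent."""
    counts = Counter(skill for row in rows for skill in set(row.get("skills", [])))
    return counts.most_common(n)
```

Comparing these counts with the Excel pivot output is a quick way to catch filtering or de-duplication mistakes before the data reaches Power BI.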
A clean star schema was designed with:
- jobs (1 row per job posting)
- skills
- jobs_skills (bridge table for many-to-many relationships)
- companies
- cities
- state
- job_roles
- education
- experience
- industries
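Because one posting lists many skills and one skill appears in many postings, the jobs_skills bridge table holds one row per (job, skill) pair. The sketch below shows one way such a table could be derived. The `job_id` and `skill_id` key names and the function name are hypothetical; the actual model may have been built in Power Query instead.

```python
def build_skill_tables(jobs):
    """Split per-job skill lists into a skills dimension and a bridge table."""
    skill_ids = {}  # skill name -> surrogate key, assigned in order of first use
    bridge = []     # (job_id, skill_id) pairs, one per job-skill link
    for job in jobs:
        # set() removes repeated skills in one posting; sorted() keeps
        # key assignment deterministic.
        for skill in sorted(set(job["skills"])):
            skill_id = skill_ids.setdefault(skill, len(skill_ids) + 1)
            bridge.append((job["job_id"], skill_id))
    skills = [{"skill_id": sid, "skill": name} for name, sid in skill_ids.items()]
    return skills, bridge
```

With this shape, a skill slicer filters the skills dimension, the bridge carries that filter to jobs, and job counts stay correct without duplicating rows in the fact table.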
The final dashboard includes:
- Total Jobs
- Unique Companies
- Unique Skills
- Top Skills in Demand
- Top States & Cities Hiring
- Job Roles Distribution
- Education & Experience Requirements
- Top Hiring Companies
- Job Role
- Seniority
- State
- Skills
- Education
- Web Scraping
- Regex-based skill extraction
- Cleansing & preprocessing
- Pivot Tables
- Data aggregation
- Data validation
- Slicer-based filtering
- Data Modeling
- DAX measures
- Top-N ranking
- Relationships & bridge table handling
- KPI + interactive visual design
Communication is the most in-demand skill, so soft skills matter. Python, SQL, Machine Learning, and AWS remain core technical requirements. Bengaluru, Hyderabad, and Pune dominate India's DS job market. Technology & Consulting are the largest hiring industries. Mid-level experience (3–7 years) is most commonly required. A Master's degree is still preferred for senior roles.
- Prioritize Python + SQL + ML + Cloud (AWS/Azure)
- Build projects that demonstrate end-to-end ML workflows
- Improve communication & storytelling skills
- Target job applications in Bengaluru, Hyderabad, Pune
- Consider pursuing a Master's degree if aiming for senior roles
- Use insights to refine job posting standards
- Improve clarity in skill requirements
- Benchmark hiring trends against industry leaders
data-science-jobs-analysis/
├── data/
│   ├── jobs_raw.csv
│   └── jobs_cleaned.csv
│
├── notebook/
│   └── EDA.ipynb
│
├── scraper/
│   ├── job_scraper.py
│   ├── docker-compose.yml
│   └── docker-image-starter-cmd
│
├── ds-jobs-analysis.xlsx              # Excel Pivot Tables
├── data-science-jobs-analytics.pbix   # Power BI Dashboard
├── frontend/                          # Optional UI
├── requirements.txt
└── README.md
Here are potential enhancements:
- Add job trend forecasting (Prophet or ARIMA)
- Perform NLP on job descriptions (topic modeling / keyword cloud)
- Build a search engine for job filtering using Streamlit
- Automate daily scraping with cron + GitHub Actions
- Deploy dashboard publicly using Power BI service
