Python Analytics • Data Wrangling • Visualization • Statistics • Machine Learning • NLP • Forecasting • Deep Learning
A complete 57-lab hands-on Data Science portfolio built in Google Colab, progressing from Python and data analysis foundations to advanced visualization, statistical modeling, machine learning, natural language processing, time-series forecasting, and deep learning.
Simulates real-world data analysis, dashboarding, predictive modeling, experimentation, and notebook-based AI workflows across business, healthcare, transport, finance, education, cybersecurity, logistics, and computer vision use cases.
This repository demonstrates practical capability across:
- ✅ Python programming for analytical workflows
- ✅ Data wrangling with NumPy and pandas
- ✅ Data cleaning, preprocessing, and regex-based extraction
- ✅ Data visualization, dashboards, and storytelling
- ✅ Statistics, anomaly detection, and A/B testing
- ✅ Machine learning with scikit-learn
- ✅ NLP with NLTK, TF-IDF, spaCy, and topic modeling
- ✅ Time-series analysis and forecasting
- ✅ Deep learning with TensorFlow, Keras, and PyTorch
This is not just theory-based notebook content.
Every lab includes:
- Notebook-based implementation
- Structured lab documentation
- Interview Q&A
- Troubleshooting notes
- Practical datasets and applied use cases
- Portfolio-ready organization
The portfolio reflects a full progression from Python foundations → data analysis → machine learning → NLP → forecasting → deep learning.
A structured 57-lab Data Science with Python portfolio built in Google Colab, organized into 8 section-wise folders covering the full learning path from foundational Python to advanced AI workflows.
This repository is designed to showcase:
- Real notebook-based analytical workflows
- Practical data cleaning and transformation
- Visualization and dashboard-style storytelling
- Statistical reasoning for decision-making
- Applied machine learning and model evaluation
- NLP pipelines for text analysis
- Forecasting for time-based datasets
- Deep learning for image and sequence problems
All labs are organized in a consistent, portfolio-friendly format for both learning review and professional presentation.
Each lab is execution-focused and includes:
- `README.md`
- `.ipynb` notebook
- `interview_qna.md`
- `troubleshooting.md`
- Aspiring Data Analysts building hands-on project depth
- Beginner to intermediate Data Science learners
- Machine Learning learners building structured portfolio work
- Students moving from Python basics to applied analytics and AI
- Job seekers preparing for data, analytics, ML, or AI interviews
- Recruiters and hiring managers reviewing practical notebook-based work
This repository is especially useful for anyone who wants to show a clear, progressive, real-work-style Data Science learning journey instead of isolated practice notebooks.
Click any lab title to open its folder.
| Lab | Title | Focus Area |
|---|---|---|
| 01 | Python Syntax and Data Types | Variables, types, operators, core syntax |
| 02 | Control Flow in Python | Conditions, branching, logical flow |
| 03 | Looping Through Data | Iteration patterns and loop-based processing |
| 04 | Functions and Reusability | Function design and reusable logic |
| 05 | List and Dictionary Comprehensions | Compact transformations and filtering |
| 06 | File Handling with TXT and CSV | Reading, writing, parsing local files |
| 07 | Working with JSON Files | JSON parsing and structured data handling |
| 08 | Exception Handling and Logging | Defensive coding and runtime debugging |
| 09 | Automate Weather Data Retrieval with API | API calls, JSON responses, automation |
| 10 | Build a Command-Line CSV Parser | Script-based CSV inspection and parsing |
- Python fundamentals for data workflows
- File and JSON handling
- API consumption and automation basics
- Exception handling and logging
- Reusable scripting patterns
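As a small taste of the patterns this section practices (defensive CSV parsing with logging and exception handling), here is a minimal stdlib-only sketch; the sample data, function name, and field names are illustrative, not taken from a specific lab:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("csv_parser")

# Hypothetical sample data standing in for a local CSV file.
RAW = "name,score\nAlice,91\nBob,not_a_number\nCara,78\n"

def parse_scores(text):
    """Parse rows defensively, skipping malformed values instead of crashing."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({"name": record["name"], "score": int(record["score"])})
        except (KeyError, ValueError) as exc:
            log.warning("Skipping bad row %r: %s", record, exc)
    return rows

scores = parse_scores(RAW)
print(scores)  # Bob's row is skipped because its score is not an integer
```

The same try/except-plus-logging shape carries over to API responses and JSON files: fail loudly in the log, keep the pipeline running.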
| Lab | Title | Focus Area |
|---|---|---|
| 11 | NumPy Arrays and Vector Operations | Arrays, broadcasting, vectorization |
| 12 | Getting Started with Pandas DataFrames | DataFrame creation and basic operations |
| 13 | Filtering, Sorting, and Merging DataFrames | Querying and combining tabular data |
| 14 | Aggregation with GroupBy and Apply | Summaries, aggregation, transformation |
| 15 | Data Cleaning and Preprocessing | Standard cleanup and normalization workflows |
| 16 | Handling Missing Values and Outliers | Imputation and anomaly-aware preprocessing |
| 17 | Extract Patterns Using Regular Expressions | Regex-based extraction and text cleanup |
| 18 | Clean and Analyze EMR Patient Data | Healthcare data cleaning and exploratory analysis |
| 19 | Analyze Transportation Delay Dataset | Delay analysis and practical tabular exploration |
| 20 | Build a Custom Data Cleaning Pipeline | Reusable preprocessing pipeline design |
- NumPy and pandas workflows
- Cleaning messy real-world datasets
- Missing value and outlier treatment
- Regex extraction for structured analytics
- Reusable preprocessing design
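A condensed sketch of the kind of reusable cleaning pipeline these labs build, combining imputation, quantile-based outlier clipping, and regex extraction; the DataFrame contents and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical messy records standing in for a real dataset.
df = pd.DataFrame({
    "patient_id": ["ID-001", "ID-002", "ID-003", "ID-004"],
    "age": [34, np.nan, 29, 410],  # one missing value, one implausible outlier
})

def clean(frame):
    out = frame.copy()
    out["age"] = out["age"].fillna(out["age"].median())       # impute missing
    low, high = out["age"].quantile([0.05, 0.95])
    out["age"] = out["age"].clip(low, high)                   # tame outliers
    # Regex extraction: pull the numeric part out of the ID string
    out["id_num"] = out["patient_id"].str.extract(r"ID-(\d+)", expand=False).astype(int)
    return out

cleaned = clean(df)
```

Wrapping the steps in one function is what makes the pipeline reusable across datasets.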
| Lab | Title | Focus Area |
|---|---|---|
| 21 | Static Charts with Matplotlib | Core plotting and chart composition |
| 22 | Statistical Visuals with Seaborn | Distribution and relationship visualizations |
| 23 | Enhancing Plots with Annotations and Themes | Design polish and communication clarity |
| 24 | Interactive Plots with Plotly | Interactive analytics and exploratory visuals |
| 25 | Build a COVID Trend Dashboard with Plotly | Dashboard-driven public trend analysis |
| 26 | Interactive Visuals with Bokeh | Interactive chart controls and presentation |
| 27 | Web Dashboards with Dash | Analytical app-style dashboard development |
| 28 | Rapid Dashboards with Streamlit | Fast deployment of data apps |
| 29 | Interactive Crime Map with Folium and GeoJSON | Geospatial storytelling and mapping |
| 30 | Jupyter for Narrative Visualization | Notebook-based storytelling and explanation |
- Static and interactive data visualization
- Dashboard design and analytical storytelling
- Geospatial exploration with Folium
- Annotation and theme design
- Notebook communication for technical audiences
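A minimal static-chart example in the spirit of these labs: a Matplotlib bar chart with an annotation calling out an anomaly. The data is invented for illustration, and the `Agg` backend is selected so it also runs headless outside Colab:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; not needed inside Colab
import matplotlib.pyplot as plt

# Hypothetical monthly delay counts for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
delays = [12, 18, 9, 25, 14]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(months, delays, color="steelblue")
ax.set_title("Monthly Transport Delays (sample data)")
ax.set_ylabel("Delayed trips")
# Annotate the April spike (bars sit at x positions 0..4)
ax.annotate("spike", xy=(3, 25), xytext=(1, 24),
            arrowprops=dict(arrowstyle="->"))
fig.tight_layout()
fig.savefig("delays.png")
```

The annotation-and-theme polish from Lab 23 builds directly on this kind of base chart.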
| Lab | Title | Focus Area |
|---|---|---|
| 31 | Descriptive Statistics with Pandas | Summary statistics and distribution understanding |
| 32 | Detecting Anomalies in Transaction Logs | Outlier detection in financial-style data |
| 33 | Fraud Probability Analysis with Logistic Scoring | Scoring-based fraud risk estimation |
| 34 | A/B Testing Basics for eCommerce | Experiment design and conversion analysis |
| 35 | Statistical Significance in A/B Testing | Hypothesis testing and significance interpretation |
- Descriptive and inferential statistics
- A/B testing workflows
- Anomaly detection logic
- Fraud-oriented scoring interpretation
- Statistical decision support
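The significance-testing logic behind Labs 34–35 can be sketched with a stdlib-only two-proportion z-test; the conversion counts below are hypothetical:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical A/B experiment: 200/2000 vs 260/2000 conversions
z, p = two_proportion_z(200, 2000, 260, 2000)
```

Here the lift from 10% to 13% yields p < 0.05, so the variant's improvement would be judged statistically significant at the usual threshold.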
| Lab | Title | Focus Area |
|---|---|---|
| 36 | Regression and Classification with scikit-learn | Supervised learning foundations |
| 37 | Feature Engineering and Cross-Validation | Feature quality and robust validation |
| 38 | Clustering and Dimensionality Reduction | Unsupervised learning and feature compression |
| 39 | Build a Retail Recommendation System | Recommendation logic and retail analytics |
| 40 | Evaluate and Compare ML Models | Performance evaluation and model selection |
- Supervised and unsupervised ML
- Feature engineering workflows
- Cross-validation and model comparison
- Recommendation system concepts
- Practical scikit-learn implementation
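A compact illustration of the scikit-learn workflow these labs follow: fit a model and score it with cross-validation. The synthetic, linearly separable dataset is an assumption made purely so the snippet is self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical linearly separable data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression()
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean())
```

Swapping in a different estimator and comparing the cross-validated scores is the essence of the model-comparison work in Lab 40.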
| Lab | Title | Focus Area |
|---|---|---|
| 41 | Text Cleaning and Preprocessing with NLTK | Tokenization, cleanup, normalization |
| 42 | Feature Extraction with BoW and TF-IDF | Vectorization and text features |
| 43 | Sentiment Analysis on EdTech Feedback | Opinion mining and educational feedback analysis |
| 44 | Named Entity Recognition for Cybersecurity Logs | Entity extraction from security-oriented text |
| 45 | Topic Modeling and Document Classification | Topics, themes, and text categorization |
- NLP preprocessing pipelines
- Bag-of-Words and TF-IDF workflows
- Sentiment analysis
- Named entity recognition
- Topic modeling and document classification
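The TF-IDF vectorization step at the heart of this section, sketched with scikit-learn's `TfidfVectorizer`; the feedback snippets are invented stand-ins for EdTech review text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical feedback snippets standing in for EdTech review text.
docs = [
    "the course content was excellent and clear",
    "video quality was poor and audio unclear",
    "excellent instructor, clear explanations",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)      # sparse document-term matrix
print(sorted(vec.vocabulary_))   # learned vocabulary after stop-word removal
```

The resulting matrix feeds directly into classifiers, sentiment models, or topic models downstream.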
| Lab | Title | Focus Area |
|---|---|---|
| 46 | Time Series Decomposition and Trend Analysis | Trend, seasonality, decomposition |
| 47 | Moving Averages and Smoothing Techniques | Smoothing and signal stabilization |
| 48 | Forecasting with ARIMA and SARIMA | Classical forecasting model workflows |
| 49 | Business Forecasting with Prophet | Practical forecasting for business scenarios |
| 50 | Predictive Forecasting for Logistics and Finance | Applied forecasting in operational environments |
- Time-aware analytical thinking
- Trend and seasonality analysis
- Forecasting with ARIMA, SARIMA, and Prophet
- Business and logistics forecasting
- Model interpretation for temporal data
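The smoothing-and-decomposition idea from Labs 46–47 in miniature: a centered moving average separates trend from residual. The monthly demand values are hypothetical:

```python
import pandas as pd

# Hypothetical monthly demand series with an upward trend.
s = pd.Series([10, 12, 11, 15, 14, 18, 17, 21, 20, 24],
              index=pd.date_range("2023-01-01", periods=10, freq="MS"))

trend = s.rolling(window=3, center=True).mean()  # centered moving average
residual = s - trend                             # what the trend doesn't explain
print(trend.round(2))
```

ARIMA/SARIMA and Prophet formalize this intuition by modeling the trend, seasonal, and residual components explicitly.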
| Lab | Title | Focus Area |
|---|---|---|
| 51 | Build Your First Neural Network | Neural network fundamentals |
| 52 | Image Classification with Convolutional Neural Networks (CNNs) | CNN-based image classification |
| 53 | Medical Image Classification (X-ray / CT) | Healthcare imaging workflows |
| 54 | Sequence Modeling with RNNs and LSTMs for Cybersecurity Anomaly Detection | Sequence models and anomaly detection |
| 55 | Analyze Drone Footage with CNNs | Vision pipelines for aerial imagery |
| 56 | Apply Dropout and Batch Normalization | Regularization and training stability |
| 57 | Transfer Learning for Custom Image Classification | Pretrained models and custom datasets |
- Deep learning fundamentals
- CNN-based vision workflows
- Sequence modeling with RNNs/LSTMs
- Medical and aerial image analysis
- Regularization and transfer learning
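To show what "build your first neural network" means mechanically, here is a from-scratch network solving XOR with plain NumPy rather than TensorFlow or PyTorch, so the sketch stays dependency-free; the architecture, seed, and learning rate are illustrative choices, not taken from a specific lab:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):                  # full-batch gradient descent
    h = np.tanh(X @ W1 + b1)            # hidden activations
    out = sigmoid(h @ W2 + b2)          # predicted probabilities
    grad_out = out - y                  # d(binary cross-entropy)/d(logit)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)   # backprop through tanh
    W2 -= 0.1 * (h.T @ grad_out); b2 -= 0.1 * grad_out.sum(axis=0)
    W1 -= 0.1 * (X.T @ grad_h);   b1 -= 0.1 * grad_h.sum(axis=0)

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
```

Keras and PyTorch automate exactly this forward/backward loop, which is why starting from the raw version makes the framework labs easier to follow.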
These labs add strong portfolio value because they reflect varied, applied, real-world data science work:
- 🌦️ Weather Data Retrieval with API
- 📄 Command-Line CSV Parsing
- 🏥 EMR Patient Data Cleaning and Analysis
- 🚆 Transportation Delay Dataset Analysis
- 🧼 Custom Reusable Data-Cleaning Pipeline
- 📈 COVID Trend Dashboard with Plotly
- 🗺️ Crime Mapping with Folium and GeoJSON
- 🚨 Transaction Anomaly Detection
- 💳 Fraud Probability Scoring
- 🧪 A/B Testing for eCommerce
- 🛍️ Retail Recommendation System
- 💬 Sentiment Analysis on EdTech Feedback
- 🛡️ NER for Cybersecurity Logs
- 📊 Logistics and Finance Forecasting
- 🩻 Medical Image Classification
- 🔁 Sequence Modeling for Anomaly Detection
- 🚁 Drone Footage Analysis with CNNs
- 🧠 Transfer Learning for Custom Image Classification
- Python programming for data science
- Notebook-based analytical workflows
- Data cleaning and preprocessing
- Exploratory data analysis
- Array and DataFrame operations
- Regex-based pattern extraction
- Static and interactive data visualization
- Dashboard design and storytelling
- Statistical reasoning and hypothesis testing
- Classical machine learning with scikit-learn
- NLP preprocessing and feature extraction
- Time-series forecasting
- Deep learning model building
- Model evaluation and comparison
- Applied analytics across healthcare, transport, business, education, cybersecurity, logistics, finance, and computer vision domains
Click to expand technical stack
- Google Colab
- Jupyter-style notebook workflow
- Python 3.x
- Standard library and utilities: `os`, `sys`, `json`, `csv`, `re`, `datetime`, `logging`, `pathlib`, `warnings`, `requests`, `pickle`, `glob`, `collections`, `random`, `time`
- NumPy
- pandas
- Matplotlib
- Seaborn
- Plotly
- Bokeh
- Dash
- Streamlit
- Folium
- GeoJSON workflows
- WordCloud
- SciPy
- statsmodels
- scikit-learn
- joblib
- NLTK
- spaCy
- TF-IDF / Bag-of-Words workflows
- gensim
- pyLDAvis
- statsmodels
- Prophet
- TensorFlow
- Keras
- PyTorch
- torchvision
- OpenCV (`cv2`)
- PIL
- IPython
- pprint
- tabulate
```
Data-Science-With-Python/
├── 🔹 Section 1 – Python Foundations for Data Science (Labs 01–10)       # Python syntax, control flow, files, JSON, APIs, automation
├── 🔹 Section 2 – Working with Data: Pandas & NumPy (Labs 11–20)         # DataFrames, arrays, cleaning, preprocessing, regex, pipelines
├── 🔹 Section 3 – Data Visualization & Storytelling (Labs 21–30)         # Charts, dashboards, geospatial visuals, narrative analytics
├── 🔹 Section 4 – Statistics & Probability for Data Science (Labs 31–35) # Descriptive stats, anomaly detection, fraud scoring, A/B testing
├── 🔹 Section 5 – Machine Learning with scikit-learn (Labs 36–40)        # Regression, classification, clustering, recommendation, evaluation
├── 🔹 Section 6 – Applied NLP (Labs 41–45)                               # Text preprocessing, TF-IDF, sentiment, NER, topic modeling
├── 🔹 Section 7 – Time Series & Forecasting (Labs 46–50)                 # Trend analysis, smoothing, ARIMA/SARIMA, Prophet forecasting
├── 🔹 Section 8 – Deep Learning with TensorFlow & PyTorch (Labs 51–57)   # Neural networks, CNNs, sequence models, transfer learning
├── README.md                                                             # Main portfolio README, section index, featured labs, repo overview
└── .gitignore                                                            # Ignore notebook checkpoints, cache files, temp artifacts
```
Each lab follows a consistent, portfolio-friendly structure:
```
labXX-topic-name/
├── README.md              # Lab overview, objectives, concepts, workflow, outcomes
├── labXX_topic_name.ipynb # Main Google Colab notebook with code, outputs, plots, and explanations
├── interview_qna.md       # Interview-focused questions and answers for revision
└── troubleshooting.md     # Common issues, fixes, execution notes, and debugging tips
```
This structure keeps each lab:
- easy to navigate
- notebook-first and review friendly
- consistent across all 57 labs
- portfolio ready for GitHub presentation
- useful for both learning review and interview preparation
After completing these 57 labs, this portfolio demonstrates the ability to:
- Build Python-based analytical workflows from scratch
- Clean, transform, and structure raw datasets for analysis
- Perform exploratory data analysis and extract practical insights
- Create charts, dashboards, and narrative visualizations
- Apply statistics, anomaly detection, and A/B testing logic
- Train, evaluate, and compare machine learning models
- Build NLP pipelines for text cleaning, sentiment, NER, and topic analysis
- Develop forecasting workflows for time-based datasets
- Use deep learning for image and sequence-based problems
- Document technical work clearly in a portfolio-ready format
These labs reflect practical data-science work such as:
- Cleaning messy business, operational, and healthcare datasets
- Building dashboards and visual reports for decision-making
- Detecting anomalies in financial and transactional records
- Evaluating product or business changes with A/B testing
- Comparing machine learning models for predictive tasks
- Extracting insight from text feedback and security-style logs
- Forecasting trends across finance, logistics, and operational data
- Applying deep learning to image and sequence problems
This portfolio is built around applied analytical workflows, not isolated theory-based notebooks.
This portfolio reflects:
- Practical Data Science and Analytics capability
- Strong foundation in data cleaning, analysis, and visualization
- Applied Machine Learning and model evaluation skills
- Working knowledge of NLP, forecasting, and deep learning
- Notebook-first experimentation and reproducible workflow discipline
- Structured technical documentation and interview readiness
It aligns well with roles in:
- Data Analytics
- Data Science
- Machine Learning
- Applied AI
- Research and experimentation-focused analytics workflows
All labs were executed in a Google Colab notebook environment and designed to simulate realistic data analysis, modeling, and AI workflow execution, including:
- Data cleaning and preprocessing pipelines for messy real-world datasets
- Exploratory and visual analytics for business and operational insight generation
- Statistical testing workflows for anomaly detection, fraud scoring, and A/B testing
- Machine learning model development for classification, regression, clustering, and recommendation tasks
- NLP pipelines for sentiment analysis, named entity recognition, and topic modeling
- Forecasting workflows for temporal data in finance, logistics, and business trend analysis
- Deep learning implementations for image classification, sequence modeling, and transfer learning
- Structured notebook documentation combining code, outputs, plots, interpretation, and troubleshooting
This is practical implementation, not just theoretical notebook practice.
This heatmap reflects hands-on implementation across 57 labs in:
Python • Data Wrangling • Visualization • Statistics • Machine Learning • NLP • Forecasting • Deep Learning
| Skill Area | Exposure Level | Practical Depth | Tools / Frameworks Used |
|---|---|---|---|
| 🐍 Python for Data Workflows | ██████████ 100% | Core scripting, functions, file handling, JSON, APIs, exception handling | Python, JSON, CSV, requests, logging |
| 🧹 Data Wrangling & Cleaning | ██████████ 100% | Cleaning pipelines, missing values, outliers, preprocessing, regex extraction | pandas, NumPy, re |
| 📊 Exploratory Data Analysis | █████████░ 90% | Trend analysis, summaries, tabular exploration, domain-focused interpretation | pandas, NumPy, Google Colab |
| 🎨 Visualization & Dashboarding | ██████████ 100% | Static charts, interactive plots, dashboards, geospatial storytelling | Matplotlib, Seaborn, Plotly, Bokeh, Dash, Streamlit, Folium |
| 🧮 Statistics & Probability | ████████░░ 80% | Descriptive stats, anomaly analysis, fraud scoring, A/B testing, significance testing | SciPy, statsmodels, pandas |
| 🤖 Supervised Machine Learning | █████████░ 90% | Regression, classification, feature preparation, model comparison | scikit-learn, pandas, NumPy |
| 🧪 Feature Engineering & Model Evaluation | █████████░ 90% | Cross-validation, feature selection, performance comparison, recommendation logic | scikit-learn, joblib |
| 📖 Natural Language Processing | ████████░░ 80% | Text cleaning, vectorization, sentiment analysis, NER, topic modeling | NLTK, spaCy, TF-IDF, gensim, pyLDAvis |
| ⏳ Time Series & Forecasting | ████████░░ 80% | Trend decomposition, smoothing, ARIMA/SARIMA, Prophet forecasting | statsmodels, Prophet |
| 🧠 Deep Learning Fundamentals | ████████░░ 80% | Neural networks, training workflows, regularization, architecture understanding | TensorFlow, Keras, PyTorch |
| 🖼️ Computer Vision & Image Modeling | ████████░░ 80% | CNN classification, medical imaging, drone imagery, transfer learning | TensorFlow, PyTorch, OpenCV, PIL |
| 📝 Notebook Documentation & Portfolio Presentation | ██████████ 100% | Lab writeups, structured notebooks, interview Q&A, troubleshooting documentation | Google Colab, Markdown |

- ██████████ = Implemented end-to-end with strong practical coverage
- █████████░ = Advanced practical implementation across multiple labs
- ████████░░ = Strong working implementation with applied context
- ███████░░░ = Foundational to intermediate applied exposure
This heatmap reflects portfolio-level practical capability, not isolated notebook experiments โ covering:
Python → Cleaning → Analysis → Visualization → Statistics → ML → NLP → Forecasting → Deep Learning
```bash
git clone https://github.com/your-username/Data-Science-With-Python.git
cd Data-Science-With-Python

# Open any section
cd 01-python-foundations-for-data-science

# Open any lab
cd lab01-python-syntax-and-data-types
```

Then review the lab in this order:

- Open `README.md` for the lab overview, objectives, and workflow
- Run the `.ipynb` notebook in Google Colab
- Use `interview_qna.md` for revision and interview preparation
- Check `troubleshooting.md` for common issues, fixes, and execution notes
Each lab is self-contained and includes notebook implementation, lab documentation, interview preparation, and troubleshooting support.
All labs in this repository were executed in a Google Colab notebook environment designed for practical, reproducible Data Science and Machine Learning workflows.
Environment characteristics:
- Google Colab + Python 3.x notebook-based execution
- Cloud-first workflow for reproducible experimentation and easy review
- Structured datasets and applied use cases across business, healthcare, transport, finance, education, cybersecurity, logistics, and vision tasks
- Notebook-driven implementation combining code, outputs, plots, and written explanation
- Progressive lab design covering Python, data analysis, visualization, statistics, ML, NLP, forecasting, and deep learning
- Portfolio-oriented documentation with notebook, README, interview Q&A, and troubleshooting notes
Outputs were validated through notebook execution, analytical interpretation, plots, model results, and structured documentation.
This repository is designed to support:
- Data Science learning and portfolio development
- Python-based analytical workflow building
- Data cleaning, preprocessing, and exploratory analysis
- Visualization, dashboarding, and storytelling practice
- Machine Learning and model evaluation workflows
- NLP, forecasting, and deep learning implementation
- Interview preparation through practical, documented labs
All notebooks, workflows, and documentation are intended for educational use, professional portfolio presentation, and applied skill development.
Use this repository as a structured progression from foundational analytics to advanced AI workflows.
All datasets, experiments, and analytical workflows in this repository were used:
- In controlled educational and portfolio-building contexts
- For learning, experimentation, and technical development
- Using practice datasets, public-style scenarios, or lab-oriented analytical exercises
- For responsible demonstration of data, analytics, ML, NLP, forecasting, and deep learning skills
This repository is intended to showcase:
- Practical notebook-based implementation
- Clear technical documentation
- Analytical reasoning and model-building ability
- Structured, professional portfolio presentation
It is provided solely for educational, professional development, and portfolio purposes.
This repository follows a natural progression:
Python foundations
→ Data handling and cleaning
→ Visualization and dashboards
→ Statistical reasoning
→ Machine learning workflows
→ NLP pipelines
→ Forecasting and time-aware analysis
→ Deep learning for image and sequence problems
That progression makes the repo stronger as a portfolio because it shows a deliberate build-up of practical skill, not just disconnected experimentation.
This repository represents a structured 57-lab build journey across the data science lifecycle:
Python → Data Wrangling → Visualization → Statistics → Machine Learning → NLP → Forecasting → Deep Learning
It reflects hands-on implementation, not just theory.
If this repository helps you, consider starring it.
Abdul Rehman
Data Science • Analytics • Machine Learning • NLP • Forecasting • Deep Learning