
dawnxchoo/data-portfolio-handbook


Dawn's Data Portfolio Project Handbook

How to build a Data Portfolio that gets you hired

Every resource & guidance you need to build your Data Science & Analytics portfolio

Whether you're just starting to build a Data Science & Analytics portfolio or already working on one, this repo is for you!

You'll find a ton of resources to guide you from the initial stages of ideation all the way through to creating a polished, professional portfolio. This portfolio is what you can showcase to hiring managers, discuss in interviews, and highlight on your resume, so that you can ultimately land your dream Data Science or Data Analyst job!

If you're starting to build your portfolio from scratch, I recommend following this guide step-by-step. But if you're here looking for specific resources or looking to up-level your data portfolio, feel free to jump to the relevant sections.

Data Portfolio Banner

Table of Contents

  1. Project planning & ideation
  2. A list of project ideas by role and skill level
  3. Skills to showcase for each role
  4. Tools & tech stack
  5. Hosting your portfolio

By the way, HI 👋 My name is Dawn Choo. I am a Data Scientist and a Data content creator. I write every day on LinkedIn and Instagram to 100k+ followers. I also have a newsletter (about Data careers) where I publish a new article every other Wednesday; sign up at www.askdatadawn.com. Because I have written extensively about this topic across all of my platforms, you might find that I link to some of my posts & newsletters! My goal, however, is to make this repo as comprehensive as possible, so I include links to a wide range of resources.

Dawn Choo - Data Scientist and Content Creator

Project planning & ideation

A data "portfolio" typically contains multiple projects. Before we discuss creating the full portfolio, let's focus on building individual projects. This section provides instructions for finding datasets, developing project ideas, exploring sample projects, and following guided tutorials to help you get started.

Where to find datasets

Static datasets

Data APIs

How to come up with a project idea

I wrote about this in this LinkedIn post.

I also highly recommend writing an analysis plan before you start any analysis. I find that writing an analysis plan forces me to break down my high-level question into bite-sized steps. It's also helpful when I'm working through my analysis and going off on a tangent: my analysis plan usually refocuses me on what is important for the project.

Not sure how to write an analysis plan? You can use my analysis template to get started!

A list of project ideas by role and skill level

Guided project resources -- start here if you're building your first project

The ultimate goal is to build completely self-driven projects in your target industry. But if this is your first Data portfolio project, it could be helpful (read: less intimidating) to start with one of these guided projects!

Exploratory Data Analysis

Data Visualization

Machine Learning

ETL + Database Design

Datasets + project ideas

YouTube Content Strategy Optimization

Dataset: YouTube Performance + Comments Dataset

More details on these project ideas here

Beginner Identify what makes videos successful by analyzing top-performing keywords, like-to-view ratios, and average sentiment. Use an engagement score (views + likes + comments) to rank videos.

Intermediate Build a performance framework to evaluate content effectiveness over time. Use percentiles, cohort analysis, and sentiment-to-performance correlations to uncover patterns and inform future strategy.

Advanced

  • Engagement Quality Monitor: Detect negative sentiment spirals and flag videos needing moderation.
  • Community Health Scorer: Create health scores by content category to prioritize response.
  • Moderator Resource Optimizer: Allocate response efforts using sentiment and controversy levels.
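
To make the beginner idea above concrete, here is a minimal sketch of the engagement score (views + likes + comments) and the like-to-view ratio in plain Python. The sample rows and field names are made up for illustration; the real dataset's columns may be named differently.

```python
# Hypothetical sample rows; the real dataset's column names may differ.
videos = [
    {"title": "Intro to SQL", "views": 12000, "likes": 900, "comments": 150},
    {"title": "Pandas in 10 minutes", "views": 30000, "likes": 1200, "comments": 90},
    {"title": "My desk setup", "views": 8000, "likes": 1000, "comments": 400},
]


def engagement_score(video):
    """Unweighted engagement score: views + likes + comments."""
    return video["views"] + video["likes"] + video["comments"]


def like_to_view_ratio(video):
    """Likes per view; returns 0.0 for a video with no views."""
    return video["likes"] / video["views"] if video["views"] else 0.0


# Rank videos from highest to lowest engagement score.
ranked = sorted(videos, key=engagement_score, reverse=True)

for rank, video in enumerate(ranked, start=1):
    print(rank, video["title"], engagement_score(video),
          round(like_to_view_ratio(video), 3))
```

Views are usually far larger than likes or comments, so the raw sum mostly ranks by views. Looking at the like-to-view ratio alongside it gives you a second angle on what "successful" means.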

Mental Health Treatment Gap

Dataset: Global Burden of Mental Disorders

More details on these project ideas here

Beginner Report on mental health treatment gaps globally. Highlight untreated anxiety rates, regional depression prevalence, and country-level mental health burdens using descriptive analysis.

Intermediate Track and predict treatment trends with window functions, cohort analysis, and a composite risk score to prioritize mental health intervention efforts.

Advanced

  • Real-Time Crisis Detection System: Build a system that flags potential mental health emergencies early.
  • Investment Optimization Platform: Model expected impact of different interventions across populations.
  • Equity Index Tool: Quantify and address disparities in mental health access and outcomes.
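
One possible way to build the composite risk score mentioned in the intermediate idea is to min-max scale each indicator and take a weighted sum. This is only a sketch: the countries, indicator names, values, and equal weights below are all hypothetical, and you would choose your own based on the dataset and your analysis plan.

```python
# Hypothetical indicators per country (made-up numbers for illustration).
countries = {
    "Country A": {"prevalence": 0.12, "untreated_share": 0.80},
    "Country B": {"prevalence": 0.08, "untreated_share": 0.40},
    "Country C": {"prevalence": 0.10, "untreated_share": 0.60},
}

# Assumed weights; they should sum to 1 so the score stays between 0 and 1.
WEIGHTS = {"prevalence": 0.5, "untreated_share": 0.5}


def min_max_scale(values):
    """Rescale a {name: value} mapping to the 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All values identical: no country is riskier than another.
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


# Scale each indicator across countries so they are comparable.
scaled = {
    indicator: min_max_scale({c: vals[indicator] for c, vals in countries.items()})
    for indicator in WEIGHTS
}

# Weighted sum of the scaled indicators gives one risk score per country.
risk = {
    c: sum(WEIGHTS[i] * scaled[i][c] for i in WEIGHTS)
    for c in countries
}

for country, score in sorted(risk.items(), key=lambda kv: kv[1], reverse=True):
    print(country, round(score, 2))
```

Scaling first matters because the indicators are on different scales; without it, the indicator with the largest raw numbers would dominate the score.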

Delivery Performance & Customer Satisfaction Analysis

Dataset: Brazilian E-Commerce Public Dataset

More details on these project ideas here

Beginner Explore how delivery performance affects customer reviews. Compare on-time delivery rates, review scores, and regional differences in performance and volume.

Intermediate Build a delivery performance framework by analyzing seller rankings, customer retention by cohort, and root causes of operational delays. Include a "reliability score."

Advanced

  • Predictive Delivery Risk System: Train an ML model to forecast late deliveries and simulate interventions.
  • Dynamic Delivery Promise Engine: Personalize delivery estimates with A/B testing for effectiveness.
  • Seller Performance Platform: Automate monitoring, diagnostics, and action recommendations for sellers.
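
For the beginner idea above, a first pass could be as simple as splitting orders into on-time and late, then comparing average review scores. The sketch below uses a handful of made-up orders; the field names are illustrative and are not the dataset's actual column names.

```python
from datetime import date
from statistics import mean

# Hypothetical orders; field names are illustrative, not the dataset's columns.
orders = [
    {"estimated": date(2018, 1, 10), "delivered": date(2018, 1, 8), "review_score": 5},
    {"estimated": date(2018, 1, 12), "delivered": date(2018, 1, 15), "review_score": 2},
    {"estimated": date(2018, 2, 1), "delivered": date(2018, 2, 1), "review_score": 4},
    {"estimated": date(2018, 2, 5), "delivered": date(2018, 2, 9), "review_score": 1},
]


def is_on_time(order):
    """An order is on time if it arrived on or before the estimated date."""
    return order["delivered"] <= order["estimated"]


on_time = [o for o in orders if is_on_time(o)]
late = [o for o in orders if not is_on_time(o)]

on_time_rate = len(on_time) / len(orders)
avg_review_on_time = mean(o["review_score"] for o in on_time)
avg_review_late = mean(o["review_score"] for o in late)

print(f"On-time rate: {on_time_rate:.0%}")
print(f"Average review (on time): {avg_review_on_time}")
print(f"Average review (late): {avg_review_late}")
```

From here, you could repeat the same comparison by region or by seller to find where late deliveries hurt reviews the most.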

Fast Food Nutrition

Dataset: Fast Food Nutritional Facts

Beginner Analyze nutritional trends across popular fast food chains. Identify healthier vs. unhealthier options using basic statistics and visualizations.

Intermediate Explore nutrient relationships and build a dashboard to compare fast food items across chains based on user-defined goals.

Advanced

  • Clustering Analysis: Group menu items by nutritional profile to uncover patterns.
  • Nutritional Recommendation Engine: Suggest items based on goals like high protein or low sodium.
  • Interactive Explorer App: Build a user-facing tool for personalized fast food comparisons.

Airbnb Listings and Reviews

Dataset: Airbnb Listings + Reviews

Beginner Compare performance of superhosts vs. regular hosts. Analyze differences in ratings, prices, and booking frequency.

Intermediate Study geographic and amenity-based trends. Use clustering and mapping to explore pricing and satisfaction across locations.

Advanced

  • Price Optimization Model: Predict optimal listing prices using machine learning.
  • Listing Recommendation System: Suggest listings based on user preferences and review scores.
  • Amenity-Based Segmentation: Cluster listings to find patterns in offerings and outcomes.

Summer Olympics

Dataset: Olympic Athletes + Medals

Beginner Visualize medal counts and performance trends by country and year. Explore how participation has evolved.

Intermediate Analyze gender representation and dominant countries in specific sports over time. Build dashboards to highlight changes and patterns.

Advanced

  • Medal Count Predictor: Use machine learning to forecast 2024 medal totals.
  • Athlete Career Analysis: Explore patterns in longevity and versatility.
  • Demographics Enrichment: Merge Olympic data with external datasets to uncover deeper insights.

Movies

Dataset: Movies Metadata & Ratings

Beginner Explore genre trends, production countries, and language shifts. Create time-based visualizations.

Intermediate Perform sentiment analysis on movie descriptions and study how they relate to audience ratings.

Advanced

  • Box Office Success Predictor: Predict movie revenue based on metadata and features.
  • Movie Recommender System: Use collaborative filtering to suggest titles.
  • Genre Classifier: Build a model to predict genre using descriptive features.

Mental Health

Dataset: Global Mental Health Indicators

Beginner Track global prevalence of mental disorders and visualize country-level changes over time.

Intermediate Analyze gender disparities and model mental health outcomes using socioeconomic variables.

Advanced

  • Prevalence Prediction Model: Build and compare models for mental health forecasting.
  • Mental Health Impact Analysis: Correlate disorder rates with DALYs to quantify burden.
  • Country Clustering: Group nations by mental health profile using unsupervised learning.

More project ideas

Here are more project ideas to tickle your brain cells. I have not looked for the accompanying datasets. But hey, searching for these datasets is part of the fun (and part of the process!)

  • Analyze performance of each stage of the sales funnel, including lead generation and conversion rates.
  • Group delivery routes for logistics companies to optimize fuel usage and delivery time.
  • Build a text-based RPG game where a transformer model dynamically generates game scenarios.
  • Segment retail customers based on purchase behavior to tailor marketing strategies.
  • Predict housing prices based on location, size, and other property features.
  • Design a schema for teams, players, and match scores to allow for analytics and leaderboards.
  • Develop a model to classify customer reviews as positive, negative, or neutral.
  • Optimize warehouse layouts to minimize order-picking times using data analysis.
  • Model the relationship between advertising spending and product sales.
  • Showcase historical and upcoming space missions by country, mission type & success rates.
  • Build a dashboard to track trends in hashtags and sentiment over time.
  • Classify whether a patient has a specific disease based on health metrics.
  • Create a model that applies the style of famous paintings to user-uploaded photos.
  • Analyze supply chain efficiency by identifying bottlenecks in transportation routes.
  • Analyze whether a new drug improves recovery times compared to a placebo in a clinical trial.
  • Evaluate which marketing strategies lead to higher conversion rates.
  • Forecast daily energy consumption for a city based on historical usage data and weather patterns.
  • Predict customer churn in a subscription-based service using demographic and usage data.
  • Compare open rates for two different email subject lines to determine which is more effective.
  • Develop customer profiles based on travel preferences and spending history to target personalized promotions.
  • Segment users by workout habits, goals, and subscription tiers for better engagement strategies.
  • Analyze and forecast stock price movements for a specific company using historical market data.
  • Create a real-time visualization of player performance metrics and match outcomes for a sports league.
  • Design a relational database to manage products, customers, orders & payments.
  • Simplify a dataset of university characteristics to uncover the main factors contributing to university rankings.
  • Uncover the factors influencing dropout rates in online learning platforms.
  • Predict monthly revenue for an e-commerce platform using seasonal and promotional trends.
  • Simplify data from healthcare studies to identify dominant factors associated with diseases.
  • Report on ticket resolution times, customer satisfaction scores, and support request volume trends.
  • Create a visualization of startup funding trends by industry, region, and funding stage over time.
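
As an example of how one of these ideas could start, here is a sketch for the email subject line comparison using a two-proportion z-test, written with only the Python standard library. The send and open counts are made up, and the z-test is just one reasonable choice of method, assuming large samples and independent recipients.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference in open rates between A and B.

    Returns (z, p_value). Uses the pooled proportion for the standard error.
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: subject line A vs. subject line B.
z, p_value = two_proportion_z_test(opens_a=200, sent_a=1000, opens_b=250, sent_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in open rates is unlikely to be due to chance alone. In a portfolio write-up, it also helps to report the open rates themselves so readers can judge whether the difference is practically meaningful.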

Skills to showcase for each role

Every data role has its own expectations and required skillset. Whether you're applying for a Data Scientist or Data Analyst position, your portfolio needs to showcase the skills that hiring managers are looking for. For this repo, we are only focused on Data Science & Analytics roles, so we will not be covering other roles like Data Engineering or AI Engineering.

So for Data Science & Data Analyst roles, here are the required skills:

Product Data Science

  • Exploratory Data Analysis - e.g. customer churn analysis, user behavior segmentation
  • A/B Experimentation - e.g. feature launch impact test, email header experiment
  • Machine Learning - e.g. recommendation engine, customer lifetime value prediction
  • Causal Inference - e.g. marketing campaign attribution, feature adoption drivers

Machine Learning Data Science

  • Machine Learning - e.g. fraud detection system, image classification model
  • Data Preprocessing Project - e.g. text data pipeline, missing data imputation framework
  • Model Evaluation & Tuning - e.g. hyperparameter optimization study, cross-validation comparison
  • Big Data Technologies - e.g. Spark ML pipeline, distributed training implementation

Data Analyst

  • SQL - e.g. complex joins dashboard, window functions for cohort analysis
  • Advanced Excel - e.g. financial modeling tool, interactive pivot table dashboard
  • Data Visualization - e.g. sales performance dashboard, customer journey visualization
  • Metric Definition & Reporting - e.g. KPI framework design, automated reporting system
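
To illustrate the "window functions for cohort analysis" example above, here is a small sketch that runs a cohort query against an in-memory SQLite database from Python. The `orders` table, its columns, and the rows are hypothetical. Window functions need SQLite 3.25 or newer, which I'm assuming your Python installation bundles.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, order_month TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01"), (1, "2024-02"),
        (2, "2024-01"),
        (3, "2024-02"), (3, "2024-03"),
    ],
)

# The window function assigns each user to the month of their first order
# (their cohort); the outer query counts active users per cohort and month.
query = """
WITH with_cohort AS (
    SELECT
        user_id,
        order_month,
        MIN(order_month) OVER (PARTITION BY user_id) AS cohort_month
    FROM orders
)
SELECT cohort_month, order_month, COUNT(DISTINCT user_id) AS active_users
FROM with_cohort
GROUP BY cohort_month, order_month
ORDER BY cohort_month, order_month
"""

rows = conn.execute(query).fetchall()
for cohort_month, order_month, active_users in rows:
    print(cohort_month, order_month, active_users)
```

The same query pattern carries over to Postgres, MySQL 8+, or BigQuery with little or no change, so it's a nice building block for a retention dashboard.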

Tools & tech stack

Of course, every project you build will require tools or platforms: the actual environment or program that you'll use when building it. This section covers the tools (most are free or at least have a free tier) you can use depending on the skills you're trying to showcase, plus my personal recommendation and why.

SQL

  • DBeaver – Free, all-in-one SQL client. Works with Postgres, MySQL, SQLite, and more
  • MySQL Workbench – Official SQL IDE for MySQL
  • DB Browser for SQLite – Lightweight, no setup. Just open a file and start querying, but only good for small projects
  • Mode, Hex, PopSQL – Cloud SQL editors with free-tier options
  • BigQuery Console – Google's cloud SQL workspace

My recommendation: Start with DBeaver + SQLite if you're new and working with a very small dataset (<1000 rows & <20 columns). Keep using DBeaver (since it works with all flavors of SQL) when you're ready to move on to bigger or multiple datasets.

Python

  • Google Colab – Free, browser-based Jupyter notebooks with zero setup (also has Gemini AI integrated, which can be helpful)
  • JupyterLab – Runs locally on your computer, so you don't have to be connected to the internet
  • Cursor – AI-powered code editor. Ideal for scripting rather than notebooks, but you can make it work for analyses too
  • JetBrains DataSpell – Professional IDE with smart completions and notebook support, but can be very $$$

My recommendation: Use Google Colab. It's free, needs no setup, and is beginner-friendly.

Excel

  • Microsoft Excel (duh!) – The standard desktop version, but this does cost money AND doesn't work as well on MacBooks.
  • Excel for the Web – Free, browser-based Excel via Microsoft 365. But (be warned) this does not have all the functionality of the desktop version.
  • Google Sheets – Free and cloud-based with real-time collaboration. Mimics Excel very closely, but does not have all the functionality.

My recommendation: Use Excel if you can afford to pay for it. If not, Google Sheets works great too.

Data visualization

  • Tableau Public – Free version of Tableau that lets you build interactive dashboards. But note: your dashboards must be public and functionality is limited
  • Tableau Desktop – Paid version ($$), lets you keep dashboards private and access more advanced features.
  • Power BI Desktop – Free Microsoft tool for building dashboards (only for Windows users)
  • Looker Studio (formerly Google Data Studio) – Free, browser-based, and integrates well with Sheets and BigQuery, but from my experience, very difficult to use
  • Plotly / Seaborn / Matplotlib – Python libraries for data viz. Best if you're already working in Python and want to visualize inside a notebook.
  • Streamlit – Free, open-source Python framework that lets you turn scripts into interactive web apps. This is fun to play with if you're building public, production-grade, web-app-based dashboards.

Hosting your portfolio

Finally, our last step is deciding where to host your portfolio. There are so many great options out there, so how do you pick? Spoiler alert: I have a favorite and it's Notion (because it's free and easy to customize). We'll also cover some common mistakes people make when building their portfolio and how to best present your projects in interviews.

  • Notion – my top recommendation. It's easy to use and looks good. You can include text, charts, images, links, even embed code if needed.
  • DataSciencePortfol.io – Really easy to get your portfolio up and running in minutes. Use code DAWN20 for 20% off a PRO subscription!
  • Google Slides – Super intuitive, especially if you're already comfortable with Docs or Slides. But you're stuck working in a slide-by-slide format, which can feel limiting.
  • Carrd – I like that they have starter templates that make your portfolio look clean and modern with very little effort.

I put together this FREE Data Portfolio template that you can use to get started!

Guided portfolio programs

Looking for hands-on mentorship while building your portfolio? Check out Real World Analyst for step-by-step, personalized guidance!

Contact me

  • Follow me on Instagram – I'm trying really hard to build out a presence on IG!
  • Follow me on LinkedIn
  • Check out my SQL Interview Preparation platform Interview Master
