Data Visualization
Data visualizations present data in a form that's easier to understand. After all, it's much easier to digest thousands of data points visually than as rows in a spreadsheet. Visualizations are used not only during exploratory data analysis (EDA) to explore and understand the data, but also throughout the rest of the data analysis process. They're also an easy way to convey concepts and results.
This tutorial assumes that you have a basic understanding of working in Python for the Pandas, [Seaborn](https://github.com/hackforla/data-science/wiki/Data-Visualization#Seaborn), and [Matplotlib](https://github.com/hackforla/data-science/wiki/Data-Visualization#Matplotlib) sections.
Pandas is the workhorse of Python data analysis. Its DataFrame data structure gives you access to a huge variety of tools. In addition, many Python packages for specialized data analysis and machine learning work with Pandas, which makes it a valuable core competency.
- Official Pandas Tutorial: Up to date and well maintained tutorial focused on getting you up to speed and running quickly
- Daniel Chen Pandas Tutorial: Good in-depth video walkthrough showing a full data analysis with explanations
- Brandon Rhodes Pandas Tutorial: Considered by many people the definitive intro to pandas. Be aware that some small changes have happened to the way pandas works since this was filmed, so you may need to google if the code examples don't work exactly as shown.
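To give a feel for the DataFrame structure described above, here is a minimal sketch. The column names and values are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "city": ["Los Angeles", "Los Angeles", "Pasadena", "Pasadena"],
    "year": [2020, 2021, 2020, 2021],
    "requests": [120, 150, 30, 45],
})

# Summary statistics (count, mean, min, max, ...) for the numeric columns.
summary = df.describe()

# Group rows by city and add up the requests in each group.
totals = df.groupby("city")["requests"].sum()

print(summary)
print(totals)
```

Summaries like `totals` are often the starting point for a chart, since most plotting libraries accept Pandas objects directly.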
[Matplotlib](https://matplotlib.org/) is a common visualization library used in Python for static, animated, and interactive visualizations.
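As a rough illustration of a static Matplotlib chart, the sketch below draws a simple line plot and saves it to a file. The numbers and the output filename `requests_per_year.png` are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, invented for this example.
years = [2018, 2019, 2020, 2021]
requests = [90, 110, 120, 150]

# Create a figure with a single set of axes.
fig, ax = plt.subplots(figsize=(6, 4))

# Draw a line with a circular marker at each data point.
ax.plot(years, requests, marker="o")

# Label the chart so it can be read on its own.
ax.set_xlabel("Year")
ax.set_ylabel("Requests")
ax.set_title("Requests per year (made-up data)")
ax.set_xticks(years)

# Write the chart to an image file.
fig.savefig("requests_per_year.png", dpi=150)
```

In a notebook you would usually skip the `matplotlib.use("Agg")` line and let the chart render inline instead of saving it.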
Seaborn is another Python visualization library. It's built on top of Matplotlib and integrated with the Pandas dataframe structures.
- Official User Guide and Seaborn Tutorial: Well maintained tutorial focused on exploring the capabilities of Seaborn.
- Elite Data Science Tutorial on Seaborn: Tutorial guide with sample code and a dataset, including images of the visualizations, plus a general overview of which graph types are useful for which purposes.
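Because Seaborn works directly with Pandas dataframes, a plot can be described by column name. The following is a minimal sketch under that assumption. The dataframe contents and the filename `scores.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# Pass the dataframe and refer to columns by name; `hue` colors points by group.
ax = sns.scatterplot(data=df, x="hours_studied", y="score", hue="group")
ax.set_title("Score vs. hours studied (made-up data)")

# scatterplot returns a Matplotlib Axes, so the usual Matplotlib methods still apply.
ax.figure.savefig("scores.png", dpi=150)
```

Note that Seaborn takes the axis labels and legend entries from the dataframe columns, which is part of what makes it convenient for quick exploration.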
Tableau is a separate data visualization tool that makes it easy for anyone to organize data and create interactive visualizations. Programming is not required, since Tableau offers drag-and-drop functionality for building charts and dashboards. However, users can still use Python and R to enhance visualizations and build models.
It's mainly used by businesses, but there are free versions to experiment with. Tableau offers different tutorials and resources depending on which of its tools you use.