A simple way to visualize data and its correlations with the seaborn.heatmap Python function.

Introduction

Visualizing data is an art that some people have a talent for and others do not. The good news is that Python has a library called Seaborn, which provides high-level tools such as heatmaps that make it easier to visualize your data and the correlations in it. This blog post shows how to use the seaborn.heatmap function to do just that!

Also, check the post's footer for an easy way to run your Jupyter Notebook in Google Colaboratory. "Google Colab" is available for free to anyone with a Google account.

Getting started

The first step is to import the libraries. We'll use the Pandas library to read the data set.

# Importing libraries
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Additional libraries needed to load a .csv file from GitHub in Python
import requests
import io

Loading .csv file in Python from GitHub

The second step is to download the data from a .csv file hosted in a public repository on GitHub.

# Downloading the csv file from GitHub (make sure the url is the raw version of the file on GitHub)
url = "https://raw.githubusercontent.com/rsipakov/PythonProjectsShared/master/seaborn/SO2_HCHO_correlation/Dataset_SO2_HCHO_post_20_Kyiv_2017.csv"
download = requests.get(url).content

# Reading the downloaded content and turning it into a pandas dataframe
SO2_df = pd.read_csv(io.StringIO(download.decode('utf-8')))
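
The download-and-parse step above can be wrapped in two small helpers. This is only a minimal sketch, not code from the original notebook: the function names `fetch_csv_bytes` and `csv_bytes_to_frame` and the 30-second timeout are assumptions made for illustration. The one behavioural addition is `raise_for_status()`, so a wrong URL fails loudly instead of handing an error page to pandas. The demonstration at the bottom runs offline on a two-row sample shaped like the post's data.

```python
import io

import pandas as pd
import requests


def fetch_csv_bytes(url, timeout=30):
    """Download a raw file and return its bytes (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx answers instead of returning the error body
    response.raise_for_status()
    return response.content


def csv_bytes_to_frame(raw):
    """Decode downloaded bytes as UTF-8 and parse them as CSV."""
    return pd.read_csv(io.StringIO(raw.decode("utf-8")))


# Offline demonstration: a two-row sample with the same columns as the post's data
sample = (
    b"SO2,NO2,NO,HCHO\n"
    b"0.0184,0.0368,0.0258,0.0018\n"
    b"0.0214,0.0735,0.0414,0.0039\n"
)
sample_df = csv_bytes_to_frame(sample)
print(sample_df)
```

With a network connection, `SO2_df = csv_bytes_to_frame(fetch_csv_bytes(url))` would do the same job as the two lines in the post.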

If you need to download data from a private repository, you need to use a personal access token.

# Username of your GitHub account
github_username = 'YOUR GITHUB USERNAME'

# Personal Access Token (PAT) of your GitHub account
personal_token = 'YOUR GITHUB PAT'

# Creates a reusable session object that includes your GitHub credentials.
github_session = requests.Session()
github_session.auth = (github_username, personal_token)

# Downloading the csv file from your private repository on GitHub (make sure the url is the raw version of the file on GitHub)
url = "https://raw.githubusercontent.com/rsipakov/PythonProjectsShared/master/seaborn/SO2_HCHO_correlation/Dataset_SO2_HCHO_post_20_Kyiv_2017.csv"
# Use the authenticated session here; a plain requests.get() would not send your credentials
download = github_session.get(url).content

# Reading the downloaded content and turning it into a pandas dataframe
SO2_df = pd.read_csv(io.StringIO(download.decode('utf-8')))
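
Typing a token straight into a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here under stated assumptions, is to read the credentials from environment variables. The helper name `make_github_session` and the variable names `GITHUB_USERNAME` and `GITHUB_TOKEN` are hypothetical; use whatever names your own setup provides.

```python
import os

import requests


def make_github_session(username_var="GITHUB_USERNAME", token_var="GITHUB_TOKEN"):
    """Build a requests.Session carrying GitHub credentials (hypothetical helper).

    The environment variable names are assumptions, not a GitHub convention.
    """
    username = os.environ.get(username_var)
    token = os.environ.get(token_var)
    if not username or not token:
        raise RuntimeError(f"Set {username_var} and {token_var} before running")
    session = requests.Session()
    # Basic auth: every request made through this session sends the credentials
    session.auth = (username, token)
    return session
```

Once both variables are set, `download = make_github_session().get(url).content` takes the place of the session code in the post.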

This is how the DataFrame looks

# View the first five rows of the data
SO2_df.head()

#   SO2     NO2     NO      HCHO
0   0.0184  0.0368  0.0258  0.0018
1   0.0214  0.0735  0.0414  0.0039
2   0.0463  0.1530  0.0898  0.0062
3   0.0517  0.1629  0.0947  0.0026
4   0.0138  0.0494  0.0380  0.0015

# Generate the correlation matrix
SO2_df.corr()

#      SO2       NO2       NO        HCHO
SO2    1.000000  0.516845  0.657316  0.365689
NO2    0.516845  1.000000  0.816463  0.686926
NO     0.657316  0.816463  1.000000  0.626236
HCHO   0.365689  0.686926  0.626236  1.000000

# Write the correlation matrix to an .xlsx file
cr1 = SO2_df.corr()
cr1.to_excel("output_SO2_HCHO.xlsx")
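
It helps to know what `.corr()` computes. By default pandas returns the Pearson coefficient for every pair of columns. The sketch below checks that against NumPy and shows the rank-based Spearman alternative. It uses only the five rows printed by `SO2_df.head()`, retyped by hand, so the numbers it produces will not match the full-data matrix shown above.

```python
import numpy as np
import pandas as pd

# The five rows printed by SO2_df.head(), retyped as a small sample
sample_df = pd.DataFrame(
    {
        "SO2": [0.0184, 0.0214, 0.0463, 0.0517, 0.0138],
        "NO2": [0.0368, 0.0735, 0.1530, 0.1629, 0.0494],
        "NO": [0.0258, 0.0414, 0.0898, 0.0947, 0.0380],
        "HCHO": [0.0018, 0.0039, 0.0062, 0.0026, 0.0015],
    }
)

# DataFrame.corr() defaults to the Pearson coefficient
pearson = sample_df.corr()

# The same matrix from NumPy; rowvar=False treats each column as one variable
pearson_np = np.corrcoef(sample_df.to_numpy(), rowvar=False)
print(np.allclose(pearson.to_numpy(), pearson_np))

# Rank-based alternative, less sensitive to outliers
spearman = sample_df.corr(method="spearman")
print(spearman.round(3))
```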

Basic seaborn.heatmap()

# Generate a heatmap using .corr() function
sns.heatmap(SO2_df.corr())
# Save heatmap in the .png format
sns.heatmap(SO2_df.corr())
plt.savefig('heatmap_SO2_HCHO.png', transparent=True)
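
The basic heatmap can be made easier to read with a few optional arguments of `sns.heatmap`. The following is a sketch rather than part of the original notebook: it rebuilds the correlation matrix from the `SO2_df.corr()` output printed earlier so that it runs without the data file, and the palette, figure size, and title are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

labels = ["SO2", "NO2", "NO", "HCHO"]
# Correlation matrix copied from the SO2_df.corr() output shown earlier
corr = pd.DataFrame(
    [
        [1.000000, 0.516845, 0.657316, 0.365689],
        [0.516845, 1.000000, 0.816463, 0.686926],
        [0.657316, 0.816463, 1.000000, 0.626236],
        [0.365689, 0.686926, 0.626236, 1.000000],
    ],
    index=labels,
    columns=labels,
)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    annot=True,       # print the coefficient inside each cell
    fmt=".2f",        # two decimal places
    vmin=-1,          # fix the colour scale to the full correlation range
    vmax=1,
    cmap="coolwarm",  # diverging palette (an arbitrary choice)
    square=True,
    ax=ax,
)
ax.set_title("Pearson correlation")
fig.tight_layout()
```

Fixing `vmin` and `vmax` keeps the colours comparable between plots; without them Seaborn scales the palette to the data's own minimum and maximum.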

One more: a basic seaborn.scatterplot()

# Generate a scatterplot
sns.scatterplot(x='SO2', y='HCHO', data=SO2_df)
# Save scatterplot in the .pdf format
sns.scatterplot(x='SO2', y='HCHO', data=SO2_df)
plt.savefig('scatterplot_SO2_HCHO.pdf')
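
To tie the scatterplot back to the correlation matrix, the coefficient for one pair of columns can be computed directly and shown in the plot title. This sketch swaps in `sns.regplot` for the `sns.scatterplot` call above so that a fitted straight line is drawn too. It again uses only the five rows from `SO2_df.head()`, so the value of `r` is for that tiny sample and not for the full data set.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# SO2 and HCHO columns from the five rows printed by SO2_df.head()
sample_df = pd.DataFrame(
    {
        "SO2": [0.0184, 0.0214, 0.0463, 0.0517, 0.0138],
        "HCHO": [0.0018, 0.0039, 0.0062, 0.0026, 0.0015],
    }
)

# Pearson coefficient between the two columns (Series.corr default)
r = sample_df["SO2"].corr(sample_df["HCHO"])

fig, ax = plt.subplots()
# ci=None skips the bootstrapped confidence band, which means little for five points
sns.regplot(x="SO2", y="HCHO", data=sample_df, ci=None, ax=ax)
ax.set_title(f"SO2 vs HCHO (Pearson r = {r:.2f})")
```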

NOTE

  • If you would like to load the data set from a local file (for example, .xls), use the following:
SO2_df = pd.read_excel('/PATH TO/Dataset_SO2_HCHO_post_20_Kyiv_2017.xls', engine='xlrd')
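
The `engine` argument depends on the file type: `xlrd` (version 2.0 and later) reads only the legacy .xls format, while .xlsx files are usually read with `openpyxl`. Both are separate installs. A small helper can pick the engine from the extension; `pick_excel_engine` is a hypothetical name used here only for illustration.

```python
from pathlib import Path

# Engine per extension; xlrd 2.x reads only the legacy .xls format
ENGINE_BY_SUFFIX = {".xls": "xlrd", ".xlsx": "openpyxl"}


def pick_excel_engine(path):
    """Return the read_excel engine name for a file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}") from None


print(pick_excel_engine("Dataset_SO2_HCHO_post_20_Kyiv_2017.xls"))
```

It could then be used as `pd.read_excel(path, engine=pick_excel_engine(path))`.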

Conclusion

In this blog post, I've gone over the basics of how to create and use heatmaps in Python. Now you can quickly get started with your own Jupyter Notebook project in Google Colaboratory.

You may get started immediately by importing a Jupyter Notebook for this tutorial from my public GitHub repository.

I hope you found this blog post helpful. If so, please share with your friends! Thank you for reading.

Feel free to tell me on Twitter what you thought of it.