Sunday, October 2, 2022

The histogram in Python

What is a distribution plot in Python?

A distribution plot is another way of drawing data on the canvas: it shows us a histogram of a single variable. Before going further with the distribution plot, let's first understand what a histogram is.

In the previous blog post, we learned how to load a dataset using the seaborn library in Python. We use the same dataset here.

A histogram plots the data using bins.

In the plot produced by the code below, the axis representing the data variable is divided into a set of discrete bins, and the count of observations falling within each bin is shown by the height of the corresponding bar:

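import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Load the example "tips" dataset through seaborn
dfHis = sns.load_dataset('tips')
# Histogram of the total bill; a KDE curve is overlaid by default
sns.distplot(dfHis['total_bill'])
plt.show()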

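distplot() is used to visualize the univariate distribution of a variable in a dataset. Note that distplot() has been deprecated since seaborn 0.11 in favor of histplot() and displot(), so recent versions print a warning when it is called.

Let's understand how the plot works. The x-axis is divided into bins, and the height of each bar reflects how many observations fall into that bin. When the KDE curve is drawn, the y-axis is scaled to a density; with kde=False it shows raw counts. Most of our bills are between 10 and 20 dollars.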
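To make the binning step concrete, here is a minimal sketch in plain Python of how observations could be counted into equal-width bins. The helper name bin_counts and the bill amounts are made up for illustration; seaborn's own implementation is more elaborate.

```python
def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot build bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up bill amounts, for illustration only (not taken from the tips dataset).
bills = [8.0, 12.5, 14.0, 15.5, 18.0, 19.5, 22.0, 27.0, 35.0, 48.0]
edges, counts = bin_counts(bills, 4)
print(edges)   # bin boundaries along the x-axis
print(counts)  # bar heights
```

Each entry of counts is the height of one bar, and the entries always add up to the number of observations.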

By default, the plot shows both a histogram and a line on the canvas. The line is the KDE (Kernel Density Estimate) curve.
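As a rough idea of what the KDE line represents, the sketch below places one Gaussian bump on every observation and averages them. The function name gaussian_kde, the sample values, and the bandwidth of 4.0 are assumptions chosen for illustration; seaborn picks its bandwidth automatically and its result will differ.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up bill amounts, for illustration only.
bills = [8.0, 12.5, 14.0, 15.5, 18.0, 19.5, 22.0, 27.0, 35.0, 48.0]
density = gaussian_kde(bills, bandwidth=4.0)

# Evaluate on a grid wide enough to cover the tails and approximate the area.
step = 0.1
grid = [i * step for i in range(-300, 900)]  # from -30 up to just under 90
area = sum(density(x) * step for x in grid)
print(round(area, 3))
```

The area under the curve comes out very close to 1, which is why the y-axis of a plot with a KDE line is a density rather than a count.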

To get rid of the line, pass the kde parameter set to False, and only the histogram will be displayed on your canvas.

sns.distplot(dfHis['total_bill'], kde=False)

You can also decide the number of bins by passing a third argument, bins, to the function.

sns.distplot(dfHis['total_bill'], kde=False, bins=50)

When the number of bins is very high, the plot comes close to showing every individual total bill as its own bar.

Reading such a histogram can be difficult, because many of our records have values so close to each other that the bars crowd together and are hard to separate.

What is a joint plot in seaborn and why is it used?

The name says it all: a joint plot lets us join, or compare, two distribution plots, so it is bivariate (just two variables). The kind parameter lets us choose how we want to compare these distributions.

For x and y, we pass strings holding the names of the two columns we want to compare with each other.

Here we have two distribution plots, the tip along the y-axis and the total bill along the x-axis, with a scatter plot in between.

By default it displays a scatter plot, but by passing the kind parameter you can view other forms as well. The code further below shows how.

What is a scatter plot in seaborn and what is it used for?

A scatter plot is one of the most direct ways to visualize a joint distribution: each observation is represented as a point in a two-dimensional plot via the x- and y-axes.

The scatter plot above shows that when the total bill is higher, the tip tends to be higher as well.

Let's pass the kind argument to the line of code above and see what goes on inside this plot.

sns.jointplot(x='total_bill', y='tip', data=dfHis, kind='hex')

What is a hexbin plot in seaborn?

Imagine that you have a lot of data points. Many of them may overlap, which makes a scatter plot difficult to analyze. To overcome this, pass the kind argument with the value 'hex' to display a hexbin plot.

Hex: a hexagonal representation of the distribution. The darker a hexagon is, the more points fall inside it.
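To see the counting behind the colors, the sketch below calls matplotlib's hexbin() directly on synthetic data. The bill and tip values are randomly generated for illustration (the 15% relationship and the noise level are assumptions), and the gridsize of 15 is an arbitrary choice.

```python
import random

import matplotlib.pyplot as plt

random.seed(0)
# Synthetic bill/tip pairs, invented for illustration:
# the tip is roughly 15% of the bill plus some noise.
bills = [random.uniform(5, 50) for _ in range(500)]
tips = [0.15 * b + random.gauss(0, 1) for b in bills]

fig, ax = plt.subplots()
# Each hexagon is coloured by how many points fall inside it.
hb = ax.hexbin(bills, tips, gridsize=15, cmap="Blues")
fig.colorbar(hb, ax=ax, label="points per hexagon")

# The per-hexagon counts are stored on the returned collection.
counts = hb.get_array()
print(int(counts.sum()), "points in", len(counts), "hexagons")
```

Overlapping points that would hide each other in a scatter plot simply raise the count, and so darken the color, of their hexagon.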

sns.jointplot(x='total_bill', y='tip', data=dfHis, kind='kde')

With kind='kde' we get a two-dimensional KDE. The regions of highest density are where most of the points are concentrated.

Let's perform another operation on our data frame: a heatmap of the correlation between each pair of columns.

In the heatmap you see a diagonal of full correlation, which makes sense because each column is perfectly correlated with itself.

Try passing another argument, annot=True, as shown in the code, and see what you get on your canvas. It writes the correlation value inside each cell.

# corr() needs numeric columns, so select them before computing the correlation
sns.heatmap(dfHis.select_dtypes('number').corr(), annot=True)
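The numbers in the heatmap are correlation coefficients; pandas' corr() computes the Pearson coefficient by default. The sketch below works it out by hand in plain Python on made-up columns (the helper name pearson and all values are illustrative assumptions) to show why the diagonal is always 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up numeric columns, for illustration only.
columns = {
    "total_bill": [10.5, 14.2, 18.9, 21.0, 24.6, 30.1, 35.8, 48.3],
    "tip": [1.5, 2.0, 3.0, 3.2, 3.6, 4.5, 5.0, 7.0],
    "size": [1, 2, 2, 2, 3, 4, 4, 6],
}
names = list(columns)

# One row and one column per variable, just like the heatmap.
matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

Comparing a column with itself makes the numerator and denominator equal, so every diagonal cell is 1, and the matrix is symmetric because the order of the two columns does not matter.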

In the real world, when we work with machine learning, a dataset usually contains many variables.

It is important to analyze every variable, but plotting each pair one by one becomes complex and time-consuming. With a pair plot, we can plot the pairwise relationships across an entire data frame at once.

Parameters:

seaborn.pairplot(data, ...)

data: the DataFrame to plot.

hue: the name of a categorical column used to color the points (e.g. hue='XYZ').

palette: the set of colors used for the hue levels.

kind: the kind of plot for the off-diagonal panels, such as 'scatter', as we saw in the joint plot examples above.

diag_kind: the kind of plot for the diagonal subplots: {'hist', 'kde'}.

pairplot() draws a joint plot for every possible pair of numerical columns in the data frame.

Each row of the grid shares one variable on the y-axis, and each column of the grid shares one variable on the x-axis.
