## What are Histogram & Density Plots? When to use them?

### What’s a Density Plot?

The kernel function is a mathematical function that determines how the density estimate is calculated at each point. The choice of kernel function can have a significant impact on the shape of the density plot, as different kernel functions will give different weights to the observations in the data.
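To make the idea of "different weights" concrete, here is a minimal sketch of three commonly used kernel functions. The function names and the set of distances printed are illustrative choices, not something taken from a particular library. Each kernel takes a scaled distance `u` between an observation and the point being estimated and returns the weight that observation receives.

```python
import math

def gaussian_kernel(u):
    """Bell-shaped weight: never exactly zero, decays smoothly with distance."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def epanechnikov_kernel(u):
    """Parabolic weight: zero for observations more than one unit away."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def uniform_kernel(u):
    """Box-shaped weight: every observation within one unit counts equally."""
    return 0.5 if abs(u) <= 1 else 0.0

# Compare the weight each kernel assigns at a few scaled distances
for u in (0.0, 0.5, 1.0, 2.0):
    print(
        f"u={u:.1f}  gaussian={gaussian_kernel(u):.3f}  "
        f"epanechnikov={epanechnikov_kernel(u):.3f}  "
        f"uniform={uniform_kernel(u):.3f}"
    )
```

The Gaussian kernel still gives a small weight to distant observations, while the other two ignore anything farther than one unit away.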

The bandwidth controls the amount of smoothing applied to the density plot. A larger bandwidth produces a smoother curve, while a smaller bandwidth produces a bumpier, more visually busy one. The choice of bandwidth is important: too large a value can smooth away real features such as multiple peaks, while too small a value can turn random noise into spurious bumps.

Together, the kernel function and the bandwidth determine the smoothness and shape of the density plot. The best choice of both depends on the data being analyzed and the goals of the analysis. Generally, a Gaussian kernel is a good choice for most datasets, as it is smooth and continuously differentiable, which makes the resulting estimate easy to differentiate and integrate.
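Because the best bandwidth depends on the data, a data-driven starting value is often useful. The sketch below uses Silverman's rule of thumb, a widely used heuristic that this post does not otherwise cover. The function name `silverman_bandwidth` and the `marks` values are made up for illustration.

```python
import statistics

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sigma = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    spread = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)

# Hypothetical sample of ten marks, used only to show the call
marks = [35, 42, 48, 55, 58, 61, 64, 70, 77, 93]
print(f"Suggested starting bandwidth: {silverman_bandwidth(marks):.2f}")
```

Treat the result as a starting point only. The rule assumes roughly bell-shaped data and tends to oversmooth distributions with several peaks.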

Here is a sample density plot with a Gaussian kernel and two different bandwidths (bandwidth=0.5 and bandwidth=2):
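The curves in such a plot can be computed by hand. The following is a minimal sketch, assuming a small made-up sample with two clusters. It evaluates a Gaussian kernel density estimate on a grid for both bandwidths. The helper name `gaussian_kde` and the grid range are illustrative choices.

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Density estimate at x: the average of Gaussian bumps centred on each observation."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

# Hypothetical sample with two clusters, used only for illustration
sample = [1.0, 1.2, 1.5, 1.9, 2.1, 6.0, 6.3, 6.8, 7.0, 7.4]

# Evaluation grid from -10 to 18 in steps of 0.05
step = 0.05
grid = [-10 + i * step for i in range(561)]

# One density curve per bandwidth
curves = {h: [gaussian_kde(sample, x, h) for x in grid] for h in (0.5, 2.0)}

for h, density in curves.items():
    area = sum(density) * step  # Riemann-sum approximation of the area under the curve
    print(f"bandwidth={h}: peak density={max(density):.3f}, area={area:.3f}")
```

Both curves enclose an area of about 1, as any density must, but the curve for the smaller bandwidth has a taller peak because each observation's weight is concentrated in a narrower region. To draw the curves, `grid` and each entry of `curves` can be passed to a line-plotting function such as Matplotlib's `plt.plot`.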

## Code Sample – Draw Histogram and Density Plot

Histograms and density plots are very useful for examining the spread of a data variable. The sections below show how to draw them in R with the ggplot2 package and in Python. The R example uses randomly generated data.

### Histogram Plots using Python

Check out my blog post on histogram plots using Matplotlib & Pandas in Python.
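The referenced post is not reproduced here. As a rough stand-in, the following sketch draws a histogram with Matplotlib. It assumes NumPy and Matplotlib are installed, uses randomly generated data, and picks a bin width of 0.5. The output file name `histogram.png` is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 100 random values from a standard normal distribution
rng = np.random.default_rng(123)
data = rng.normal(size=100)

# Build bin edges of width 0.5 that cover the whole range of the data
bin_width = 0.5
bins = np.arange(np.floor(data.min()), np.ceil(data.max()) + bin_width, bin_width)

# Draw the histogram with blue bars outlined in black
fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=bins, color="blue", edgecolor="black")
ax.set_title("Histogram Plot")
ax.set_xlabel("Data")
ax.set_ylabel("Frequency")

# Save to a file; in an interactive session plt.show() could be used instead
fig.savefig("histogram.png")
```

Passing explicit bin edges rather than a bin count keeps the bin width fixed, so the bars stay comparable when the data changes.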

### Histogram Plot using R

In this code, the following is done:

• First, the ggplot2 library is loaded. Then, a set of 100 random values is generated using the rnorm() function.
• Next, a data frame called df is created that holds the random data in a single column called “Data”.
• The ggplot() function creates the plot. The data parameter specifies the data frame to be used, and the aes() function specifies the variable to be plotted on the x-axis.
• The geom_histogram() function draws the histogram, with the binwidth parameter set to 0.5 to specify the width of the bins. The color parameter is set to “black” to outline the bars, and the fill parameter is set to “blue” to fill the bars with blue color.
• Finally, a title is added with the ggtitle() function, a label for the x-axis with the xlab() function, and a label for the y-axis with the ylab() function.

The code below creates a plot with a histogram of the random data, with the bars colored blue and outlined in black. The ggplot2 package provides a lot of flexibility for customizing the appearance of the plot, such as adjusting the bin width, changing the bar color and style, and adding additional layers to the plot.

```r
# Load ggplot2 library
library(ggplot2)

# Generate sample data
set.seed(123)
data <- rnorm(100)

# Create data frame
df <- data.frame(Data = data)

# Create histogram plot
ggplot(data = df, aes(x = Data)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "blue") +
  ggtitle("Histogram Plot") +
  xlab("Data") +
  ylab("Frequency")
```

### Density plot using Python

The following code creates a density plot in Python showing the marks scored by students in a school. Note the following:

• A sample data set of 100 records is generated with the np.random.randint() function from NumPy, which produces random integer marks between 35 and 100.
• A data frame called df is created to hold the random marks in a single column called “Marks”.
• The sns.kdeplot() function from Seaborn creates the density plot. The df['Marks'] argument specifies the variable to be plotted. The fill parameter (called shade in older Seaborn releases) is set to True to fill the area under the curve with color. The color parameter is set to 'blue' to set the color of the plot.
• The Seaborn library is also used to customize the plot style. The plot style is set to 'darkgrid' using sns.set_style(), the color palette is set to 'pastel' using sns.set_palette(), and the font scale of the labels is set using sns.set().

```python
# Import necessary libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Configure the style before drawing so it takes effect.
# sns.set() resets the style and palette, so it is called first.
sns.set(font_scale=1.2)
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Generate sample data: 100 random marks from 35 to 100 (the upper bound 101 is exclusive)
np.random.seed(123)
data = np.random.randint(35, 101, 100)

# Create data frame
df = pd.DataFrame({"Marks": data})

# Create density plot; fill=True replaces the deprecated shade=True
sns.kdeplot(df["Marks"], fill=True, color="blue")

# Add axis labels and plot title
plt.xlabel("Marks Scored")
plt.ylabel("Density")
plt.title("Density Plot of Student Marks")
plt.show()
```

## Conclusion

Histograms and density plots are essential visualization tools used to explore and understand the distribution of data. A histogram displays the distribution of data by dividing it into equal intervals and representing the frequency of data points in each interval using bars. A density plot, on the other hand, displays the distribution of data by estimating the probability density function of the data and plotting it as a curve.

In Python, we can use the matplotlib library to create histograms and the seaborn library to create density plots. In R, we can use the ggplot2 library to create both histograms and density plots.

In summary, histograms and density plots are powerful tools that can help you gain insights into your data. By using the code examples provided in this blog, you can start creating your own histograms and density plots in Python and R. So, go ahead and explore your data with these powerful visualization tools!
