A histogram is a bar chart that represents the frequency distribution of data. The height of each bar corresponds to the number of items in each class or cell. The width of each bar represents a measurement interval. The histogram shows basic information such as the central location, shape, and spread of examined data.
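As a minimal sketch of how the bars are built, the Python below counts how many values fall into each equal-width interval and prints a text version of the chart. The function names, the sample data, and the choice of interval start and width are all illustrative assumptions, not part of any standard procedure.

```python
def histogram_counts(values, low, width, n_bins):
    """Count the values that fall in each of n_bins equal-width intervals.

    Bin i covers [low + i*width, low + (i+1)*width). The last bin also
    includes its upper edge so the largest allowed value is not dropped.
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if i == n_bins and v == low + n_bins * width:
            i = n_bins - 1  # put the top edge into the last bin
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def render(counts, low, width):
    """Return a text histogram: one row per interval, one '#' per item."""
    rows = []
    for i, c in enumerate(counts):
        left = low + i * width
        rows.append(f"{left:>3} to {left + width:>3} | {'#' * c}")
    return "\n".join(rows)


# Hypothetical sample: 15 cycle times in whole minutes.
sample = [12, 15, 14, 18, 21, 16, 15, 17, 19, 14, 16, 22, 13, 16, 17]
counts = histogram_counts(sample, low=10, width=3, n_bins=5)
print(render(counts, low=10, width=3))
```

Whole-number data and edges are used here so that no value lands on the wrong side of an interval boundary because of floating-point rounding.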
A histogram helps you determine if your data follows a normal distribution, which is bell-shaped and symmetrical and is the basis for capability analysis. It allows you to do the following:
- Communicate the distribution of numerical data quickly to others
- Analyze if a process meets customer requirements
- Determine whether the yields of two or more measures are different
- See if a process change occurred from one time period to another
Interpret a histogram in three steps: study its shape, calculate the descriptive statistics, and compare it to a normal distribution.
Study and Describe the Shape of Your Histogram
As described above, a normal distribution looks bell-shaped. It is important to study and describe the shape of a histogram to justify preventative actions that align with your quality needs. If you notice the data is not bell-shaped, here is what that could mean.
- Bimodal: A bimodal shape has two peaks, which suggests the data may have come from two different systems.
- Skewed Right: Data that skew to the right have a positive skew: most occurrences sit on the left side of the chart, with fewer occurrences as you move to the right. This shape is common when the data have a lower boundary, such as zero, so that every value is greater than zero.
- Skewed Left: Data that skew to the left have a negative skew: there are more occurrences on the right side of the chart and fewer on the left. This shape is common when the data have an upper boundary, such as 100, so that every value is less than 100.
- Uniform: A uniform histogram, with bars of roughly equal height, provides little information about the system. If the bars show no variation, check whether several sources of variation have been combined.
- Random: Random distribution occurs when you do not see patterns in the data, and it may have several modes (peaks). Since it acts similarly to a uniform distribution, study the data for combined sources of variation.
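One rough way to tell these shapes apart is to count the peaks in the bar heights. The sketch below treats a bar as a peak when it is strictly taller than both of its neighbors; `count_peaks` is a hypothetical helper, and the rule is an assumption that is sensitive to noise and to the chosen intervals, so treat the result as a prompt for closer study rather than a verdict.

```python
def count_peaks(counts):
    """Count bars that are strictly taller than both neighboring bars.

    The chart is padded with an empty bar on each side so that a tall
    first or last bar can still count as a peak. Flat runs of equal
    bars never count, so a perfectly uniform chart reports 0 peaks.
    """
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


print(count_peaks([1, 5, 6, 2, 1]))        # one peak: bell-like
print(count_peaks([1, 6, 2, 1, 5, 7, 2]))  # two peaks: possibly bimodal
print(count_peaks([3, 3, 3, 3]))           # no peaks: uniform
```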
Usually, data will not be bell-shaped if they come from different systems. One solution is to separate the data behind a bimodal, uniform, or random distribution by source and analyze each group individually. For uniform and random distributions, you may also need to try different groupings to see whether a more meaningful pattern emerges.
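Separating the data by source might look like the sketch below. It assumes each measurement was recorded together with a label for the system that produced it; the labels "A" and "B", the values, and the `stratify` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratify(records):
    """Group (source, value) pairs into a dict of {source: [values]}."""
    groups = defaultdict(list)
    for source, value in records:
        groups[source].append(value)
    return dict(groups)


# Hypothetical measurements from two systems mixed into one data set.
records = [
    ("A", 10), ("B", 20), ("A", 11), ("B", 21),
    ("A", 9), ("B", 19), ("A", 10), ("B", 20),
]

groups = stratify(records)
for source, values in sorted(groups.items()):
    # Summarize each source on its own instead of the combined data.
    print(source, len(values), mean(values), round(stdev(values), 2))
```

In this made-up example the combined data would show two peaks, around 10 and 20, while each source on its own is centered on a single value.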
Calculate the Descriptive Statistics
Several statistics provide practical information about a histogram and the area beneath its curve. Here are the key definitions to use when describing and analyzing one.
- Central Location: The central location describes the middle of the data set and is measured by the mean, median, and mode.
- Spread: The range and the standard deviation describe the spread of the data. The range is the difference between the lowest and highest values. The standard deviation measures how far the values typically lie from the mean.
- Skewness: The measure of asymmetry of the histogram (the frequency distribution). A histogram with a normal distribution is symmetrical, meaning there is the same amount of data on both sides of the mean, and has a skewness of 0.
- Kurtosis: A measure of the combined weight of the tails relative to the rest of the distribution. The kurtosis value increases as the tails of the distribution become heavier and decreases as they become lighter. When 3 is subtracted from the raw value (excess kurtosis), a histogram with a normal distribution has a kurtosis of 0. Some statistical texts do not subtract 3, in which case a normal distribution has a kurtosis of 3.
- Coefficient of Variation: A measure of how much variation exists relative to the mean, calculated as the standard deviation (sigma) divided by the mean. The larger the coefficient of variation, the larger the sigma is relative to the mean.
- Chi-Square: This statistic measures how well the actual distribution fits the expected distribution by comparing the number of observations in each cell of the histogram with the number expected in that cell. The result is evaluated against a chi-square table using the degrees of freedom.
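The statistics above can be sketched in Python using only the standard library. This version assumes the population forms of the formulas (dividing by n rather than n - 1) and reports excess kurtosis (3 subtracted); other texts and tools may use the sample forms, so results can differ slightly. The function names `describe` and `chi_square` and the sample data are illustrative.

```python
from statistics import mean, median, multimode, pstdev


def describe(values):
    """Return the descriptive statistics for a list of numbers.

    Assumes population formulas (divide by n), a nonzero mean, and
    values that are not all identical (otherwise sd is 0 and the
    skewness and kurtosis ratios are undefined).
    """
    n = len(values)
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return {
        "mean": m,
        "median": median(values),
        "modes": multimode(values),
        "range": max(values) - min(values),
        "std_dev": sd,
        "skewness": m3 / sd ** 3,
        "excess_kurtosis": m4 / sd ** 4 - 3,
        "coeff_variation": sd / m,
    }


def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over the histogram cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data set.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(name, value)

# Hypothetical observed and expected cell counts.
print(chi_square([10, 20, 30], [15, 20, 25]))
```

A positive skewness value indicates a right skew and a negative value a left skew, which matches the shapes described earlier.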
Compare Your Histogram to a Normal Distribution
How can you tell if the shape of your histogram is normal? Compare it to a normal distribution. The following characteristics of normal distributions will help in studying your histogram.
- The mean (average), median, and mode of the histogram are equal.
- The histogram is symmetrical. If the histogram gets cut in half, each side mirrors the other. It must also form a bell-shaped curve.
- The total area under the curve is equal to one. The whole curve cannot be shown because the tails extend to infinity, so standard practice is to show plus or minus three standard deviations from the average.
- If the spread of data (described by standard deviation) is known, it is possible to determine the percentage of the data beneath the curve. One standard deviation to the right and left of the mean (the center of the curve) captures 68.27% of the area under the curve, two standard deviations capture 95.45%, and three standard deviations capture 99.73%. These percentages hold for any normally distributed data and are listed in the standard normal distribution table.
- It is possible to describe the area under the curve after finding the mean and the standard deviation.
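The area percentages in the list above can be checked numerically, and the same idea gives a quick comparison for your own data. In the sketch below, `normal_coverage` uses the standard identity that the area within k standard deviations of the mean of a normal curve is erf(k / sqrt(2)); `observed_coverage` is a hypothetical helper that assumes the population standard deviation, and the sample data are made up.

```python
import math
from statistics import mean, pstdev


def normal_coverage(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def observed_coverage(values, k):
    """Fraction of the values that lie within k standard deviations of their mean."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    inside = sum(1 for x in values if abs(x - m) <= k * sd)
    return inside / len(values)


# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
for k in (1, 2, 3):
    print(
        k,
        f"normal: {normal_coverage(k):.2%}",
        f"observed: {observed_coverage(data, k):.2%}",
    )
```

Large gaps between the observed and normal percentages suggest the data may not be normally distributed, although a sample this small cannot settle the question.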