Testing For Normality of Residual Errors Using Skewness And Kurtosis Measures

And a guide to using the Omnibus K-squared and Jarque–Bera normality tests

We’ll cover the following four topics in this section:

  1. What is normality and why should you care if the residual errors from your trained regression model are normally distributed?
  2. What are Skewness and Kurtosis measures and how to use them for testing for normality of residual errors?
  3. How to use two very commonly used tests of normality, namely the Omnibus K-squared and Jarque–Bera tests that are based on Skewness and Kurtosis.
  4. How to apply these tests to a real-world data set to decide if Ordinary Least Squares regression is the appropriate model for this data set.

What is normality?

Normality means that your data follows the normal distribution. Specifically, each value y_i in Y is a ‘realization’ of some normally distributed random variable N(µ_i, σ_i²), as follows:

y_i ~ N(µ_i, σ_i²),  i = 1, 2, …, n
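
To make the idea of a ‘realization’ concrete, here is a minimal sketch (assuming numpy; the µ_i and σ_i values are hypothetical, chosen only for illustration) that draws one y_i from each N(µ_i, σ_i²):

import numpy as np

# Each observed value y_i is one draw ('realization') from its own
# normal distribution N(mu_i, sigma_i^2). The mu/sigma values below
# are hypothetical, chosen only for illustration.
rng = np.random.default_rng(42)
mu = np.array([10.0, 12.0, 15.0])     # means mu_i
sigma = np.array([1.0, 1.5, 2.0])     # standard deviations sigma_i
y = rng.normal(loc=mu, scale=sigma)   # one realization y_i per distribution
print(y)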

Normality in the context of linear regression

While building a linear regression model, one assumes that Y depends on a matrix of regression variables X. This makes Y conditionally normal on X: if X = [x_1, x_2, …, x_n] are jointly normal, then µ = f(X) is a normally distributed vector, and so is Y, as follows:

µ = f(X), and Y|X ~ N(µ, σ²)

Why test for normality?

Several statistical techniques and models assume that the underlying data is normally distributed.

I’ll give below three such situations where normality rears its head:

  • As seen above, in Ordinary Least Squares (OLS) regression, Y is conditionally normal on the regression variables X in the following manner: Y is normal if X = [x_1, x_2, …, x_n] are jointly normal. But nothing bad happens to your OLS model even if Y isn’t normally distributed.
  • A non-strict requirement of classical linear regression models is that the residual errors of regression, ϵ = (y_observed − y_predicted), should be normally distributed with an expected value of zero, i.e. E(ϵ) = 0. If the residual errors ϵ are not normally distributed, one cannot reliably calculate confidence intervals for the model’s forecasts using the t-distribution, especially for small sample sizes (n ≤ 20).

Bear in mind that even if the errors are not normally distributed, the OLS estimator is still the BLUE i.e. Best Linear Unbiased Estimator for the model as long as E(ϵ)=0, and all other requirements of OLSR are satisfied.

  • Finally, certain goodness-of-fit techniques such as the F-test for regression analysis assume that the residual errors of the competing regression models are all normally distributed. If the residual errors are not normally distributed, the F-test cannot be reliably used to compare the goodness-of-fit of two competing regression models.

How can I tell if my data is (not) normally distributed?

Several statistical tests are available to test the degree to which your data deviates from normality, and if the deviation is statistically significant.

We’ll look at two moment-based measures, namely Skewness and Kurtosis, and at two statistical tests of significance that are based on these measures, namely the Omnibus K-squared and Jarque–Bera tests.

What is ‘Skewness’ and how to use it?

Skewness measures by how much the overall shape of a distribution deviates from the shape of the normal distribution.

The following figures illustrate skewed distributions.

[Figure: examples of skewed distributions]

The moment based definition of Skewness is as follows:

Skewness is defined as the third standardized central moment of the random variable of the probability distribution.

The formula for the skewness of a population is shown below:

Skewness = E[((Y − µ)/σ)³] = E[(Y − µ)³] / σ³

Skewness has the following properties:

  • Skewness is a moment based measure (specifically, it’s the third moment), since it uses the expected value of the third power of a random variable.
  • Skewness is a central moment, because the random variable’s value is centralized by subtracting the mean from it.
  • Skewness is a standardized moment, as its value is standardized by dividing it by (a power of) the standard deviation.
  • Because the third power preserves the sign of (y_i − µ), a probability distribution that is perfectly symmetric around the mean will have zero skewness. For each y_i that is greater than the mean µ, there will be a corresponding y_i smaller than the mean µ by the same amount. Since the distribution is symmetric around the mean, both y_i values will have the same probability, so the pairs of (y_i − µ)³ terms cancel out, yielding a total skewness of zero.
  • Skewness of the normal distribution is zero.
  • While a symmetric distribution will have a zero skewness, a distribution having zero skewness is not necessarily symmetric.
  • Certain ratio based distributions — most famously the Cauchy distribution — have an undefined skewness as they have an undefined mean µ.

In practice, we can estimate the skewness of the population by calculating the skewness of a sample. For the sample, we cheat a little by giving each observed y_i the same probability 1/n (i.e. we use the empirical distribution), so the third central sample moment becomes 1/n times a simple summation of (y_i − ȳ)³ over all i.

g_1 = [ (1/n) Σ (y_i − ȳ)³ ] / [ (1/n) Σ (y_i − ȳ)² ]^(3/2)
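
Here is a minimal sketch (assuming numpy and scipy) that computes the sample skewness directly from the formula above and cross-checks it against scipy.stats.skew, which uses the same biased estimator by default:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)   # a right-skewed sample

# Third central sample moment divided by the 3/2 power of the second
# central sample moment, exactly as in the formula above:
y_bar = y.mean()
g1 = np.mean((y - y_bar)**3) / np.mean((y - y_bar)**2)**1.5

print(g1)        # manual estimate
print(skew(y))   # scipy's estimate; should match the manual value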

Skewness is very sensitive to the parameters of the probability distribution.

The following figure illustrates the skewness of the Poisson distribution’s Probability Mass Function for various values of the event rate parameter λ:

[Figure: skewness of the Poisson distribution’s PMF for various values of the event rate λ]

Why does the skewness of the Poisson PMF reduce for large event rates? For large values of λ, the Poisson distribution’s PMF approaches the shape of the Normal distribution’s PDF with mean and variance λ. That is, Poisson(λ) → N(λ, λ) as λ → ∞. Therefore, it’s no coincidence what we are seeing in the above figure.

As λ → ∞, skewness of the Poisson distribution tends to the skewness of the normal distribution, namely 0.
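
We can verify this numerically. A small sketch, assuming scipy, that prints the skewness of Poisson(λ) for increasing λ (the skewness of Poisson(λ) is 1/√λ, so it shrinks toward zero):

from scipy.stats import poisson

# The skewness of Poisson(lambda) is 1/sqrt(lambda); it tends to the
# normal distribution's skewness of 0 as lambda grows.
for lam in [1, 5, 25, 100, 1000]:
    print(lam, float(poisson.stats(lam, moments='s')))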

There are other measures of Skewness also, for example:

  • Skewness of mode
  • Skewness of median
  • Skewness calculated in terms of the Quartile values (a sketch of this measure follows this list)
  • …and a few others.
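
As an illustration of one of these alternatives, here is a minimal sketch (assuming numpy) of Bowley’s quartile-based skewness:

import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1000)   # a right-skewed sample

# Bowley's quartile skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1).
# It lies in [-1, 1] and is 0 when the quartiles are symmetric
# around the median.
q1, q2, q3 = np.percentile(y, [25, 50, 75])
print((q3 + q1 - 2 * q2) / (q3 - q1))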

What is ‘Kurtosis’ and how to use it?

Kurtosis is a measure of how differently shaped the tails of a distribution are, as compared to the tails of the normal distribution. While skewness focuses on the overall shape, Kurtosis focuses on the shape of the tails.

Kurtosis is defined as follows:

Kurtosis is the fourth standardized central moment of the random variable of the probability distribution.

The formula for Kurtosis is as follows:

Kurtosis = E[((Y − µ)/σ)⁴] = E[(Y − µ)⁴] / σ⁴

Kurtosis has the following properties:

  • Just like Skewness, Kurtosis is a moment-based measure, and it is a central, standardized moment.
  • Because the fourth power makes every centralized term non-negative, Kurtosis is always positive.
  • Kurtosis is sensitive to departures from normality on the tails. Because of the 4th power, smaller centralized values (y_i − µ) in the above equation are greatly de-emphasized; in other words, values in Y that lie near the center of the distribution count for little. Conversely, larger values of (y_i − µ), i.e. the ones lying on the two tails of the distribution, are greatly emphasized by the 4th power. This property makes Kurtosis largely ignorant of the values lying toward the center of the distribution, and sensitive to the values lying on the distribution’s tails.
  • The Kurtosis of the normal distribution is 3.0. While measuring the departure from normality, Kurtosis is often expressed as excess Kurtosis, which is the amount of Kurtosis remaining after subtracting 3.0.

For a sample, excess Kurtosis is estimated by dividing the fourth central sample moment by the fourth power of the sample standard deviation, and subtracting 3.0, as follows:

g_2 = [ (1/n) Σ (y_i − ȳ)⁴ ] / [ (1/n) Σ (y_i − ȳ)² ]² − 3.0
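
A minimal sketch (assuming numpy and scipy) that computes the sample excess Kurtosis from the formula above and cross-checks it against scipy.stats.kurtosis, which returns excess Kurtosis by default:

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
y = rng.normal(size=5000)   # a sample from the normal distribution

# Fourth central sample moment divided by the square of the second
# central sample moment, minus 3.0, as in the formula above:
y_bar = y.mean()
g2 = np.mean((y - y_bar)**4) / np.mean((y - y_bar)**2)**2 - 3.0

print(g2)                         # manual estimate, close to 0 for normal data
print(kurtosis(y, fisher=True))   # scipy's excess Kurtosis; should match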

Here is an excellent image from Wikipedia Commons that shows the Excess Kurtosis of various distributions. I have super-imposed a magnified version of the tails in the top left side of the image:

[Figure: excess Kurtosis of various distributions, with a magnified view of the tails super-imposed in the top left (source: Wikipedia Commons)]

Normality tests based on Skewness and Kurtosis

While Skewness and Kurtosis quantify the amount of departure from normality, one would want to know if the departure is statistically significant. The following two tests let us do just that:

  • The Omnibus K-squared test
  • The Jarque–Bera test

In both tests, we start with the following hypotheses:

  • Null hypothesis (H_0): The data is normally distributed.
  • Alternate hypothesis (H_1): The data is not normally distributed, in other words, the departure from normality, as measured by the test statistic, is statistically significant.

Omnibus K-squared normality test

The Omnibus test combines the random variables for Skewness and Kurtosis into a single test statistic as follows:

K² = Z1(g_1)² + Z2(g_2)²

Probability distribution of the test statistic:
In the above formula, the functions Z1(.) and Z2(.) are meant to make the random variables g1 and g2 approximately normally distributed, which in turn makes their sum of squares approximately Chi-squared(2) distributed. The Omnibus K-squared test statistic is therefore approximately Chi-squared(2) distributed under the assumption that the null hypothesis is true, i.e. the data is normally distributed.
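
scipy ships this test as scipy.stats.normaltest (the D’Agostino–Pearson test). A minimal sketch on simulated data:

import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(0)
y = rng.normal(size=500)

# normaltest returns the K-squared statistic and the p-value obtained
# from the Chi-squared(2) distribution.
k2, p_value = normaltest(y)
print(k2, p_value)   # a large p-value means we cannot reject normality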

Jarque–Bera normality test

The test statistic for this test is as follows:

JB = (n/6) * (g_1² + g_2²/4)

Probability distribution of the test statistic:
The test statistic is the scaled sum of squares of random variables g1 and g2 that are each approximately normally distributed, thereby making the JB test statistic approximately Chi-squared(2) distributed, under the assumption that the null hypothesis is true.
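
To make the formula concrete, here is a minimal sketch (assuming numpy and scipy) that computes the JB statistic directly from the sample skewness g1 and sample excess Kurtosis g2, and derives the p-value from Chi-squared(2):

import numpy as np
from scipy.stats import skew, kurtosis, chi2

rng = np.random.default_rng(0)
y = rng.normal(size=500)

# JB = (n/6) * (g1^2 + g2^2/4), approximately Chi-squared(2) under H_0
n = len(y)
g1 = skew(y)
g2 = kurtosis(y, fisher=True)   # excess Kurtosis
jb = (n / 6.0) * (g1**2 + g2**2 / 4.0)
p_value = chi2.sf(jb, df=2)     # survival function = 1 - CDF
print(jb, p_value)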

Example

We’ll use the following data set from the U.S. Bureau of Labor Statistics, to illustrate the application of normality tests:

[Figure: the Wages and Salaries data set from the U.S. Bureau of Labor Statistics]

Here are the first few rows of the data set:

[Table: the first few rows of the data set]

You can download the data from this link.

Let’s fit the following OLS regression model to this data set:

Wages = β_0 + β_1*Year + ϵ

Where:

Wages is the response a.k.a. dependent variable,
Year is the regression a.k.a. explanatory variable,
β_0 is the intercept of regression,
β_1 is the coefficient of regression, and
ϵ is the unexplained regression error.

We’ll use Python libraries pandas and statsmodels to read the data, and to build and train our OLS model for this data.

Let’s start with importing the required packages:

import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.compat import lzip
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

Read the data into the pandas data frame:

df = pd.read_csv('wages_and_salaries_1984_2019_us_bls_CXU900000LB1203M.csv', header=0)

Plot Wages against Year:

fig = plt.figure()
plt.xlabel('Year')
plt.ylabel('Wages and Salaries (USD)')
fig.suptitle('Wages and Salaries before taxes. All US workers')
wages, = plt.plot(df['Year'], df['Wages'], 'go-', label='Wages and Salaries')
plt.legend(handles=[wages])
plt.show()

[Figure: plot of Wages and Salaries before taxes, all US workers, against Year]

Create the regression expression in Patsy syntax. In the following expression, we are telling statsmodels that Wages is the response variable and Year is the regression variable. statsmodels will automatically add an intercept to the regression equation.

expr = 'Wages ~ Year'

Configure the OLS regression model by passing the model expression, and train the model on the data set, all in one step:

olsr_results = smf.ols(expr, df).fit()

Print the model summary:

print(olsr_results.summary())

In the following output, I have called out the areas that bode well and bode badly for our OLS model’s suitability for the data:

[Figure: OLS regression model summary, with the areas that bode well and badly for the model called out]

Interpreting the results

Following are a few things to note from the results:

  • The residual errors are positively skewed, with a skewness of 0.268. Their Kurtosis is 2.312, i.e. an excess Kurtosis of −0.688, implying slightly thinner tails than those of the normal distribution.
  • The Omnibus test and the JB test have both produced test statistics (1.219 and 1.109 respectively) which lie within the H_0 acceptance zone of the Chi-squared(2) PDF (see the figure below and the critical-value check after this list). Thus we fail to reject the hypothesis H_0, i.e. we can treat the residuals as normally distributed.
[Figure: the H_0 acceptance zone of the Chi-squared(2) PDF]
  • You can also get the values of Skewness, Kurtosis, and the test statistics for the Omnibus and JB tests as follows:
name = ['Omnibus K-squared test', 'Chi-squared(2) p-value']
# Pass the residual errors of the regression into the test
test = sms.omni_normtest(olsr_results.resid)
lzip(name, test)

This prints out the following:

> [('Omnibus K-squared test', 1.2194658631806088), ('Chi-squared(2) p-value', 0.5434960003061313)]

name = ['Jarque-Bera test', 'Chi-squared(2) p-value', 'Skewness', 'Kurtosis']
test = sms.jarque_bera(olsr_results.resid)
lzip(name, test)

This prints out the following:

[('Jarque-Bera test', 1.109353094606092), ('Chi-squared(2) p-value', 0.5742579764509973), ('Skewness', 0.26780140709870015), ('Kurtosis', 2.3116476989966737)]
  • Since the residuals seem to be normally distributed, we can also trust the 95% confidence intervals reported by the model for the two model parameters.
  • We can also trust the p-value of the F-test. It’s exceedingly tiny, indicating that the two model parameters are jointly significant.
  • Finally, the R-squared reported by the model is quite high indicating that the model has fitted the data well.
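
For reference, here is a minimal sketch (assuming scipy) of the critical-value check mentioned above: the 5% critical value of Chi-squared(2) is about 5.99, and both test statistics lie well below it.

from scipy.stats import chi2

# Test statistics below the 5% critical value of Chi-squared(2) fall
# inside the H_0 acceptance zone.
critical_value = chi2.ppf(0.95, df=2)
print(critical_value)           # approximately 5.99
print(1.219 < critical_value)   # Omnibus K-squared -> True
print(1.109 < critical_value)   # Jarque-Bera       -> True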

Now for the bad part: the Durbin-Watson statistic in the model summary indicates auto-correlation in the residuals, particularly at lag 1.
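
Continuing the session from above, we can read the Durbin-Watson statistic off the residuals directly; a value near 2 indicates no lag-1 auto-correlation, while values well below 2 indicate positive auto-correlation:

from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson statistic of the residual errors; values well below 2
# indicate positive auto-correlation at lag 1.
print(durbin_watson(olsr_results.resid))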

We can easily confirm this via the ACF plot of the residuals:

plot_acf(olsr_results.resid, title='ACF of residual errors')
plt.show()

[Figure: ACF plot of the residual errors]

This presents a problem for us: one of the fundamental requirements of Classical Linear Regression Models is that the residual errors should not be auto-correlated, and in this case they most certainly are. This means that the OLS estimator may have under-estimated the variance in the training data, which in turn means that its predictions will be off by a large amount.

Simply put, the OLS estimator is no longer BLUE, i.e. the Best Linear Unbiased Estimator, for the model. Bummer!

The auto-correlation of the residual errors points to the possibility that our model was incorrectly chosen or incorrectly configured. In particular:

  • We may have left out some key explanatory variables, which causes signal to leak into the residuals in the form of auto-correlation, OR,
  • The choice of the OLS model itself may be entirely wrong for this data set. We may need to look at alternate models such as the Regression with ARIMA Errors model which we had covered in an earlier section.

In such cases, your choice is between accepting the sub-optimality of the chosen model and addressing the above two causes of sub-optimality.

Summary

  • Several statistical procedures assume that the underlying data follows the normal distribution.
  • Skewness and Kurtosis are two moment based measures that will help you to quickly calculate the degree of departure from normality.
  • In addition to using Skewness and Kurtosis, you should use the Omnibus K-squared and Jarque-Bera tests to determine whether the amount of departure from normality is statistically significant.
  • In some cases, if the data (or the residuals) are not normally distributed, your model will be sub-optimal.

References, Citations and Copyrights

Data links

Wages and salaries by Occupation: Total wage and salary earners (series id: CXU900000LB1203M). U.S. Bureau of Labor Statistics under US BLS Copyright Terms. Curated data set link for download

Images

All images are copyright Sachin Date under CC-BY-NC-SA, unless a different source and copyright are mentioned underneath the image.
