# How to Test for Normality in R

It is an assumption of many statistical tests that our data be normally distributed.  There are two broad approaches to test normality.  The first is to assess it visually, by reviewing a graph of our data.  The second is to assess it numerically, by conducting a normality test.

In this quick tutorial, we will show you how to test for normality using both graphs and normality tests in R. We will work with RStudio, a program that makes it easier to work with R.

## The Data

We start from the assumption that you have created or imported a data frame in R containing the variable that you want to test for normality.  Please see our tutorials on importing SPSS, Excel and CSV files into R, or our tutorial on manually entering data in R.

In this tutorial, we want to assess whether the variable (vector) polsci in the data frame sats_polsci is normally distributed.  We will do this using both visual and numerical methods.
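If you do not have a data frame at hand and simply want to follow along, the sketch below builds a stand-in with the same names. The values are simulated with `rnorm()` and are purely hypothetical; they are not the data used in this tutorial, so your plots and test results will differ from ours.

```r
# Hypothetical stand-in data: simulated scores, NOT the tutorial's dataset.
set.seed(42)  # fix the random seed so the example is reproducible

# 40 simulated scores; the mean and sd are arbitrary illustrative choices
sats_polsci <- data.frame(polsci = rnorm(40, mean = 60, sd = 10))

# Inspect the structure and a quick numerical summary of the variable
str(sats_polsci)
summary(sats_polsci$polsci)
```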

## Visual Methods for Assessing Normality

One of the most common methods for assessing normality visually is the Q-Q plot, or quantile-quantile plot, which plots the quantiles of your data against the quantiles of a theoretical normal distribution. If the data points on your Q-Q plot fall close to the straight diagonal line that runs from the bottom left to the top right corner of the plot, you can treat your data as approximately normally distributed.

Enter the following commands in the RStudio console and then press Enter on your keyboard to create a Q-Q plot in R:

    qqnorm(dataframe$variable, frame = FALSE)
    qqline(dataframe$variable)

Replace the placeholders dataframe and variable with the data that you want to use to create your own Q-Q plot:

• dataframe: the name of your data frame in RStudio.  The example that we are using in this tutorial is sats_polsci
• variable: the variable (vector) in the above data frame that you want to test for normality. In our example this is polsci

The command that we use to generate the Q-Q plot for our own example is:

    qqnorm(sats_polsci$polsci, frame = FALSE)
    qqline(sats_polsci$polsci)

Press Enter on your keyboard.  Your Q-Q plot appears in the Plots tab of the bottom right panel of RStudio.

We can see that the data points fall close to the diagonal line.  We therefore conclude that our data is approximately normally distributed.
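To get a feel for what "close to the diagonal line" looks like, it can help to compare a sample that should be roughly normal with one that clearly is not. The sketch below does this with two simulated samples; the names `normal_sample` and `skewed_sample` and the sample size of 40 are illustrative assumptions, not part of our example data.

```r
# Simulated, hypothetical samples for comparison only
set.seed(1)
normal_sample <- rnorm(40)  # drawn from a normal distribution
skewed_sample <- rexp(40)   # drawn from a right-skewed (exponential) distribution

# Show the two Q-Q plots side by side, then restore the plot settings
op <- par(mfrow = c(1, 2))

qqnorm(normal_sample, main = "Roughly normal", frame = FALSE)
qqline(normal_sample)

qqnorm(skewed_sample, main = "Right-skewed", frame = FALSE)
qqline(skewed_sample)

par(op)
```

In the first panel the points should track the line fairly closely. In the second they typically curve away from it, especially at the upper end, which is the kind of pattern that suggests a departure from normality.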

## Numerical Tests for Assessing Normality

It is a good idea to combine your Q-Q plot with a numerical test for normality such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test.  The Shapiro-Wilk test is usually recommended for smaller sample sizes (< 50), like our variable.
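For larger samples, the Kolmogorov-Smirnov test is available in base R through `ks.test()`. The following is a minimal sketch on a simulated, hypothetical sample of 200 values. Note that it estimates the mean and standard deviation from the same data that is being tested, which makes the reported p value only approximate, so treat the result as a rough guide.

```r
# Simulated, hypothetical larger sample (not the tutorial's data)
set.seed(7)
x <- rnorm(200, mean = 60, sd = 10)

# Compare the sample against a normal distribution whose mean and sd
# are estimated from the sample itself (this makes the p value approximate)
ks_result <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

ks_result          # full test output
ks_result$p.value  # just the p value
```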

To compute the Shapiro-Wilk test, enter the following command in the RStudio console and then press Enter on your keyboard:

    shapiro.test(dataframe$variable)

Replace the placeholders dataframe and variable with the information about the variable that you want to test for normality as follows:

• dataframe: the name of your data frame in RStudio (sats_polsci for our example)
• variable: the variable (vector) in the above data frame that you want to test for normality (polsci for our example).

So, this is what we enter to compute the Shapiro-Wilk test for our variable:

    shapiro.test(sats_polsci$polsci)

The output of the Shapiro-Wilk test reports the test statistic (W) and the p value.

The null hypothesis of the test is that the data is normally distributed. If the p value is greater than .05, we do not reject this hypothesis and can assume that the data is normally distributed.  The p value for our test is 0.8292, so we assume that our variable is normally distributed.

If, however, the p value is less than or equal to .05, we reject the null hypothesis and assume that the data is not normally distributed.
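The decision rule above can also be applied in code, which is handy when you need to check several variables. The sketch below runs the test on a simulated, hypothetical sample and reads the statistic and p value from the result object; the names `x`, `result` and `alpha` are illustrative assumptions.

```r
# Simulated, hypothetical sample (not the tutorial's data)
set.seed(123)
x <- rnorm(40, mean = 60, sd = 10)

# shapiro.test() returns a list-like object holding the statistic and p value
result <- shapiro.test(x)
w <- unname(result$statistic)  # the W statistic
p <- result$p.value            # the p value

# Apply the .05 rule described in this tutorial
alpha <- 0.05
if (p > alpha) {
  message(sprintf("W = %.4f, p = %.4f: no evidence against normality", w, p))
} else {
  message(sprintf("W = %.4f, p = %.4f: data deviate from normality", w, p))
}
```

To use this with your own data, replace `x` with your variable, for example `sats_polsci$polsci`.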

***************

That's it for this tutorial. You should now be able to test data for normality in R using the Q-Q plot and the Shapiro-Wilk test.

***************