Paired Samples T-Test in R

In this tutorial, we will show you how to conduct and interpret a paired samples t-test in R, and how to report the results.

The paired samples t-test, also known as the dependent samples t-test, tests whether the mean difference between two paired measurements of the same continuous variable differs from zero. It is often used in pre-test/post-test research designs. For example, we could record the body weight of dieters before and after they completed a healthy food preparation program.

As with all of our R tutorials, we recommend working with RStudio, a program that makes it easier to work with R.

The Data

First of all, we need to import or create a data frame in R. Please see our tutorials on importing your Excel, CSV and SPSS files into R, and our tutorial on manually entering data in R.

The data frame for a paired samples t-test should include two columns representing your paired continuous variables.  Each row should represent one participant. 

Our example data frame contains the body weight measurements (in lbs) of 20 participants both before they started a healthy food preparation program, and also after they completed this program.  We want to know whether there is a difference between these two body weight measurements.

Data frame for paired samples t-test in RStudio
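If you do not yet have your own data, a small stand-in data frame can be built by hand. The sketch below is only an illustration: the eight pairs of values are made up for demonstration and are not the 20 participants from our example, although the name dieters and the columns before and after follow this tutorial.

```r
# Hypothetical body weights (lbs) for 8 participants; illustrative values only,
# not the data set used in this tutorial
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

# One row per participant, one column per measurement
str(dieters)
head(dieters)
```

Keeping both measurements for a participant on the same row is what lets R pair them correctly later on.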

Paired Samples t-Test Assumptions

The assumptions of the paired samples t-test are as follows:

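  • The pairs of observations should be independent of one another. That is, one participant’s measurements should not influence another participant’s measurements.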
  • There should be no significant outliers in the differences between the paired variables.  We will show you how to test this assumption with a boxplot.
  • The differences between the paired variables should be approximately normally distributed. We will show you how to test this assumption with a Shapiro-Wilk test. It is worth noting, however, that the paired samples t-test is generally robust to minor violations of normality, especially with larger sample sizes (e.g., > 30).

Testing Paired Samples t-Test Assumptions in R

It is important to note that the outliers and normality assumptions refer to the differences between the paired variables, rather than to the variables themselves.  That’s why we need to create a variable that represents these differences:

differences <- dataframe$x - dataframe$y

Replace the highlighted text in this command as follows:

  • dataframe: the name of the data frame that contains your paired variables (dieters in our example)
  • x: the name of the variable that represents the first set of measurements (before in our example)
  • y: the name of the variable that represents the second set of measurements (after in our example)

For our example, the command is as follows:

differences <- dieters$before - dieters$after

Once you press the Enter key on your keyboard, you will see the new differences variable in the Environment tab in the top right panel of RStudio:

Differences variable (vector) in Environment tab of RStudio
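To make the subtraction concrete, here is a minimal sketch using a small hypothetical data frame (the values are invented for illustration and are not our example data). It shows that the result is a plain numeric vector with one difference per participant.

```r
# Hypothetical data; illustrative values only
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

# Element-wise subtraction: first measurement minus second measurement
differences <- dieters$before - dieters$after

print(differences)    # 5 4 8 2 7 8 3 8
summary(differences)  # quick look at the spread of the differences
```

Because the subtraction is before minus after, positive values mean that a participant weighed less at the second measurement.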

Testing the ‘No Significant Outliers’ Assumption

We can now test the outliers assumption by visualizing this differences variable with a boxplot. The easiest way to do this is as follows:

boxplot(differences)

Once we press the Enter key, RStudio will display our boxplot in the Plots tab of the bottom right panel.

Our example boxplot is displayed in the left column of the table below.  It does not have any outliers, so our data pass this assumption.  For comparison, the boxplot displayed in the right column has a noticeable outlier (we have drawn a red box around it).

Left: boxplot of differences for a paired samples t-test in R (no outliers). Right: boxplot with an outlier in R.
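If you would rather not judge outliers by eye, base R's boxplot.stats() function returns the points that a boxplot would draw beyond its whiskers. The sketch below is one possible approach, using hypothetical differences rather than our example data; the extra value of 30 is added purely to show what a flagged outlier looks like.

```r
# Hypothetical differences (before - after); illustrative values only
differences <- c(5, 4, 8, 2, 7, 8, 3, 8)

# $out holds the values lying beyond the whiskers (1.5 x IQR by default)
outliers <- boxplot.stats(differences)$out
length(outliers)   # 0 means the boxplot would show no outliers

# Same check after adding one artificially extreme difference
with_extreme <- c(differences, 30)
boxplot.stats(with_extreme)$out
```

This uses the same rule as the boxplot itself, so it should agree with what you see in the Plots tab.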

Testing the Normality Assumption

We can test the normality assumption using the Shapiro-Wilk test as follows:

shapiro.test(differences)

Here are the results of our Shapiro-Wilk test:

Results of Shapiro-Wilk test for differences variable in RStudio

We assume that the differences are normally distributed if the p value is greater than .05. Our p value of 0.2542 is well above .05, so we assume that the differences between our paired variables are normally distributed.

When the p value is less than or equal to .05, however, we cannot assume that the differences are normally distributed.
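The decision rule above can also be applied in code, because shapiro.test() returns an object whose p value can be extracted directly. The following is a minimal sketch with hypothetical differences (not our example data); the variable names sw and alpha are our own choices.

```r
# Hypothetical differences (before - after); illustrative values only
differences <- c(5, 4, 8, 2, 7, 8, 3, 8)

# Run the Shapiro-Wilk test and keep the result object
sw <- shapiro.test(differences)

sw$statistic   # the W statistic
sw$p.value     # the p value used for the decision

# Apply the .05 rule described in the text
alpha <- 0.05
if (sw$p.value > alpha) {
  message("No evidence against normality of the differences.")
} else {
  message("The differences may not be normally distributed.")
}
```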

The Paired Samples t-Test in R

Once you have ensured that your data meet the assumptions outlined above, you can use the following command to conduct the paired samples t-test itself:

t.test(dataframe$x, dataframe$y, paired = TRUE)

Replace the highlighted text with the relevant information for your study as follows:

  • dataframe: the data frame that contains your paired variables (e.g., dieters).
  • x: the variable that contains the first set of measurements (e.g., before)
  • y: the variable that contains the second set of measurements (e.g., after)

So, for our example, we enter the following in the RStudio console:

t.test(dieters$before, dieters$after, paired = TRUE)
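It can help to see what this command computes. A paired samples t-test is equivalent to a one-sample t-test on the differences, with t equal to the mean difference divided by its standard error. The sketch below checks this on a small hypothetical data frame (invented values, not our example data); the names t_manual, df_manual and one_sample are our own.

```r
# Hypothetical data; illustrative values only
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

# The paired samples t-test as run in this tutorial
result <- t.test(dieters$before, dieters$after, paired = TRUE)

# The same statistic computed by hand from the differences
d <- dieters$before - dieters$after
t_manual  <- mean(d) / (sd(d) / sqrt(length(d)))  # mean difference / standard error
df_manual <- length(d) - 1                        # degrees of freedom: n - 1

# A one-sample t-test on the differences against a mean of 0
one_sample <- t.test(d, mu = 0)

# Both comparisons should return TRUE
all.equal(unname(result$statistic), t_manual)
all.equal(result$p.value, one_sample$p.value)
```

This is also why the outlier and normality assumptions apply to the differences rather than to the two variables themselves.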

We also recommend computing the mean and the standard deviation of your paired variables since you would typically include this information when you report your results. 

You can compute the mean of a variable as follows:

mean(dataframe$variable)

You can compute the standard deviation of a variable as follows:

sd(dataframe$variable)

So, for our example, we type the following into the RStudio console:

mean(dieters$before)
mean(dieters$after)
sd(dieters$before)
sd(dieters$after)
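As an optional shortcut, the four commands can be collapsed with sapply(), which applies a function to each selected column. This is just one convenient way to do it, shown here on hypothetical data (invented values, not our example data).

```r
# Hypothetical data; illustrative values only
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

# Apply mean() and sd() to both columns at once
means <- sapply(dieters[c("before", "after")], mean)
sds   <- sapply(dieters[c("before", "after")], sd)

# Combine into a small table, rounded to two decimals for reporting
round(rbind(M = means, SD = sds), 2)
```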

The Results of the Paired Samples t-Test

The results of the paired samples t-test for our example are as follows:

Paired samples t-test results in RStudio

First, we note that there is a mean difference of 5.8 lbs between dieters’ body weight measurements before and after they completed the healthy food preparation program. Because R subtracts the second variable from the first (before minus after), a positive number means that dieters’ mean body weight was lower at the second measurement (after the program) than at the first measurement (before the program). That is, they lost weight. A negative mean difference would have meant that dieters’ mean body weight was higher at the second measurement than at the first.

The other important value from the paired samples t-test is the p value. If the p value is less than or equal to the alpha level we have set for our test, then the difference between the paired measurements is statistically significant. Setting an alpha level of .05 or .01 is typical. Our p value of 1.303e-08 is written in scientific notation and is equal to 0.00000001303 in decimal notation. Since this is much less than our selected alpha level of .05, we conclude that the difference between dieters’ body weight measurements before and after the healthy food preparation program is significant.

On the other hand, if the p value is greater than our selected alpha level, we cannot conclude that there is a significant difference between our paired measurements (dieters’ body weight in our example).
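The values discussed above can also be pulled out of the test result instead of being read off the console. The sketch below assumes a small hypothetical data frame (invented values, not our example data), so its numbers will differ from ours; the last line simply shows how R can print a value such as 1.303e-08 in decimal notation.

```r
# Hypothetical data; illustrative values only
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

result <- t.test(dieters$before, dieters$after, paired = TRUE)
alpha  <- 0.05

unname(result$estimate)   # mean of the differences (before - after)
result$conf.int           # 95% confidence interval for the mean difference
result$p.value            # p value, possibly shown in scientific notation

# TRUE when the difference is significant at the chosen alpha level
result$p.value <= alpha

# Convert scientific notation to decimal notation
format(1.303e-08, scientific = FALSE)   # "0.00000001303"
```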

We also asked R to compute the means and standard deviations of our two variables:

Means and standard deviations of variables for paired samples t-test in R

Reporting a Paired Samples t-Test

If we wanted to report the results of our paired samples t-test in APA Style, we could do so as follows:

 The results of a paired samples t-test indicated that dieters’ body weight measurements (in lbs) were significantly lower after completing a healthy food preparation program (M = 190.45, SD = 10.69) than they were before the program (M = 196.25, SD = 10.98), t(19) = 9.45, p < .001.
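If you report many tests, you may prefer to assemble the statistics with sprintf() rather than copying them by hand. The following is a rough sketch only: the data are hypothetical (so the numbers will not match the sentence above), and format_p is a helper we define here for illustration, not a built-in R function.

```r
# Hypothetical data; illustrative values only
dieters <- data.frame(
  before = c(200, 185, 210, 192, 178, 205, 199, 188),
  after  = c(195, 181, 202, 190, 171, 197, 196, 180)
)

result <- t.test(dieters$before, dieters$after, paired = TRUE)

# Hypothetical helper: format a p value in APA style (no leading zero)
format_p <- function(p) {
  if (p < .001) return("p < .001")
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Build the statistical part of the sentence from the test result
report <- sprintf(
  "after (M = %.2f, SD = %.2f), before (M = %.2f, SD = %.2f), t(%d) = %.2f, %s",
  mean(dieters$after), sd(dieters$after),
  mean(dieters$before), sd(dieters$before),
  as.integer(result$parameter), result$statistic,
  format_p(result$p.value)
)

cat(report, "\n")
```

The surrounding wording still needs to be written by hand so that it describes your own study.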

***************

That’s it for this tutorial.  You should now be able to conduct and interpret a paired samples t-test in R, and write up the results of your test. 

***************