The post Repeated-Measures ANOVA in SPSS, Including Interpretation appeared first on EZ SPSS Tutorials.

A repeated-measures ANOVA design is sometimes used to analyze data from a longitudinal study, where the requirement is to assess the effect of the passage of time on a particular variable. For this tutorial, we’re going to use data from a hypothetical study that looks at whether fear of spiders among arachnophobes increases over time if the disorder goes untreated.

- Click Analyze -> General Linear Model -> Repeated Measures
- Name your Within-Subject factor, specify the number of levels, then click Add
- Hit Define, and then drag and drop (left to right) a variable for each of the levels you specified (taking care to preserve their correct order)
- Click Options, and tick the Descriptive statistics and Estimate of effect size boxes, and then click Continue
- You’re now ready to run the test. Press the OK button, and your result will pop up in the Output Viewer
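If you prefer working with syntax, the steps above correspond to a GLM command along the following lines (the SPQ_Time variable names and the factor name come from the example used in this tutorial; substitute your own):

```spss
GLM SPQ_Time1 SPQ_Time2 SPQ_Time3
  /WSFACTOR=time 3 Polynomial
  /PRINT=DESCRIPTIVE ETASQ
  /EMMEANS=TABLES(time) COMPARE ADJ(BONFERRONI)
  /WSDESIGN=time.
```

You can generate syntax like this yourself by clicking Paste instead of OK once the dialog boxes are set up.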

This is the data from our “study” as it appears in the SPSS Data View.

The variable we’re interested in here is SPQ which is a measure of the fear of spiders that runs from 0 to 31. The average score for a person with a spider phobia is 23, which compares to a score of slightly under 3 for a non-phobic.

SPQ is the dependent variable. The independent variable – or, to adopt the terminology of ANOVA, the within-subjects factor – is time, and it has three levels: SPQ_Time1 is the time of the first SPQ assessment; SPQ_Time2 is one year later; and SPQ_Time3 two years later.

The null hypothesis is that the mean SPQ score is the same for all levels of the within-subjects factor. This is what we’ll test with a one-way repeated-measures ANOVA.

To start, click Analyze -> General Linear Model -> Repeated Measures. This will bring up the Repeated Measures Define Factor(s) dialog box.

As we noted above, our within-subjects factor is time, so type “time” in the Within-Subject Factor Name box. And we have 3 levels, so input 3 into Number of Levels. Then click Add.

The dialog box should now look like this.

Okay, it’s now time to set up the within-subjects variables (at the moment SPSS knows that our within-subjects factor has three levels, but it doesn’t know which of our variables corresponds to each level). Click on the Define button, which will bring up the Repeated Measures dialog box.

You’ve got to shift your within-subjects variables over to the Within-Subjects Variables box ensuring you maintain the correct order. You can drag and drop, or use the arrow button in the middle of the box. In our case, it just means moving SPQ_Time1, SPQ_Time2 & SPQ_Time3 into the three slots on the right.

The dialog box should look something like this once you’ve completed this stage.

We’re now ready to set up some of the options for the repeated-measures ANOVA. Click on the Options button.

What you see here depends on the version of SPSS you’re using. The most recent version of SPSS (26) has an options dialog box that looks like this.

Previous versions include an option for specifying estimated marginal means. It looks like this.

We’re going to assume that you’re using a previous version of SPSS, and you’re seeing the estimated marginal means option. If you’re not, then you need to click on the EM Means button (in the Repeated Measures dialog box) after you’ve finished with the Options dialog box, and set up the estimated marginal means there.

It’s not too difficult to get the options sorted out. You want to display descriptive statistics and estimates of effect size, so tick these options in the Display section (as above). And then in the Estimated Marginal Means section (or dialog box if you’re using the current version of SPSS), move “time” over to the Display Means for box, and then tick Compare main effects, and choose Bonferroni as the Confidence interval adjustment option.

Hit the Continue button(s) once you’ve got this set up.

That’s it, you’re ready to run the test. You should be looking at the original Repeated Measures dialog box. All you’ve got to do is hit OK, and you’ll see the result pop up in the Output Viewer.

SPSS produces a lot of output for the one-way repeated-measures ANOVA test. For the purposes of this tutorial, we’re going to concentrate on a fairly simple interpretation of all this output. (In future tutorials, we’ll look at some of the more complex options available to you, including multivariate tests and polynomial contrasts).

The descriptive statistics that SPSS outputs are easy enough to understand. The comparison between means (see above) gives us an idea of the direction of any possible effect. In our example, it seems as if fear of spiders increases over time, with the greatest increase (20.90 to 22.26 on the SPQ scale) occurring between year 1 (SPQ_Time2) and year 2 (SPQ_Time3). Of course, we won’t know whether these differences in the means reach significance until we look at the result of the ANOVA test.

A requirement that must be met before you can trust the *p*-value generated by the standard repeated-measures ANOVA is the homogeneity-of-variance-of-differences (or sphericity) assumption. For our purposes, it doesn’t matter too much what this means, we just need to know how to figure out whether the requirement has been satisfied.

SPSS tests this assumption by running Mauchly’s test of sphericity.

What we’re looking for here is a *p*-value that’s *greater* than .05. Our *p*-value is .494, which means we meet the assumption of sphericity.

You’ve got to be careful here. This assumption is frequently violated. If it is, in order to calculate a reliable value for *p*, you’ll need to adjust the degrees of freedom of *F* in line with the extent to which the assumption is violated. Happily, SPSS does this work for you. All you’ve got to do is choose an alternative univariate test. Let’s look at this now.

This is where we read off the result of the repeated-measures ANOVA test.

As we have just discussed, our data meets the assumption of sphericity, which means we can read our result straight from the top row (Sphericity Assumed). The value of *F* is 5.699, which reaches significance with a *p*-value of .006 (which is less than the .05 alpha level). This means there is a statistically significant difference between the means of the different levels of the within-subjects factor (time).

If our data had not met the assumption of sphericity, we would need to use one of the alternative univariate tests. You’ll notice that these produce the same value for *F*, but that there is some variation in the reported degrees of freedom. In our case, there is not enough difference to alter the *p*-value – Greenhouse-Geisser and Huynh-Feldt both produce significant results (*p* = .006).

Although we know that the differences between the means of our three within-subjects levels are large enough to reach significance, we don’t yet know between which of the various pairs of means the difference is significant. This is where pairwise comparisons come into play.

This table features three *unique* comparisons between the means for SPQ_Time1, SPQ_Time2 and SPQ_Time3. Only one of the differences reaches significance, and that’s the difference between the means for SPQ_Time1 and SPQ_Time3 (see above). It is worth noting that SPSS is using an adjusted *p*-value here in order to control for multiple comparisons, and that the program lets you know if a mean difference has reached significance by attaching an asterisk to the value in column 3.

When reporting the result it’s normal to reference both the ANOVA test and any post hoc analysis that has been done.

Thus, given our example, you could write something like:

A repeated-measures ANOVA determined that mean SPQ scores differed significantly across three time points (*F*(2, 58) = 5.699, *p* = .006). A post hoc pairwise comparison using the Bonferroni correction showed an increased SPQ score between the initial assessment and a follow-up assessment one year later (20.1 vs 20.9, respectively), but this was not statistically significant (*p* = .743). However, the increase in SPQ score did reach significance when comparing the initial assessment to a second follow-up assessment taken two years after the original assessment (20.1 vs 22.26, *p* = .010). Therefore, we can conclude that the results for the ANOVA indicate a significant time effect for untreated fear of spiders as measured on the SPQ scale.

***************

Okay, that’s all for this tutorial. You should now be able to run a repeated-measures ANOVA, test the assumption of sphericity, make use of a pairwise comparison, and report the result. In future tutorials, we’ll look at some of the more sophisticated options available for this test. But this tutorial should provide enough information for you to run a basic repeated-measures ANOVA test.

The post Rules for Naming Variables in SPSS appeared first on EZ SPSS Tutorials.

SPSS allows you to rename variables either via its Variable View or by using syntax. There are a number of rules governing the naming of variables.

- Names can safely be up to 32 characters long. A name must begin with a letter, and after that may include letters, numbers, periods (.), and underscores (_).
- You can’t have a space in a variable name.
- Don’t end a variable name with a period.
- Don’t end a variable name with an underscore.
- You can use periods and underscores *within* a variable name.
- You can use upper and lower case, and a mixture thereof, within a variable name.
- You can’t use SPSS reserved keywords as a variable name (i.e., you can’t use ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO or WITH).
- Each variable name must be unique.
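These rules apply whether you rename variables in the Variable View or with the RENAME VARIABLES command. A minimal syntax example (the variable names here are just placeholders):

```spss
* Legal: letters, digits, periods and underscores within the name.
RENAME VARIABLES (var1 = Task_Duration.Sec).
```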

***************

That’s it, short and sweet. If you follow those rules when naming variables, you’re not going to go wrong.

The post How to Rename a Dataset in SPSS appeared first on EZ SPSS Tutorials.

- In the SPSS Data View, click File, then Rename Dataset…
- Type the new dataset name into the dialog box, following SPSS’s naming conventions
- Click OK
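The same rename can be done with a single line of syntax, using the DATASET NAME command on the active dataset (FirstTimeTask is the name we use in the example below):

```spss
DATASET NAME FirstTimeTask WINDOW=FRONT.
```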

The normal scenario for renaming a dataset is where you’re working with two sets of data, each of which is open in its own Data View, and you want to give each dataset a memorable name to make them readily distinguishable.

SPSS uses a default convention to name datasets automatically. This is of the form, *DataSetn*, where *n* is an incremental integer value (i.e., 1, 2, 3, etc). You can see this with our example dataset above. (The exception to this is if you open a dataset using SPSS’s syntax language, in which case no name is given unless it is specified).

It’s easy to rename a dataset, though it’s not immediately obvious how it’s done.

Click on File, and then select the Rename Dataset option. This will bring up the Rename Dataset dialog box.

All you’ve got to do now is to type in the new name of the dataset. You need to follow SPSS’s naming rules, and you should try to make your name meaningful.

Once you’re done, just press the OK button.

As you can see, the dataset has been renamed FirstTimeTask, as specified in the dialog box.

***************

That’s it for this quick tutorial. You should now have all the information you need to rename a dataset in the SPSS statistics program.

The post How to Calculate the Median in SPSS appeared first on EZ SPSS Tutorials.

- Click Analyze -> Descriptive Statistics -> Frequencies
- Move the variable for which you wish to calculate the median into the right-hand column
- Click the Statistics button, select Median under Central Tendency, and then press Continue
- Click OK to perform the calculation
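In syntax form, the steps above come out as something like this (duration is the variable from the example dataset used below; the mean and standard deviation are optional extras we also select in this tutorial):

```spss
FREQUENCIES VARIABLES=duration
  /STATISTICS=MEDIAN MEAN STDDEV
  /ORDER=ANALYSIS.
```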

This is the data set with which we’re going to be working.

So we’ve got three variables here: (a) duration – which is the duration in seconds it takes to complete a certain task; (b) sex – male or female; and (c) height – in inches.

You want to find out the median of the *duration* variable. In other words, you want to know the duration in seconds that lies exactly at the midpoint of the distribution of all durations.

There are a number of different ways of calculating the median in SPSS. This is probably the easiest.

Click Analyze -> Descriptive Statistics -> Frequencies.

This will bring up the Frequencies dialog box.

You need to get the variable for which you wish to calculate the median into the Variable(s) box on the right. You can do this by dragging and dropping, or by selecting the variable on the left, and then clicking the arrow in the middle.

Once you’ve set this up, hit the Statistics button to bring up the Statistics dialog box.

Here you just want to tick the Median option under Central Tendency on the right.

We’ve also selected Mean and Standard Deviation, just because these are standard measures of central tendency and dispersion (respectively).

When you’re done, click Continue. You should now be looking at something like this.

It’s probably worth noting that we’ve also selected Display frequency tables at the bottom on the left. This isn’t necessary, but the option will provide useful additional information.

You’re now set up to calculate the median.

Just hit the OK button.

The result appears in SPSS’s output viewer.

As you can see, this is very easy to interpret.

For our example, the median value is 7.02. (The mean is 7.3541, and the standard deviation is 2.33632).

***************

That’s all for this quick tutorial. You should now know how to calculate the median in SPSS.

The post Frequency Distribution in SPSS appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Descriptive Statistics -> Frequencies
- Move the variable of interest into the right-hand column
- Click on the Charts button, select Histograms, and then press the Continue button
- Click OK to generate a frequency distribution table
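For reference, these steps are equivalent to a FREQUENCIES command along these lines (Score is the variable from our example; NORMAL overlays the normal curve on the histogram):

```spss
FREQUENCIES VARIABLES=Score
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```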

This is the data set we’ll be using.

It comes from a logic test featured on the Philosophy Experiments website that requires people to identify whether arguments are valid or invalid.

We’re interested in the Score variable, which is the number of questions people get right out of 15.

A frequency distribution table provides a snapshot view of the characteristics of a data set. It allows you to see how scores are distributed across the whole set of scores – whether, for example, they are spread evenly or skew towards a particular end of the distribution.

To make a frequency distribution table, click on Analyze -> Descriptive Statistics -> Frequencies.

This will bring up the Frequencies dialog box.

You need to get the variable for which you wish to generate the frequencies into the Variable(s) box on the right. You can do this by dragging and dropping, or by selecting the variable on the left, and then clicking the arrow in the middle.

Once you’ve set this up, hit the Charts button to bring up the Charts dialog box.

Now select Histograms as the chart type (and additionally it’s a good idea to tick the show normal curve option).

Click Continue when you’re done, which will bring you back to the Frequencies dialog box. It should look something like this.

Now you’re ready to generate the frequency distribution table and histogram. Just hit the OK button.

The output produced by SPSS is fairly easy to understand.

First we have the frequency distribution table:

The scores (in our case, the number of correct answers) are in the left column. The number of occurrences of a given score is specified in the Frequency column.

You’ve also got columns specifying percent and cumulative percent. Percent is the number of occurrences of a given score divided by the total number of scores, multiplied by 100; cumulative percent is the running total you get when you add the percent values together as you move down the rows.

The size of the sample is effectively the total number of valid scores, which you can see at the top of the table and at the bottom of the Frequency column.

A histogram provides a graphical representation of a frequency distribution.

Here’s ours.

The y-axis (on the left) represents a frequency count, and the x-axis (across the bottom), the value of the variable (in this case the number of correct answers). You’ll notice that SPSS also provides values for mean (9.7) and standard deviation (2.654). It appears that our distribution is somewhat skewed to the left.

If you want to save your histogram, you can right-click on it within the output viewer, and choose to copy it to an image file (which you can then use within other programs).

***************

We hope you have found this quick tutorial useful. You should now be able to generate a frequency distribution table in SPSS and also select the histogram option.

The post One Way ANOVA in SPSS Including Interpretation appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Compare Means -> One-Way ANOVA
- Drag and drop your independent variable into the Factor box and dependent variable into the Dependent List box
- Click on Post Hoc, select Tukey, and press Continue
- Click on Options, select Homogeneity of variance test, and press Continue
- Press the OK button, and your result will pop up in the Output viewer
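As syntax, the steps above paste as an ONEWAY command roughly like this (we’ve assumed the dependent variable is named Distance; substitute your own variable names):

```spss
ONEWAY Distance BY Education
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /PLOT MEANS
  /POSTHOC=TUKEY ALPHA(0.05).
```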

We’re starting from the assumption that you’ve already got your data into SPSS, and you’re looking at a Data View screen that looks a bit like this.

Our fictitious dataset contains a number of different variables. For the purposes of this tutorial, we’re interested in whether level of education has an effect on the ability of a person to throw a frisbee. Our independent variable, therefore, is Education, which has three levels – High School, Graduate and PostGrad – and our dependent variable is Frisbee Throwing Distance (i.e., the distance a subject throws a frisbee).

The one-way ANOVA test allows us to determine whether there is a significant difference in the mean distances thrown by each of the groups.

To start, click on Analyze -> Compare Means -> One-Way ANOVA.

This will bring up the One-Way ANOVA dialog box.

To set up the test, you’ve got to get your independent variable into the Factor box (Education in this case, see above) and dependent variable into the Dependent List box. You can do this by dragging and dropping, or by highlighting a variable, and then clicking on the appropriate arrow in the middle of the dialog.

After you’ve moved the variables over, you should click the Post Hoc button, which will allow you to specify the post hoc test(s) you wish to run.

The ANOVA test will tell you whether there is a significant difference between the means of two or more levels of a variable. However, if you’ve got more than two levels it’s not going to tell you between *which* of the various pairs of means the difference is significant. You need to do a post hoc test to find this out.

The Post Hoc dialog box looks like this.

You should select Tukey, as shown above, and ensure that your significance level is set to 0.05 (or whatever alpha level is right for your study).

Now press Continue to return to the previous dialog box.

You should be looking at this dialog box again.

Click Options to bring up the Options dialog box.

At the very least, you should select the Homogeneity of variance test option (since homogeneity of variance is required for the ANOVA test). Descriptive statistics and a Means plot are also useful.

Once you’ve made your selections, click Continue.

At this point, you’re ready to run the test.

Review your options, and click the OK button. You’ll see the result pop up in the Output Viewer.

SPSS produces a lot of data for the one-way ANOVA test. Let’s deal with the important bits in turn.

It’s worth having a quick glance at the descriptive statistics generated by SPSS.

If you look above, you’ll see that our sample data produces a difference in the mean scores of the three levels of our education variable. In particular, the data analysis shows that the subjects in the PostGrad group throw the frisbee quite a bit further than subjects in the other two groups. The key question, of course, is whether the difference in mean scores reaches significance.

A requirement for the ANOVA test is that the variances of each comparison group are equal. We have tested this using the Levene statistic. What you’re looking for here is a significance value that is greater than .05. You *don’t* want a significant result, since a significant result would suggest a real difference between variances.

In our example, as you can see above, the significance value of the Levene statistic based on a comparison of medians is .155. This is *not* a significant result, which means the requirement of homogeneity of variance has been met, and the ANOVA test can be considered to be robust.

Now that we know we have equal variances, we can look at the result of the ANOVA test.

The ANOVA result is easy to read. You’re looking for the value of F that appears in the Between Groups row (see above) and whether this reaches significance (next column along).

In our example, we have a significant result. The value of *F* is 3.5, which reaches significance with a *p*-value of .038 (which is less than the .05 alpha level). This means there is a statistically significant difference between the means of the different levels of the education variable.

However, as yet we don’t know between *which* of the various pairs of means the difference is significant. For this we need to look at the result of the post hoc Tukey HSD test.

If you take a look at the Multiple Comparisons table above you’ll see that significance values have been generated for the mean differences between pairs of the various levels of the education variable (Graduate – High School; Graduate – PostGrad; and High School – PostGrad).

In our example, the Tukey HSD (Honest Significant Difference) shows that it is only the mean difference between the High School and PostGrad groups that reaches significance (see the Sig. column, above). The *p*-value is .034, which is less than the standard .05 alpha level.

When reporting the result it’s normal to reference both the ANOVA test and the post hoc Tukey HSD test.

Thus, given our example here, you could write something like:

There was a statistically significant difference between groups as demonstrated by one-way ANOVA (*F*(2, 47) = 3.5, *p* = .038). A Tukey post hoc test showed that the PostGrad group was able to throw the frisbee statistically significantly further than the High School group (*p* = .034). There was no statistically significant difference between the Graduate and High School groups (*p* = .691) or between the Graduate and PostGrad groups (*p* = .099).

***************

Right, that’s it for this tutorial. You should now be able to perform a one-way ANOVA test in SPSS, check the homogeneity of variance assumption has been met, run a post hoc test, and interpret and report your result.

The post Interpreting Chi Square Results in SPSS appeared first on EZ SPSS Tutorials.

The tutorial starts from the assumption that you have already calculated the chi square statistic for your data set, and you want to know how to interpret the result that SPSS has generated. (We have a different tutorial explaining how to do a chi square test in SPSS).

You should be looking at a result that looks something like this in the SPSS output viewer.
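For reference, output of this kind is produced by a CROSSTABS command along the following lines (the variable names are the ones from our example, and CELLS=COUNT EXPECTED is what puts both observed and expected counts in the table):

```spss
CROSSTABS
  /TABLES=Religion BY Eating
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL.
```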

The crosstabs analysis above is for two categorical variables, Religion and Eating. Each variable has two possible values: No Religion and Christian for the Religion variable; Meat Eater and Vegetarian for the Eating variable.

The null hypothesis of our hypothetical study is that these variables are not associated with each other – they are independent variables. The chi square test allows us to test this hypothesis.

The output of a crosstabs analysis contains a number of elements. Let’s look at each in turn.

As its name suggests, the Case Processing Summary is just a summary of the cases that were processed when the crosstabs analysis ran.

In our example, as you can see above, we had 30 valid cases, and no missing cases.

This is the crosstabs table, and it provides a lot of information that is useful for interpreting a chi square test result.

Our crosstabs table includes information about observed counts (what SPSS calls “Count”) and expected counts.

The observed count is the observed frequency in a particular cell of the crosstabs table. For example, our table shows that 5 meat eaters (out of a total of 16) have no religion and 3 Christians (out of a total of 14) are vegetarian.

The expected count is the predicted frequency for a cell under the assumption that the null hypothesis is true. In our case, the null hypothesis is that there is no association between the Eating variable and the Religion variable, which means the expected count is the predicted frequency for a cell on the assumption that eating and religion are not dependent on each other.

If you want to understand the result of a chi square test, you’ve got to pay close attention to the observed and expected counts. Put simply, the more these values diverge from each other, the higher the chi square score, the more likely it is to be significant, and the more likely it is we’ll reject the null hypothesis and conclude the variables are associated with each other.

If you look at the crosstabs table above, you’ll see that there are more Christian meat eaters than would be expected were the null hypothesis (that the variables are independent) true; and fewer Christian vegetarians. And similarly, there are more atheist vegetarians than would be expected, and fewer atheist meat eaters.

The question is whether these differences are big enough to allow us to conclude that the Eating variable and Religion variable are associated with each other. This is where the chi square statistic comes into play.

As you can see below, SPSS calculates a number of different measures of association.

We’re interested in the Pearson Chi-Square measure.

The chi square statistic appears in the Value column immediately to the right of “Pearson Chi-Square”. In this example, the value of the chi square statistic is 6.718.

The *p*-value (.010) appears in the same row in the “Asymptotic Significance (2-sided)” column. The result is significant if this value is equal to or less than the designated alpha level (normally .05). In this case, the *p*-value is smaller than the standard alpha value, so we’d reject the null hypothesis that asserts the two variables are independent of each other. To put it simply, the result is *significant* – the data suggests that the variables Religion and Eating are associated with each other.

The chi square statistic only tells you whether variables are associated. If you want to find out how they are associated then you need to return to the crosstabs table. In our example, the crosstabs table tells us that atheism is disproportionately associated with vegetarianism and meat eating is disproportionately associated with Christianity.

***************

That’s all for this tutorial. You should now have a good idea of how to interpret chi square results in SPSS.


The post Pearson Correlation Coefficient and Interpretation in SPSS appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Correlate -> Bivariate
- Move the two variables you want to test over to the Variables box on the right
- Make sure Pearson is checked under Correlation Coefficients
- Press OK
- The result will appear in the SPSS output viewer
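In syntax form, these steps come out as something like the following (Score and Time are the variables from our example; NOSIG is, a little counterintuitively, the subcommand SPSS pastes when Flag significant correlations is ticked):

```spss
CORRELATIONS
  /VARIABLES=Score Time
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```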

For the purposes of this tutorial, we’re using a data set that comes from the Philosophy Experiments website.

The Valid or Invalid? exercise is a logic test that requires people to determine whether deductive arguments are valid or invalid. This is the complete data set.

We’re interested in two variables, Score and Time.

Score is the number of questions that people get right. Time is the amount of time in seconds it takes them to complete the test. We want to find out if these two things are correlated. Put simply, do people get more questions right if they take longer answering each question?

Pearson’s correlation coefficient will help us to answer this question.

To start, click on Analyze -> Correlate -> Bivariate.

This will bring up the Bivariate Correlations dialog box.

There are two things you’ve got to get done here. The first is to move the two variables of interest (i.e., the two variables you want to see whether they are correlated) into the Variables box on the right. You can do this by dragging and dropping (or using the arrow button in the middle).

The other thing is to ensure that “Pearson” is selected under Correlation Coefficients.

You can also select “Flag significant correlations”, though this is just optional.

That’s it. You’re set. Now just click OK.

The first thing you might notice about the result is that it is a 2×2 matrix. This means, in effect, you get two results for the price of one, because you get the correlation coefficient of Score and Time Elapsed, and the correlation coefficient of Time Elapsed and Score (which is the same result, obviously).

We’re interested in two parts of the result.

The first is the value of Pearson’s *r* – i.e., the correlation coefficient. That’s the Pearson Correlation figure (inside the square red box, above), which in this case is .094.

Pearson’s *r* varies between +1 and -1, where +1 is a perfect positive correlation, and -1 is a perfect negative correlation. 0 means there is no linear correlation at all.

Our figure of .094 indicates a very weak positive correlation. The more time that people spend doing the test, the better they’re likely to do, but the effect is very small.

We’re also interested in the 2-tailed significance value, which in this case is reported as .000 – that is, *p* < .001 (inside the red oval, above). The standard alpha value is .05, which means that our correlation is highly significant, and very unlikely to be a mere product of random sampling error.

This seems counterintuitive. How can a very weak correlation be highly significant? How is it possible to be so confident that such a weak correlation is real?

The answer has to do with our sample size (see the figure for N, above). We have 16033 cases in our data set. This means that our study has enough statistical power to identify even very weak effects.

***************

Right, we’ve come to the end of this tutorial. You should now be able to calculate Pearson’s correlation coefficient within SPSS, and to interpret the result that you get.

The post How to Recode String Variables in SPSS appeared first on EZ SPSS Tutorials.

Recoding a string variable into numeric form is often done using the automatic recode functionality of SPSS, but in this case we’re going to do it manually because of the extra control we get.

- Click on Transform -> Recode into Different Variables
- Drag and drop the variable you wish to recode over into the Input Variable -> Output Variable box
- Create a new name for your output variable in the Output Variable (Name) text box, and click the Change button
- Click the Old and New Values… button
- Type the first value of your input variable into the Old Value (Value) text box, and the value you want to replace it with into the New Value (Value) text box. Then click Add to confirm the recoding
- Repeat this process for all the existing values of your input variable
- Press Continue, and then OK to do the recoding
- The new recoded output variable will appear in the Data View

We’re assuming that you’ve fired up SPSS, opened a data file, or entered new data, and you’re looking at the Data View window.

The issue we have with our data is that the Education variable has been coded as a string whereas it should be numeric. SPSS provides a number of options to help us to recode the variable. We’re going to look at the Recode into Different Variables method.

As its name suggests, if you choose this option, SPSS will use an input variable to create a new recoded variable.

To begin this process, click on Transform -> Recode into Different Variables, which will bring up its associated dialog box.

You need to drag and drop the variable you want to recode over into the Input Variable -> Output Variable box (it reads String Variable -> Output Variable, above, because SPSS has identified the Education variable as a string).

The next step is to give your new recoded output variable a name, and then to hit the Change button. As you can see, we’ve called our new recoded variable EdNumeric.

Once you’ve got this set up, click the Old and New Values… button so you can specify how you want to recode the variable.

The Old and New Values dialog box allows you to specify new values for your existing input variable.

This is easy to accomplish. The old value goes into the Old Value (Value) text box on the left, and the new value you want to replace it with into the New Value (Value) text box on the right. Then click Add to confirm the recoding.

Repeat for all the existing values of your input variable.

As you can see above, we’ve got “School” recoded as 1, and we’re about to add “Graduate” recoded as 2.

Once you’ve got the recoding set up, press Continue.

That’s it, really. Press OK to recode your variable.

If you take a look at the Data View, you’ll see you’ve got a new variable which contains the recoded values.

Our new EdNumeric variable is a numeric, nominal variable, where 1 = School, 2 = Graduate and 3 = Postgrad.
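For comparison, the same mapping can be expressed in a couple of lines of Python with pandas – this is only a sketch with an invented DataFrame, using the same codes as the tutorial (School = 1, Graduate = 2, Postgrad = 3):

```python
import pandas as pd

# Hypothetical Education column, coded as strings
df = pd.DataFrame({"Education": ["School", "Postgrad", "Graduate", "School"]})

# Recode into a new numeric variable, preserving the original column
df["EdNumeric"] = df["Education"].map({"School": 1, "Graduate": 2, "Postgrad": 3})
print(df)
```

As in SPSS’s Recode into Different Variables, the original string column is left intact and the recoded values land in a new column.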

You could just leave it at that, but probably you’d want to set up Value Labels. This is the topic of a separate tutorial, so we won’t explain how to do that here, but the advantage of doing so is that you’ll end up with meaningful labels in your output, and you don’t have to remember how the coding works.

Once you’ve set up Value Labels for the new EdNumeric variable, there will be no difference between its appearance and that of the old Education variable. The only thing that has changed is the underlying coding, which is now numeric.

***************

That’s it for this quick tutorial. You should now be able to recode string values into a different variable in SPSS. In future tutorials, we’ll look at some of the other options for recoding values in SPSS.


The post How to Select Cases in SPSS appeared first on EZ SPSS Tutorials.

The data we’re using for this tutorial comes from a hypothetical study that examines how long it takes people to fall asleep during a statistics lesson.

The two variables we’re interested in here are Sex, either male or female, and Duration, which is the number of minutes that elapses from the start of a statistics lesson before a subject falls asleep.

Imagine we already know that in the population as a whole the average amount of time it takes for a *woman* to fall asleep is 8.15 minutes. We want to compare this to the average time for women in our sample. But the trouble is our sample contains data for both males and females, and any tests we run will be on that basis. The question is how do we select only female cases, thereby excluding males from any tests that we run?

This is where the select cases functionality comes in useful.

To begin, click Data -> Select Cases.

This will bring up the Select Cases dialog box. This provides a number of different options for selecting cases. We’re going to focus on the “If condition is satisfied” option, which you should select.

Once you’ve selected it, you need to click on the If… button (as above).

The Select Cases: If dialog box will appear. This is where you do the work of selecting female only cases.

The idea here is to construct an expression in the text box at the top that functions to select cases. You can see here we’ve got “Sex = 0”, which tells SPSS that it should only select cases where the value of the variable Sex is 0 (Female = 0, Male = 1).

Obviously, it is possible to build much more complex expressions than this simple test of equivalence. For example, you could tell SPSS to select cases where Sex is Female and Height is greater than 68 inches (“Sex = 0 & Height > 68”), or where Duration is greater than 8 minutes or Height is less than 60 inches (“Duration > 8 | Height < 60”).
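The same selection logic carries over to other tools. As a rough sketch, here are the tutorial’s three example conditions expressed as boolean filters in Python with pandas – the data frame below is invented for illustration, with Sex coded 0 = female and 1 = male as in the tutorial:

```python
import pandas as pd

# Hypothetical data with the variables mentioned in the text
df = pd.DataFrame({
    "Sex": [0, 1, 0, 1],
    "Height": [70, 66, 64, 72],
    "Duration": [9.0, 7.5, 8.4, 6.0],
})

# "Sex = 0" – female cases only
females = df[df["Sex"] == 0]

# "Sex = 0 & Height > 68" – females taller than 68 inches
tall_women = df[(df["Sex"] == 0) & (df["Height"] > 68)]

# "Duration > 8 | Height < 60" – either condition suffices
sleepy_or_short = df[(df["Duration"] > 8) | (df["Height"] < 60)]
```

The `&` and `|` operators play the same role here as in SPSS’s expression builder.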

Once you’ve set up the expression, as above, hit the Continue button, and then click OK in the Select Cases dialog box. SPSS will now select cases as per your instruction(s).

If you take a look at the Data View, you’ll see that things have changed to indicate that SPSS is now operating with a subset of the original data set.

As you can see, SPSS has struck out cases on the left that are not selected. It has also introduced a new filter variable that specifies whether a case has been selected or not. Finally, bottom right, it says Filter On, which tells you that any tests or analyses you run will be on a subset of the data – that is, on only the selected cases.

Let’s check this out by running a one sample t test to compare the average amount of time it takes for women in the general population to fall asleep in a statistics lesson with the average for the women in our sample.

Click on Analyze -> Compare Means -> One-Sample T Test, and then set up the test like this.

You can see we’ve got Duration as our test variable, and we’re comparing it against a population mean of 8.15 minutes (the average amount of time it takes women in the general population to fall asleep in a statistics lesson).

Hit OK to run the test.

This is the result.

The value for N here is 50, which tells you immediately that select cases has worked. Our dataset has 100 cases within it, of which 50 are women.

In terms of the result, we can see that the women in our sample fall asleep on average 1 minute faster than women in the general population. This is a significant difference, with a t value of -3.1 and a *p*-value of .003.
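As a cross-check outside SPSS, a one-sample t test of the same shape can be run in Python with SciPy. The 50 values below are invented so that their mean sits exactly one minute below the comparison value; only the population mean of 8.15 comes from the tutorial:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Deterministic toy sample of 50 durations with mean 7.15 minutes
# (hypothetical data for illustration only)
sample = np.concatenate([np.full(25, 6.15), np.full(25, 8.15)])

# Compare the sample mean against the population value of 8.15 minutes
t_stat, p_value = ttest_1samp(sample, popmean=8.15)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A negative t statistic indicates the sample mean falls below the comparison value, just as in the tutorial’s output.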

There are a couple of things to note before we finish.

The first is that you can return a data set to its non-filtered state by returning to the Select Cases dialog box (Data -> Select Cases), and choosing All cases (the first option available). This won’t delete the new filter variable, but it will render it inactive. You’ll also notice that “Filter On” will no longer show at the bottom right of the Data View.

The other thing to note is that SPSS offers an alternative to Select Cases that works better in many situations. This is Split File, and it will be the topic of a future tutorial.

***************

That’s all for this tutorial. You should now be able to select cases in SPSS, and to work with the resultant filtered data.

