The post How to Calculate the Median in SPSS appeared first on EZ SPSS Tutorials.

- Click Analyze -> Descriptive Statistics -> Frequencies
- Move the variable for which you wish to calculate the median into the right-hand column
- Click the Statistics button, select Median under Central Tendency, and then press Continue
- Click OK to perform the calculation

This is the data set with which we’re going to be working.

So we’ve got three variables here: (a) duration – which is the duration in seconds it takes to complete a certain task; (b) sex – male or female; and (c) height – in inches.

You want to find out the median of the *duration* variable. In other words, you want to know the duration in seconds that lies exactly at the midpoint of the distribution of all durations.

There are a number of different ways of calculating the median in SPSS. This is probably the easiest.

Click Analyze -> Descriptive Statistics -> Frequencies.

This will bring up the Frequencies dialog box.

You need to get the variable for which you wish to calculate the median into the Variable(s) box on the right. You can do this by dragging and dropping, or by selecting the variable on the left, and then clicking the arrow in the middle.

Once you’ve set this up, hit the Statistics button to bring up the Statistics dialog box.

Here you just want to tick the Median option under Central Tendency on the right.

We’ve also selected Mean and Standard Deviation, just because these are standard measures of central tendency and dispersion (respectively).

When you’re done, click Continue. You should now be looking at something like this.

It’s probably worth noting that we’ve also selected Display frequency tables at the bottom on the left. This isn’t necessary, but the option will provide useful additional information.

You’re now set up to calculate the median.

Just hit the OK button.
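If you prefer syntax, the whole procedure above can be run from a Syntax window (File -> New -> Syntax). This is a sketch assuming your variable is named duration, as in our data set:

```spss
* Median for the duration variable, plus mean and standard deviation.
FREQUENCIES VARIABLES=duration
  /STATISTICS=MEDIAN MEAN STDDEV
  /ORDER=ANALYSIS.
```

Select the command and click Run to produce the same output as the dialog boxes. Adding /FORMAT=NOTABLE would suppress the frequency table if you don't need it.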

The result appears in SPSS’s output viewer.

As you can see, this is very easy to interpret.

For our example, the median value is 7.02. (The mean is 7.3541, and the standard deviation is 2.33632).

***************

That’s all for this quick tutorial. You should now know how to calculate the median in SPSS.


The post Frequency Distribution in SPSS appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Descriptive Statistics -> Frequencies
- Move the variable of interest into the right-hand column
- Click on the Charts button, select Histograms, and then press the Continue button
- Click OK to generate a frequency distribution table

This is the data set we’ll be using.

It comes from a logic test featured on the Philosophy Experiments website that requires people to identify whether arguments are valid or invalid.

We’re interested in the Score variable, which is the number of questions people get right out of 15.

A frequency distribution table provides a snapshot view of the characteristics of a data set. It allows you to see how scores are distributed across the whole set of scores – whether, for example, they are spread evenly or skew towards a particular end of the distribution.

To make a frequency distribution table, click on Analyze -> Descriptive Statistics -> Frequencies.

This will bring up the Frequencies dialog box.

You need to get the variable for which you wish to generate the frequencies into the Variable(s) box on the right. You can do this by dragging and dropping, or by selecting the variable on the left, and then clicking the arrow in the middle.

Once you’ve set this up, hit the Charts button to bring up the Charts dialog box.

Now select Histograms as the chart type (and additionally it’s a good idea to tick the show normal curve option).

Click Continue when you’re done, which will bring you back to the Frequencies dialog box. This should look something like this.

Now you’re ready to generate the frequency distribution table and histogram. Just hit the OK button.
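For reference, the same frequency table and histogram can be generated with syntax. A minimal sketch, assuming the variable is named Score:

```spss
* Frequency distribution table for Score, plus a histogram
* with a superimposed normal curve.
FREQUENCIES VARIABLES=Score
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```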

The output produced by SPSS is fairly easy to understand.

First we have the frequency distribution table:

The scores (in our case, the number of correct answers) are in the left column. The number of occurrences of a given score is specified in the Frequency column.

You’ve also got columns specifying percent and cumulative percent. Percent is the number of occurrences of a given score divided by the total number of scores, multiplied by 100; cumulative percent is the running total you get as you add the percent values together moving down the rows.

The size of the sample is effectively the total number of valid scores, which you can see at the top of the table and at the bottom of the Frequency column.

A histogram provides a graphical representation of a frequency distribution.

Here’s ours.

The y-axis (on the left) represents a frequency count, and the x-axis (across the bottom), the value of the variable (in this case the number of correct answers). You’ll notice that SPSS also provides values for mean (9.7) and standard deviation (2.654). It appears that our distribution is somewhat skewed to the left.

If you want to save your histogram, you can right-click on it within the output viewer, and choose to copy it to an image file (which you can then use within other programs).

***************

We hope you have found this quick tutorial useful. You should now be able to generate a frequency distribution table in SPSS and also select the histogram option.


The post One Way ANOVA in SPSS Including Interpretation appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Compare Means -> One-Way ANOVA
- Drag and drop your independent variable into the Factor box and dependent variable into the Dependent List box
- Click on Post Hoc, select Tukey, and press Continue
- Click on Options, select Homogeneity of variance test, and press Continue
- Press the OK button, and your result will pop up in the Output viewer

We’re starting from the assumption that you’ve already got your data into SPSS, and you’re looking at a Data View screen that looks a bit like this.

Our fictitious dataset contains a number of different variables. For the purposes of this tutorial, we’re interested in whether level of education has an effect on the ability of a person to throw a frisbee. Our independent variable, therefore, is Education, which has three levels – High School, Graduate and PostGrad – and our dependent variable is Frisbee Throwing Distance (i.e., the distance a subject throws a frisbee).

The one-way ANOVA test allows us to determine whether there is a significant difference in the mean distances thrown by each of the groups.

To start, click on Analyze -> Compare Means -> One-Way ANOVA.

This will bring up the One-Way ANOVA dialog box.

To set up the test, you’ve got to get your independent variable into the Factor box (Education in this case, see above) and dependent variable into the Dependent List box. You can do this by dragging and dropping, or by highlighting a variable, and then clicking on the appropriate arrow in the middle of the dialog.

After you’ve moved the variables over, you should click the Post Hoc button, which will allow you to specify the post hoc test(s) you wish to run.

The ANOVA test will tell you whether there is a significant difference between the means of two or more levels of a variable. However, if you’ve got more than two levels it’s not going to tell you between *which* of the various pairs of means the difference is significant. You need to do a post hoc test to find this out.

The Post Hoc dialog box looks like this.

You should select Tukey, as shown above, and ensure that your significance level is set to 0.05 (or whatever alpha level is right for your study).

Now press Continue to return to the previous dialog box.

You should be looking at this dialog box again.

Click Options to bring up the Options dialog box.

At the very least, you should select the Homogeneity of variance test option (since homogeneity of variance is required for the ANOVA test). Descriptive statistics and a Means plot are also useful.

Once you’ve made your selections, click Continue.

At this point, you’re ready to run the test.

Review your options, and click the OK button. You’ll see the result pop up in the Output Viewer.
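The equivalent syntax for everything we've set up is below. Note that FrisbeeDistance is an assumed variable name here – substitute whatever your dependent variable is actually called in your data set:

```spss
* One-way ANOVA of frisbee throwing distance by education level,
* with descriptives, Levene's homogeneity of variance test,
* a means plot, and a Tukey HSD post hoc test at alpha = .05.
ONEWAY FrisbeeDistance BY Education
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /PLOT MEANS
  /POSTHOC=TUKEY ALPHA(0.05).
```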

SPSS produces a lot of data for the one-way ANOVA test. Let’s deal with the important bits in turn.

It’s worth having a quick glance at the descriptive statistics generated by SPSS.

If you look above, you’ll see that our sample data produces a difference in the mean scores of the three levels of our education variable. In particular, the data analysis shows that the subjects in the PostGrad group throw the frisbee quite a bit further than subjects in the other two groups. The key question, of course, is whether the difference in mean scores reaches significance.

A requirement for the ANOVA test is that the variances of each comparison group are equal. We have tested this using the Levene statistic. What you’re looking for here is a significance value that is greater than .05. You *don’t* want a significant result, since a significant result would suggest a real difference between variances.

In our example, as you can see above, the significance value of the Levene statistic based on a comparison of medians is .155. This is *not* a significant result, which means the requirement of homogeneity of variance has been met, and the ANOVA test can be considered to be robust.

Now that we know we have equal variances, we can look at the result of the ANOVA test.

The ANOVA result is easy to read. You’re looking for the value of F that appears in the Between Groups row (see above) and whether this reaches significance (next column along).

In our example, we have a significant result. The value of F is 3.5, which reaches significance with a *p-*value of .038 (which is less than the .05 alpha level). This means there is a statistically significant difference between the means of the different levels of the education variable.

However, as yet we don’t know between *which* of the various pairs of means the difference is significant. For this we need to look at the result of the post hoc Tukey HSD test.

If you take a look at the Multiple Comparisons table above you’ll see that significance values have been generated for the mean differences between pairs of the various levels of the education variable (Graduate – High School; Graduate – PostGrad; and High School – PostGrad).

In our example, the Tukey HSD (Honest Significant Difference) shows that it is only the mean difference between the High School and PostGrad groups that reaches significance (see the Sig. column, above). The *p*-value is .034, which is less than the standard .05 alpha level.

When reporting the result it’s normal to reference both the ANOVA test and the post hoc Tukey HSD test.

Thus, given our example here, you could write something like:

There was a statistically significant difference between groups as demonstrated by one-way ANOVA (F(2,47) = 3.5, *p* = .038). A Tukey post hoc test showed that the PostGrad group was able to throw the frisbee statistically significantly further than the High School group (*p* = .034). There was no statistically significant difference between the Graduate and High School groups (*p* = .691) or between the Graduate and PostGrad groups (*p* = .099).

***************

Right, that’s it for this tutorial. You should now be able to perform a one-way ANOVA test in SPSS, check the homogeneity of variance assumption has been met, run a post hoc test, and interpret and report your result.


The post Interpreting Chi Square Results in SPSS appeared first on EZ SPSS Tutorials.

The tutorial starts from the assumption that you have already calculated the chi square statistic for your data set, and you want to know how to interpret the result that SPSS has generated. (We have a different tutorial explaining how to do a chi square test in SPSS).

You should be looking at a result that looks something like this in the SPSS output viewer.

The crosstabs analysis above is for two categorical variables, Religion and Eating. Each variable has two possible values: No Religion and Christian for the Religion variable; Meat Eater and Vegetarian for the Eating variable.

The null hypothesis of our hypothetical study is that these variables are not associated with each other – they are independent variables. The chi square test allows us to test this hypothesis.

The output of a crosstabs analysis contains a number of elements. Let’s look at each in turn.

As its name suggests, the Case Processing Summary is just a summary of the cases that were processed when the crosstabs analysis ran.

In our example, as you can see above, we had 30 valid cases, and no missing cases.

This is the crosstabs table, and it provides a lot of information that is useful for interpreting a chi square test result.

Our crosstabs table includes information about observed counts (what SPSS calls “Count”) and expected counts.
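As an aside, a crosstabs table like this (with expected counts included) is produced by syntax along these lines – it's the CELLS subcommand that requests both observed and expected counts:

```spss
* Crosstabs of Religion by Eating with a chi square test,
* displaying both observed and expected counts in each cell.
CROSSTABS
  /TABLES=Religion BY Eating
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.
```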

The observed count is the observed frequency in a particular cell of the crosstabs table. For example, our table shows that 5 meat eaters (out of a total of 16) have no religion and 3 Christians (out of a total of 14) are vegetarian.

The expected count is the predicted frequency for a cell under the assumption that the null hypothesis is true. In our case, the null hypothesis is that there is no association between the Eating variable and the Religion variable, which means the expected count is the predicted frequency for a cell on the assumption that eating and religion are not dependent on each other.

If you want to understand the result of a chi square test, you’ve got to pay close attention to the observed and expected counts. Put simply, the more these values diverge from each other, the higher the chi square score, the more likely it is to be significant, and the more likely it is we’ll reject the null hypothesis and conclude the variables are associated with each other.
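To make this concrete, here is where the numbers come from in our example. An expected count is the row total multiplied by the column total, divided by the grand total; the chi square statistic then sums the squared observed-expected differences (each scaled by its expected count) over all four cells. (The cell counts here are reconstructed from the totals reported above: 5 no-religion meat eaters, 11 no-religion vegetarians, 11 Christian meat eaters, and 3 Christian vegetarians.)

```latex
% Expected count for the Christian vegetarian cell:
E = \frac{\text{row total} \times \text{column total}}{N}
  = \frac{14 \times 14}{30} \approx 6.53

% Chi square statistic summed over all four cells:
\chi^2 = \sum \frac{(O - E)^2}{E}
       = \frac{(5 - 8.53)^2}{8.53} + \frac{(11 - 7.47)^2}{7.47}
       + \frac{(11 - 7.47)^2}{7.47} + \frac{(3 - 6.53)^2}{6.53}
       \approx 6.72
```

This agrees (to rounding) with the value of 6.718 that SPSS reports for the Pearson Chi-Square.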

If you look at the crosstabs table above, you’ll see that there are more Christian meat eaters than would be expected were the null hypothesis (that the variables are independent) true; and fewer Christian vegetarians. And similarly, there are more atheist vegetarians than would be expected, and fewer atheist meat eaters.

The question is whether these differences are big enough to allow us to conclude that the Eating variable and Religion variable are associated with each other. This is where the chi square statistic comes into play.

As you can see below, SPSS calculates a number of different measures of association.

We’re interested in the Pearson Chi-Square measure.

The chi square statistic appears in the Value column immediately to the right of “Pearson Chi-Square”. In this example, the value of the chi square statistic is 6.718.

The *p*-value (.010) appears in the same row in the “Asymptotic Significance (2-sided)” column. The result is significant if this value is equal to or less than the designated alpha level (normally .05). In this case, the *p*-value is smaller than the standard alpha value, so we’d reject the null hypothesis that asserts the two variables are independent of each other. To put it simply, the result is *significant* – the data suggests that the variables Religion and Eating are associated with each other.

The chi square statistic only tells you whether variables are associated. If you want to find out how they are associated then you need to return to the crosstabs table. In our example, the crosstabs table tells us that atheism is disproportionately associated with vegetarianism and meat eating is disproportionately associated with Christianity.

***************

That’s all for this tutorial. You should now have a good idea of how to interpret chi square results in SPSS.

***************

The second half of our SPSS chi square video includes a discussion of how to interpret chi square results in SPSS.


The post Pearson Correlation Coefficient and Interpretation in SPSS appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Correlate -> Bivariate
- Move the two variables you want to test over to the Variables box on the right
- Make sure Pearson is checked under Correlation Coefficients
- Press OK
- The result will appear in the SPSS output viewer

For the purposes of this tutorial, we’re using a data set that comes from the Philosophy Experiments website.

The Valid or Invalid? exercise is a logic test that requires people to determine whether deductive arguments are valid or invalid. This is the complete data set.

We’re interested in two variables, Score and Time.

Score is the number of questions that people get right. Time is the amount of time in seconds it takes them to complete the test. We want to find out if these two things are correlated. Put simply, do people get more questions right if they take longer answering each question?

Pearson’s correlation coefficient will help us to answer this question.

To start, click on Analyze -> Correlate -> Bivariate.

This will bring up the Bivariate Correlations dialog box.

There are two things you’ve got to get done here. The first is to move the two variables of interest (i.e., the two variables you want to see whether they are correlated) into the Variables box on the right. You can do this by dragging and dropping (or using the arrow button in the middle).

The other thing is to ensure that “Pearson” is selected under Correlation Coefficients.

You can also select “Flag significant correlations”, though this is just optional.

That’s it. You’re set. Now just click OK.
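The same analysis as syntax – a sketch using our two variable names, with the default dialog settings (the NOSIG keyword, somewhat confusingly, corresponds to the Flag significant correlations option):

```spss
* Pearson correlation between Score and Time, two-tailed
* significance, flagging significant correlations.
CORRELATIONS
  /VARIABLES=Score Time
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```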

The first thing you might notice about the result is that it is a 2×2 matrix. This means, in effect, you get two results for the price of one, because you get the correlation coefficient of Score and Time Elapsed, and the correlation coefficient of Time Elapsed and Score (which is the same result, obviously).

We’re interested in two parts of the result.

The first is the value of Pearson’s *r* – i.e., the correlation coefficient. That’s the Pearson Correlation figure (inside the square red box, above), which in this case is .094.

Pearson’s *r* varies between +1 and -1, where +1 is a perfect positive correlation, and -1 is a perfect negative correlation. 0 means there is no linear correlation at all.

Our figure of .094 indicates a very weak positive correlation. The more time that people spend doing the test, the better they’re likely to do, but the effect is very small.

We’re also interested in the 2-tailed significance value – which SPSS reports here as .000, meaning *p* < .001 (inside the red oval, above). This is well below the standard alpha value of .05, which means our correlation is highly significant – it is very unlikely to be just a function of random sampling error, etc.

This seems counterintuitive. How can a very weak correlation be highly significant? How is it possible to be so confident that such a weak correlation is real?

The answer has to do with our sample size (see the figure for N, above). We have 16033 cases in our data set. This means that our study has enough statistical power to identify even very weak effects.

***************

Right, we’ve come to the end of this tutorial. You should now be able to calculate Pearson’s correlation coefficient within SPSS, and to interpret the result that you get.


The post How to Recode String Variables in SPSS appeared first on EZ SPSS Tutorials.

Recoding a string variable as numeric is often done using the automatic recode functionality of SPSS, but in this case we’re going to do it manually because of the extra control we get.

- Click on Transform -> Recode into Different Variables
- Drag and drop the variable you wish to recode over into the Input Variable -> Output Variable box
- Create a new name for your output variable in the Output Variable (Name) text box, and click the Change button
- Click the Old and New Values… button
- Type the first value of your input variable into the Old Value (Value) text box, and the value you want to replace it with into the New Value (Value) text box. Then click Add to confirm the recoding
- Repeat this process for all the existing values of your input variable
- Press Continue, and then OK to do the recoding
- The new recoded output variable will appear in the Data View

We’re assuming that you’ve fired up SPSS, opened a data file, or entered new data, and you’re looking at the Data View window.

The issue we have with our data is that the Education variable has been coded as a string whereas it should be numeric. SPSS provides a number of options to help us to recode the variable. We’re going to look at the Recode into Different Variables method.

As its name suggests, if you choose this option, SPSS will use an input variable to create a new recoded variable.

To begin this process, click on Transform -> Recode into Different Variables, which will bring up its associated dialog box.

You need to drag and drop the variable you want to recode over into the Input Variable -> Output Variable box (it reads String Variable -> Output Variable, above, because SPSS has identified the Education variable as a string).

The next step is to give your new recoded output variable a name, and then to hit the Change button. As you can see, we’ve called our new recoded variable EdNumeric.

Once you’ve got this set up, click the Old and New Values… button so you can specify how you want to recode the variable.

The Old and New Values dialog box allows you to specify new values for your existing input variable.

This is easy to accomplish. The old value goes into the Old Value (Value) text box on the left, and the new value you want to replace it with into the New Value (Value) text box on the right. Then click Add to confirm the recoding.

Repeat for all the existing values of your input variable.

As you can see above, we’ve got “School” recoded as 1, and we’re about to add “Graduate” recoded as 2.

Once you’ve got the recoding set up, press Continue.

That’s it, really. Press OK to recode your variable.
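The same recoding can be done with a single RECODE command. A sketch, assuming the string values in your Education variable are exactly 'School', 'Graduate' and 'PostGrad' (string matching in RECODE is case sensitive, so they must match your data precisely):

```spss
* Recode the string variable Education into a new numeric
* variable EdNumeric (1 = School, 2 = Graduate, 3 = PostGrad).
RECODE Education ('School'=1) ('Graduate'=2) ('PostGrad'=3) INTO EdNumeric.
EXECUTE.
```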

If you take a look at the Data View, you’ll see you’ve got a new variable which contains the recoded values.

Our new EdNumeric variable is a numeric, nominal variable, where 1 = School, 2 = Graduate and 3 = Postgrad.

You could just leave it at that, but probably you’d want to set up Value Labels. This is the topic of a separate tutorial, so we won’t explain how to do that here, but the advantage of doing so is that you’ll end up with meaningful labels in your output, and you don’t have to remember how the coding works.

Once you’ve set up Value Labels for the new EdNumeric variable, there will be no difference between its appearance and that of the old Education variable. The only thing that has changed is the underlying coding, which is now numeric.

***************

That’s it for this quick tutorial. You should now be able to recode string values into a different variable in SPSS. In future tutorials, we’ll look at some of the other options for recoding values in SPSS.


The post How to Select Cases in SPSS appeared first on EZ SPSS Tutorials.

The data we’re using for this tutorial comes from a hypothetical study that examines how long it takes people to fall asleep during a statistics lesson.

The two variables we’re interested in here are Sex, either male or female, and Duration, which is the number of minutes that elapses from the start of a statistics lesson before a subject falls asleep.

Imagine we already know that in the population as a whole the average amount of time it takes for a *woman* to fall asleep is 8.15 minutes. We want to compare this to the average time for women in our sample. But the trouble is our sample contains data for both males and females, and any tests we run will be on that basis. The question is how do we select only female cases, thereby excluding males from any tests that we run?

This is where the select cases functionality comes in useful.

To begin, click Data -> Select Cases.

This will bring up the Select Cases dialog box. It provides a number of different options for selecting cases. We’re going to focus on the “If condition is satisfied” option, which you should select.

Once you’ve selected it, you need to click on the If… button (as above).

The Select Cases: If dialog box will appear. This is where you do the work of selecting female only cases.

The idea here is to construct an expression in the text box at the top that functions to select cases. You can see here we’ve got “Sex = 0”, which tells SPSS that it should only select cases where the value of the variable Sex is 0 (Female = 0, Male = 1).

Obviously, it is possible to build much more complex expressions than this simple test of equivalence. For example, you could tell SPSS to select cases where Sex is Female and Height is greater than 68 inches (“Sex = 0 & Height > 68”), or where Duration is greater than 8 minutes or Height is less than 60 inches (“Duration > 8 | Height < 60”).

Once you’ve set up the expression, as above, hit the Continue button, and then click OK in the Select Cases dialog box. SPSS will now select cases as per your instruction(s).
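Behind the scenes, Select Cases works by creating and applying a filter variable. The syntax SPSS generates for our selection looks roughly like this:

```spss
* Select only cases where Sex = 0 (i.e., female).
USE ALL.
COMPUTE filter_$=(Sex = 0).
VARIABLE LABELS filter_$ 'Sex = 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FILTER BY filter_$.
EXECUTE.
```

Running FILTER OFF. followed by USE ALL. turns the filter back off again.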

If you take a look at the Data View, you’ll see that things have changed to indicate that SPSS is now operating with a subset of the original data set.

As you can see, SPSS has struck out cases on the left that are not selected. It has also introduced a new filter variable that specifies whether a case has been selected or not. Finally, bottom right, it says Filter On, which tells you that any tests or analyses you run will be on a subset of the data – that is, on only the selected cases.

Let’s check this out by running a one sample t test to compare the average amount of time it takes for women in the general population to fall asleep in a statistics lesson with the average for the women in our sample.

Click on Analyze -> Compare Means -> One-Sample T Test, and then set up the test like this.

You can see we’ve got Duration as our test variable, and we’re comparing it against a population mean of 8.15 minutes (the average amount of time it takes women in the general population to fall asleep in a statistics lesson).

Hit OK to run the test.

This is the result.

The value for N here is 50, which tells you immediately that select cases has worked. Our dataset has 100 cases within it, of which 50 are women.

In terms of the result, we can see that the women in our sample fall asleep on average 1 minute faster than women in the general population. This is a significant difference, with a t value of -3.1 and a *p*-value of .003.

There are a couple of things to note before we finish.

The first is that you can return a data set to its non-filtered state by returning to the Select Cases dialog box (Data -> Select Cases), and choosing All cases (the first option available). This won’t delete the new filter variable, but it will render it inactive. You’ll also notice that “Filter On” will no longer show at the bottom right of the Data View.

The other thing to note is that SPSS offers an alternative to Select Cases that works better in many situations. This is Split File, and it will be the topic of a future tutorial.

***************

That’s all for this tutorial. You should now be able to select cases in SPSS, and to work with the resultant filtered data.


The post How to Do a One Sample T Test and Interpret the Result in SPSS appeared first on EZ SPSS Tutorials.

- Click on Analyze -> Compare Means -> One-Sample T Test
- Drag and drop the variable you want to test against the population mean into the Test Variable(s) box
- Specify your population mean in the Test Value box
- Click OK
- Your result will appear in the SPSS output viewer

Our working assumption, as per usual, is that you’ve opened SPSS, and that you’re looking at the Data View within which you’ve got some data.

Our data is from a hypothetical study that examines how long it takes people to fall asleep during a statistics lesson.

For the purpose of this tutorial, we’re only interested in the Duration variable, which is the number of minutes that elapses from the start of the lesson before a subject falls asleep.

Imagine we already know that in the population as a whole the average amount of time it takes for somebody to fall asleep is 8.45 minutes. This compares to the average time in our sample of 7.35 minutes. The question is whether the difference between these two means is large enough for us to conclude there is a real difference between our sample group and the wider population in terms of the amount of time it takes to fall asleep.

If we knew the population standard deviation, we could do a z test to answer this question, but we don’t, which means a one sample t test is the appropriate test.

To begin the one sample t test, click on Analyze -> Compare Means -> One-Sample T Test. This will bring up the One-Sample T Test dialog box.

You’ve got to get the variable you want to test – in our case, the Duration variable – into the right hand Test Variable(s) box, and input the population mean into the Test Value box. For the variable, you can just drag and drop, or use the arrow in the middle of the dialog box.

Once it’s set up, it should look like this.

If you’ve got this far, you’re ready to run the test. Just hit the OK button.
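The dialog box settings above correspond to this syntax (8.45 being our known population mean):

```spss
* One sample t test comparing Duration against
* a population mean of 8.45 minutes.
T-TEST
  /TESTVAL=8.45
  /VARIABLES=Duration
  /CRITERIA=CI(.95).
```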

The result of the one sample t test will appear in the SPSS output viewer. It will look like this.

This output is relatively easy to interpret.

The t value is -4.691 (see the One-Sample Test table, above), which gives us a *p*-value (or 2-tailed significance value) that SPSS reports as .000 – that is, less than .0005. This is going to be a significant result for any realistic alpha level.

A standard alpha level is .05, and .000 is smaller than .05, so we’re going to reject the null hypothesis which asserts there is no difference between our sample mean and the population mean.

More technically, what the result shows is that on the assumption that the null hypothesis is true, a difference as big as we’ve got between our sample mean and the population mean is extremely unlikely to have arisen purely by chance.

This counts as evidence that the difference between our sample group and the population as a whole is real. Put simply, it seems that our subjects fall asleep in statistics lessons more quickly than is true of the population as a whole.

***************

Okay, that’s it for this quick tutorial. You should now be able to run a one sample t test in SPSS, and to interpret the result that you get.


The post Export Data from SPSS into a MySQL Database appeared first on EZ SPSS Tutorials.

As you can see below, we have a simple data set with four variables. For the purposes of this tutorial, it doesn’t really matter what these variables represent, but for reasons that will become clear later it is worth taking note of the presence of the ID variable, which functions as a unique identifier for each case in the data set.

Our task is to get this data into a table in MySQL.

We’re working on the assumption that you have opened SPSS on a Windows operating system, and that you’re looking at this data set in the Data View.

Click on File -> Export -> Database. The Export to Database Wizard will pop up.

If you haven’t previously set up an ODBC data source connection, you’re not going to see anything in the Data Sources box, and you’re going to need to set up the connection.

We’re not going to show you how to do this here, because it’s exactly the same procedure as described in our import into SPSS from MySQL tutorial. You should check that out, setup the ODBC data source connection as detailed there, and then return to this tutorial.

We’re assuming that you now have the ODBC data source connection set up, and that you’re looking at the Export to Database Wizard.

Highlight your data source connection (as above), and then click the Next button.

You’ll now be asked to choose what sort of export you want to set up.

The simplest option is to create a new table within a MySQL database. We’ve selected this option, and named the new table PEFExperiment (see above).

Clicking the Next button will bring up a dialog box asking you to select the variables you want to be stored in the new table.

The SPSS variables show up in the text box on the left. The idea is to move the variables you want to export to your database over to the right, where you can set a number of their attributes (e.g., type and width).

This is where the significance of the ID variable comes into play. It is normal for a database table to have a primary key, which functions as a unique identifier for each database entry. This is often implemented by means of a field that increments automatically each time an entry is added to a table. This functionality is not supported by the SPSS database wizard. Ideally, therefore, you should ensure that your SPSS data set includes a variable that functions as a unique identifier, and which you can import into the database table. The ID variable performs this role in our data set.
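The role a primary key plays can be sketched with Python's built-in sqlite3 module standing in for MySQL (the table and column names below are just illustrative, echoing our example data set):

```python
import sqlite3

# An in-memory database as a stand-in for MySQL
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# ID is declared as the primary key, so every row must carry a unique ID
cur.execute(
    "CREATE TABLE PEFExperiment (ID INTEGER PRIMARY KEY, duration REAL, sex TEXT)"
)
cur.executemany(
    "INSERT INTO PEFExperiment VALUES (?, ?, ?)",
    [(1, 7.1, "Male"), (2, 6.4, "Female")],
)

# Inserting a duplicate ID violates the primary key constraint
try:
    cur.execute("INSERT INTO PEFExperiment VALUES (1, 5.0, 'Male')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # → True
```

This is why the lack of auto-increment support in the SPSS wizard matters: if the data set didn't already contain a variable like ID supplying unique values, there would be nothing suitable to feed the primary key column.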

You can specify that a variable should function as a primary key when you select the variables to store in a new table. This is what we’ve done below.

As you can see, we’ve elected to include all our variables within the new database table. To move them over from the left, you just drag and drop. We’ve specified ID as the primary key by ticking the little key icon.

The other thing worth noting is that we’ve instructed SPSS to export the value labels (Male, Female) for the Sex variable rather than the data values (1, 0). This is just for illustrative purposes, and shouldn’t be taken as a recommendation.
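In effect, choosing to export value labels substitutes each code with its label before the row is written out. A small sketch of the idea in Python, using the labels from our data set:

```python
# Value labels for the Sex variable, as defined in SPSS's Variable View
sex_labels = {1: "Male", 0: "Female"}

# Raw data values, as stored in the Data View
sex_values = [1, 0, 0, 1]

# What lands in the database when "export value labels" is selected
exported = [sex_labels[v] for v in sex_values]
print(exported)  # → ['Male', 'Female', 'Female', 'Male']
```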

If you’ve set things up correctly, you can just hit Finish at this point. If you want to check the options you’ve chosen, click Next instead, and review the summary dialog box that appears.

We’re going to hit Finish to do the export.

This is the output that SPSS generates for an export to a database.

As you’ll be able to see above, there are a couple of SQL statements (marked) that are responsible for creating the new table and generating the records. Let’s see if they’ve worked.

Here you can see the first 20 rows of the new table that SPSS has created within the MySQL database.

ID has been correctly instantiated as the primary key (though you can’t tell from this screenshot), and the sex variable has been populated with its value labels rather than numeric codes (in MySQL, the column therefore has the varchar data type).

***************

That’s it, really. You should now have an idea of how to export data from SPSS to a MySQL database. In a later tutorial, we’ll look at some of the more sophisticated options on offer during this process.

The post Export Data from SPSS into a MySQL Database appeared first on EZ SPSS Tutorials.


As a starting point, you should at least have an ID variable populated in the Data View of SPSS.

The ID variable determines the number of cases in the data set, and hence how many random numbers SPSS will generate.

To generate a set of random numbers, we’re going to use SPSS’s Compute Variable dialog box.

Click on Transform -> Compute Variable.

You need to do a number of things to set up this dialog box so SPSS will generate random numbers.

First, name your target variable. We’ve called ours RandomNumbers. This is the variable that SPSS will create to hold the set of random numbers.

Once you’ve named your target variable, select Random Numbers in the Function group on the right. This will bring up a set of functions, all of which operate to generate different kinds of random numbers.

The function we need is called Rv.Uniform. This returns a random value from a uniform distribution with a specified minimum and maximum value. Or, to put it a different way, it will generate a random number between two limits, where every possible value between the limits is equally likely to be generated.

It’s necessary to get the Rv.Uniform function into the Numeric Expression box at the top of the dialog box. You can drag and drop (as above) or use the up arrow in the middle of the dialog.

After you drag the RV.Uniform function into the Numeric Expression box, you’ll notice it has two question marks after it (see above). This signals that you need to specify minimum and maximum values for your random numbers.

This is easy to do. Just replace each question mark with a value. We’ve chosen 0 as our minimum and 100 as our maximum (as above).

That completes the setup. Just hit OK to generate the variable containing the set of random numbers (in this case between 0 and 100).

As you can see below, SPSS has created a new variable called RandomNumbers, and filled it with random numbers, each with a value between 0 and 100.

One thing to note here is that although you’re seeing only 2 decimal places, SPSS has actually calculated the numbers with much more precision (which you’ll see if you select an individual cell). This means it’s very unlikely you’ll get a duplicate number.
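For comparison, the same idea can be sketched in plain Python (a stand-in, not SPSS itself): the standard library's random.uniform plays the role of RV.UNIFORM.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible; omit for fresh numbers

# One value per case, uniformly distributed between 0 and 100
random_numbers = [random.uniform(0, 100) for _ in range(30)]

# Full floating-point precision is kept internally, even if you
# choose to display only two decimal places
print(round(random_numbers[0], 2))
```

As in SPSS, the values are generated at full precision, which is why duplicates are effectively impossible in practice.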

Consider the following scenario. You’ve recruited thirty people for a medical study. You want to allocate these people to treatment and control conditions on a random basis. How do you go about it using SPSS?

The following method will work.

Fire up the Compute Variable dialog box again (Transform -> Compute Variable).

This time we’re going to combine two functions together to allocate people to a treatment and control condition (where control means getting the placebo).

As before, the first thing to do is to name our target variable. We’ve chosen TreatmentGroup as our name.

Once you’ve named your target variable, select Arithmetic in the Function group on the left, and then scroll down until you get to the Trunc(1) function.

This function truncates a decimal number toward zero, leaving just the integer part. For positive numbers, that amounts to rounding down: for example, 2.91 will become 2 and 3.33 will become 3.

As before, you’ve got to get this function up into the Numeric Expression box, which you can do by dragging and dropping.

You’ll have noticed there’s a question mark immediately following the Trunc function in the Numeric Expression box (see above). This is a placeholder for the value that’ll be truncated.

We’re going to truncate a random number that lies between 0 and 2. The reason why will become apparent shortly.

To do this, we’re using the same RV.Uniform function as we used before. This time let’s just type it in, but with 0 and 2 as the Min and Max values. It should replace the question mark that appears between the brackets at the end of the Trunc function in the Numeric Expression box (as below).

To recap, the RV.UNIFORM(0,2) function is going to create a set of random numbers between 0 and 2. The truncate function will strip away the decimal part of each number just leaving the integer part.

Let’s press OK, and see how that turns out.

As you can see below, SPSS has created a new variable called TreatmentGroup, and in every case the value is either 0 or 1. (It’s theoretically possible to get a 2, but very unlikely.) This is because the truncate function has rounded every randomly generated value between 0 and 1 down to 0, and every randomly generated value between 1 and 2 down to 1.

In this context, 1 means the treatment condition and 0 the control condition, so we’ve achieved our goal of randomly allocating people to treatment and control conditions. However, you might want to tidy things up a little by going into the Variable View and setting up value labels (1 = Treatment, 0 = Control).
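The TRUNC(RV.UNIFORM(0,2)) trick translates directly into Python, with math.trunc and random.uniform standing in for the SPSS functions (again a sketch, using the value labels from our setup):

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# A uniform value in [0, 2), truncated to its integer part,
# comes out as 0 or 1 with equal probability
treatment_group = [math.trunc(random.uniform(0, 2)) for _ in range(30)]

# The value labels we'd set up in SPSS's Variable View
labels = {1: "Treatment", 0: "Control"}
print([labels[g] for g in treatment_group[:5]])
```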

SPSS allows you to generate random numbers that are drawn from a normal distribution with a specified mean and standard deviation. This functionality will often be useful for various sorts of computer simulation.

Imagine you want to run a population-level simulation of the effectiveness of different treatment options for a particular disease, and you know that drug efficacy is affected by patient weight. In this situation, you’re going to want your population model to reflect the distribution of weights of people in the real world. Weight is approximately normally distributed, which means that so long as we know the mean and standard deviation of the distribution, we can create a random distribution of weights in SPSS that matches the characteristics of the real-world distribution.

This is how we’d do this for adult males, assuming that the mean weight of an adult male is 195 lbs and the standard deviation of the distribution of weights is 35 lbs.

Navigate to the Compute Variable dialog box again (Transform -> Compute Variable). Hit reset if you need to return it to its default state. You should by now be familiar with the next several steps.

First, name your Target Variable (we’ve got Weight as our variable name).

Second, choose Random Numbers in the Function group, and within the Functions and Special Variables text box, scroll down until you get to RV.Normal.

Third, drag RV.Normal up into the Numeric Expression text box.

Fourth, replace the first question mark with 195, which is the mean weight, and the second question mark with 35, which is the standard deviation.

And that’s it. Press OK, and SPSS will create a variable called Weight, and fill it with normally distributed weights.

As you can see below, we now have our distribution of weights.
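RV.NORMAL's behaviour can be sketched in Python with random.gauss, which likewise draws values from a normal distribution with the mean and standard deviation you supply:

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Simulated adult-male weights: mean 195 lbs, standard deviation 35 lbs
weights = [random.gauss(195, 35) for _ in range(10_000)]

sample_mean = sum(weights) / len(weights)
print(round(sample_mean))  # should land close to 195
```

With ten thousand draws, the sample mean and spread come out very close to the parameters we specified, which is exactly the property that makes this useful for simulation.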

***************

Right, that’s it for this tutorial. You should now have an idea of how to generate random numbers within SPSS, and how you can leverage this functionality to solve various sorts of problems.

The post How to Generate Random Numbers in SPSS appeared first on EZ SPSS Tutorials.
