Social Science Technology Service
Western Social Science

Frequently Asked Questions

General

Both variables A and B are tested to be normally distributed. Why is A-B not normally distributed?

What is the difference between a Chi-square test and Fisher’s exact test?

How do you interpret the p-value of a Chi-square test or Fisher’s exact test?

SAS

How can I test if the residuals are normally distributed?

How can I test for homogeneity of variance?

SPSS

How do you display the syntax commands in the output viewer?

How can I carry out a Yates’ chi-square test?

How can I add a regression line to a scatter plot?

How do you get the overall correlation between two sets of variables?

How do I carry out a reliability analysis where my model uses Cronbach's alpha?


Both variables A and B are tested to be normally distributed. Why is A-B not normally distributed?

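Knowing that A and B are each normally distributed is not enough to guarantee that a linear combination such as A-B is normally distributed. The guarantee holds only when A and B are jointly normal, which is the case, for example, when they are independent. Note also that passing a normality test only means no departure from normality was detected; it does not prove that A and B are exactly normal.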
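As a concrete illustration, the following Python sketch (not part of the original FAQ; the construction and all function names are assumptions chosen for illustration) builds two variables that are each standard normal but strongly dependent. B equals A with a randomly flipped sign, so A-B is exactly zero about half the time, which no normal distribution allows.

```python
import random


def dependent_normal_pair(rng):
    """Return (a, b): each is standard normal, but the pair is not jointly normal."""
    a = rng.gauss(0.0, 1.0)
    sign = rng.choice((-1.0, 1.0))  # random sign, drawn independently of a
    b = sign * a                    # by symmetry, b is also standard normal
    return a, b


def fraction_of_zero_differences(n=10_000, seed=42):
    """Share of simulated A-B values that are exactly zero."""
    rng = random.Random(seed)
    pairs = (dependent_normal_pair(rng) for _ in range(n))
    diffs = [a - b for a, b in pairs]
    return sum(1 for d in diffs if d == 0.0) / n


zero_share = fraction_of_zero_differences()
print(f"Share of A-B values that are exactly zero: {zero_share:.3f}")
```

A continuous normal variable takes any single value with probability zero, so a large share of exact zeros shows that A-B cannot be normal here even though A and B individually are.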


What is the difference between a Chi-square test and Fisher’s exact test?

Both test the association between two categorical variables. The difference is that the Chi-square test relies on a large-sample approximation, which requires the expected cell counts in the crosstabulation of these two variables to be sufficiently large (a common rule of thumb is at least 5 in every cell). When this assumption fails, Fisher's exact test is recommended.
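The choice between the two tests can be sketched in Python for a 2x2 table using only the standard library. This is a minimal illustration, not the procedure used by SAS or SPSS: the function names, the example table, and the strict "every expected count at least 5" cut-off are assumptions, and the Chi-square p-value formula used here is valid only for one degree of freedom.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom, no correction)."""
    exp = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    return stat, erfc(sqrt(stat / 2))  # survival function of chi-square with 1 df


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value: sum of all table probabilities no larger
    than the probability of the observed table, with the margins held fixed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


table = [[3, 1], [1, 3]]  # hypothetical counts
if min(min(row) for row in expected_counts(table)) < 5:
    print("Fisher's exact test, p =", round(fisher_exact_2x2(table), 4))
else:
    print("Chi-square test, p =", round(chi_square_2x2(table)[1], 4))
```

For this small table every expected count is 2, so the sketch falls back to Fisher's exact test.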


How do you interpret the p-value of a Chi-square test or Fisher’s exact test?

We begin by assuming there is no association between the two (categorical) variables. In technical terms this is called the null hypothesis. The alternative hypothesis states that the two variables are associated in some way.

The p-value of a Chi-square test or Fisher’s exact test is the probability of getting results at least as extreme as those we observed, calculated under the assumption that the null hypothesis is true. A p-value of 0.01 means that, if there were truly no association, results this extreme would occur only about 1% of the time. In this case we have evidence to suggest our assumption of no association is not correct. Hence it would be reasonable to claim there is an association between the two variables.

What we usually do is compare the p-value with some pre-specified (a priori) value which we call α (the significance level). If p is less than α, we reject the null hypothesis and accept the alternative hypothesis.

Of course we could be wrong! It is possible to get very extreme results by chance even though two variables are not associated at all. Rejecting a true null hypothesis in this way is called a Type I error. When we reject only if p is less than α, the probability of making a Type I error is at most α. Loosely speaking, the smaller the value of p, the stronger the evidence for an association.

Commonly used significance levels are 0.01, 0.05, and 0.10.
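The link between the significance level and the Type I error can be illustrated with a small simulation. The Python sketch below is not part of the original FAQ, and the function names, sample sizes, and seed are assumptions. It repeatedly generates 2x2 tables in which the two variables are truly unrelated, applies a Chi-square test at α = 0.05, and counts how often the null hypothesis is rejected anyway.

```python
import random
from math import erfc, sqrt


def chi_square_p_2x2(a, b, c, d):
    """P-value of the Pearson chi-square test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 1.0  # degenerate table: treat as no evidence of association
    stat = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    return erfc(sqrt(stat / 2))


def simulate_type_i_rate(alpha=0.05, n_tables=2000, n_obs=200, seed=1):
    """Share of null-hypothesis tables for which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tables):
        counts = [[0, 0], [0, 0]]
        for _ in range(n_obs):
            # Row and column are drawn independently, so there is no association.
            row = int(rng.random() < 0.5)
            col = int(rng.random() < 0.5)
            counts[row][col] += 1
        (a, b), (c, d) = counts
        if chi_square_p_2x2(a, b, c, d) < alpha:
            rejections += 1
    return rejections / n_tables


print("Share of false rejections:", simulate_type_i_rate())
```

The share of false rejections should come out close to the chosen α, which is what the significance level is designed to control.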


How can I test if the residuals are normally distributed?

In the following example residuals (variable RESID) from a regression model are output to the dataset RESDAT via the OUTPUT statement in the GLM procedure.

To test for normality use the NORMAL option in the UNIVARIATE procedure. Note the residuals are read from the RESDAT dataset via the DATA option in this procedure.

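/* Fit the regression and save the residuals to RESDAT */
proc glm;
  model Y=X;
  output out=RESDAT r=RESID;
run;
quit;

/* Test the saved residuals for normality */
proc univariate data=RESDAT normal;
  var RESID;
run;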

If the p-value for the normality test is greater than your a priori significance level (for example 0.05), the test gives no evidence against normality and you may treat the residuals as normally distributed.


How can I test for homogeneity of variance?

Levene's test is widely considered to be the standard homogeneity of variance test. Use the GLM procedure and specify the HOVTEST option in the MEANS statement. In this example we test if the variances of A in each level of GROUP are equal.

/* One-way model: HOVTEST requests Levene's test by default */
proc glm;
  class GROUP;
  model A=GROUP;
  means GROUP /hovtest;
run;
quit;

If the p-value is greater than your significance level (for example p > 0.05), the test gives no evidence against equal variances, so equal variances may be assumed. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only. Homogeneity of variance testing for more complex models is a subject of current research.


How do you display the syntax commands in the output viewer?

Edit > Options > Viewer > Display commands in the log


How can I carry out a Yates’ chi-square test?

Note that Yates’ corrected chi-square (continuity correction) may only be calculated for tables with two rows and two columns.

Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square

In the output viewer Yates’ corrected chi-square is found in the Chi-square Tests table on the line labeled Continuity Correction.
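To show what the continuity correction does to the statistic, here is a minimal Python sketch for a 2x2 table using the standard textbook formula. It is not SPSS code, and the function name and example counts are assumptions. The correction subtracts half the sample size from |ad - bc| before squaring, which makes the statistic smaller and the p-value larger.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (1 df) for a 2x2 table,
    optionally with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # the correction never pushes below zero
    stat = n * diff ** 2 / denom
    return stat, erfc(sqrt(stat / 2))  # survival function of chi-square with 1 df


table = [[3, 1], [1, 3]]  # hypothetical counts
print("Uncorrected:", chi_square_2x2(table))
print("Yates:      ", chi_square_2x2(table, yates=True))
```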


How can I add a regression line to a scatter plot?

In the output viewer, double-click the scatter plot to bring it into the chart editor. Choose Options from the Chart menu. Click on the box beside Total if it is not already checked. Click the Fit Options button and choose the Fit Method Linear regression.


How do you get the overall correlation between two sets of variables?

Create two syntax documents and save them in the same location. One document contains a macro that calculates the correlations. The other document calls this macro after loading it with the INCLUDE command.

Suppose you have two sets of variables. One set comprises variables A and B. The other set contains variables C and D.

Run the following syntax

include 'CanCorrRoutine.sps'.
CANCORR
  set1=A B
 /set2=C D.

which includes the syntax document CanCorrRoutine.sps defining the macro CANCORR.


How do I carry out a reliability analysis where my model uses Cronbach's alpha?

Analyze > Scale > Reliability Analysis

and select Alpha as the model.
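For reference, the quantity reported as Cronbach's alpha can be sketched in a few lines of Python using the usual textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. This is an illustration rather than SPSS output: the function name and the example scores are hypothetical, and it assumes complete data with one list of scores per item.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores
    given in the same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three items answered by five respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
```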