SAS Procedures for Common Statistical Analyses
Contents:

Introduction/Data Set Up

Describing Quantitative Variables

Describing Qualitative Variables

Two-Sample Tests (Independent Samples)

Completely Randomized Design (1-Way ANOVA)

Randomized Block Design

2-Factor ANOVA

Chi-Square Tests

Linear Regression

Correlation

Generalized Linear Models

Logistic Regression

Poisson Regression

Negative Binomial Regression

Introduction/Data Set Up
For all descriptions, we assume a dataset in which each line represents an individual case, with 3 quantitative variables (X, Y, Z) and 2 qualitative variables (A, B), unless otherwise noted.
DATA ONE;
INPUT X Y Z A B;
CARDS;
Data Here
;
RUN;
NOTE: Any procedure can be run separately for each level of one or more factors (sort the data, then use a BY statement), or restricted to only those cases that meet some criterion (WHERE statement).
Analysis Conducted separately for all levels of Factor A:
Data step
RUN;
PROC SORT; BY A; RUN;
PROC PROCNAME;
BY A;
RUN;
Analysis Conducted only on cases where (say) A=1:
Data step
RUN;
PROC PROCNAME;
WHERE A=1;
Other PROC Statements
RUN;
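As a concrete illustration of the two templates above, here is a minimal sketch with PROC MEANS substituted for PROCNAME. It assumes the dataset is named ONE, as in the DATA step at the start of this section; the choice of PROC MEANS and of the variable X is only for illustration.

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
PROC SORT DATA=ONE; BY A; RUN;

/* Summary statistics for X, reported separately for each level of A */
PROC MEANS DATA=ONE;
  BY A;
  VAR X;
RUN;

/* Summary statistics for X, using only the cases where A=1 */
PROC MEANS DATA=ONE;
  WHERE A=1;
  VAR X;
RUN;
```

The WHERE version needs no sorting, since it only filters cases before the procedure runs.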

Describing Quantitative Variables
Dataset contains 3 quantitative variables: X,Y,Z 
2 qualitative Factors: A,B
Basic Statistics: PROC MEANS
For all cases:
PROC MEANS;
VAR X Y Z;
RUN;
For cases separately by Factor A:
PROC MEANS;
CLASS A;
VAR X Y Z;
RUN;
For cases separately by combinations of Factors A & B:
PROC MEANS;
CLASS A B;
VAR X Y Z;
RUN;
Full-blown Summary: PROC UNIVARIATE
Default: Moments, SS, CV, SEM, Median, IQR, Tests for Location (\mu=0: t-test; Median=0: Sign and Signed-Rank tests), Quantiles, Extreme Observations
PROC UNIVARIATE;
VAR X Y Z;
RUN;

Describing Qualitative Variables
Note: Dataset need not contain quantitative variables X, Y, Z; but does contain qualitative responses A and B.
Frequency Tabulation for a Single Qualitative Response (A):
PROC FREQ; TABLES A; RUN;
Frequency Cross-Tabulation for a Pair of Qualitative Responses (A,B):
PROC FREQ; TABLES A*B; RUN;
NOTE: In many instances you may wish to reproduce and further analyze data previously published in a contingency table. Then each “case” is a cell in the table, and you will include a count for each cell.
DATA ONE;
INPUT A B NUMCASE;
CARDS;
1 1 25
1 2 32
2 1 17
2 2 42
;
RUN;
PROC FREQ; TABLES A*B; WEIGHT NUMCASE; RUN;

Two-Sample Tests (Independent Samples)
For this case, assume Factor A has 2 levels, and X is our response variable.
TTEST Procedure: H_{0}: \mu_{1} - \mu_{2} = 0 versus H_{A}: \mu_{1} - \mu_{2} \neq 0
The procedure conducts the t-test under both the equal-variance and unequal-variance assumptions, and also reports the F-test for equal variances to guide you to which analysis to use.
PROC TTEST;
CLASS A;
VAR X;
RUN;
NPAR1WAY Procedure: H_{0}: M_{1} - M_{2} = 0 versus H_{A}: M_{1} - M_{2} \neq 0
PROC NPAR1WAY WILCOXON;
CLASS A;
VAR X;
RUN;

Completely Randomized Design (1-Way ANOVA)
Statistical Model: Y_{ij} = \mu + \alpha_{i} + \varepsilon_{ij} = \mu_{i} + \varepsilon_{ij}   i=1,…,a   j=1,…,n_{i}
Let Factor A represent the treatment factor and Y be the response variable. The dataset AOVOUT will contain the original dataset and residuals (with variable name E).
ANOVA F-test, Levene’s Test for Equal Variance and Bonferroni/Tukey Comparisons
PROC GLM;
CLASS A;
MODEL Y = A;
MEANS A / BON TUKEY HOVTEST;
OUTPUT OUT=AOVOUT R=E;
RUN;
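The residuals saved in AOVOUT can be used to check the normality assumption behind the F-test. The following is a minimal sketch, assuming the PROC GLM step above has already been run so that AOVOUT and the residual variable E exist; the NORMAL and PLOT options are one reasonable choice, not the only one.

```sas
/* Assumes AOVOUT (with residual variable E) was created by the PROC GLM step above */
PROC UNIVARIATE DATA=AOVOUT NORMAL PLOT;
  VAR E;   /* NORMAL requests tests for normality; PLOT requests basic diagnostic plots */
RUN;
```

If the residuals look clearly non-normal, a rank-based alternative may be preferable.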
Kruskal-Wallis H-Test (Nonparametric)
PROC NPAR1WAY WILCOXON;
CLASS A;
VAR Y;
RUN;

Randomized Block Design
Statistical Model: Y_{ij} = \mu + \alpha_{i} + b_{j} + \varepsilon_{ij} = \mu_{i} + b_{j} + \varepsilon_{ij}   i=1,…,a   j=1,…,b
Let A represent the treatment factor, B represent the blocking factor, and Y be the response variable. The dataset AOVOUT will contain the original dataset and residuals (with variable name E).
ANOVA F-test and Bonferroni/Tukey Comparisons
PROC GLM;
CLASS A B;
MODEL Y = A B;
MEANS A / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;
Friedman’s Test (Nonparametric)
PROC FREQ;
TABLES B*A*Y / CMH2 SCORES=RANK NOPRINT;
RUN;
The test statistic and P-value are printed in the row labeled “Row Mean Scores Differ”.

2-Factor ANOVA
Statistical Model:
Y_{ijk} = \mu + \alpha_{i} + \beta_{j} + (\alpha\beta)_{ij} + \varepsilon_{ijk}   i=1,…,a   j=1,…,b   k=1,…,n
The dataset AOVOUT will contain the original dataset and residuals (with variable name E).
Additive Model – No Interaction
PROC GLM;
CLASS A B;
MODEL Y = A B;
MEANS A B / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;
Model With Interaction
PROC GLM;
CLASS A B;
MODEL Y = A B A*B;
MEANS A B / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;
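When the interaction is important, comparisons among the individual A*B treatment combinations are often of more interest than the main-effect means. One way to obtain them is the LSMEANS statement; the sketch below assumes the dataset is named ONE, and the choice of Tukey adjustment is only an example.

```sas
PROC GLM DATA=ONE;
  CLASS A B;
  MODEL Y = A B A*B;
  /* Least-squares means for each A*B cell, with Tukey-adjusted
     p-values for all pairwise differences between cells */
  LSMEANS A*B / PDIFF ADJUST=TUKEY;
RUN;
QUIT;
```

With a levels of A and b levels of B this compares all ab cell means, so the table of pairwise differences grows quickly.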

Chi-Square Tests
Cases are classified on two qualitative variables: A and B
Want to test whether the classifications are independent (or that the conditional distribution of variable B is the same for every level of A).
PROC FREQ;
TABLES A*B / CHISQ EXPECTED;
RUN;
When measures of association (and tests of significance) are desired instead of the Chi-Square test, use:
PROC FREQ;
TABLES A*B / MEASURES;
RUN;

Linear Regression
Simple Linear Regression
Statistical Model: Y_{i} = \beta_{0} + \beta_{1}X_{i} + \varepsilon_{i}   i=1,…,n
The dataset REGOUT will contain the original dataset and residuals (with variable name E).
PROC REG;
MODEL Y = X;
OUTPUT OUT=REGOUT R=E;
RUN;
Multiple Linear Regression (Dataset contains variables X1,…,Xk)
Statistical Model: Y_{i} = \beta_{0} + \beta_{1}X_{1i} + … + \beta_{k}X_{ki} + \varepsilon_{i}   i=1,…,n
PROC REG;
MODEL Y = X1 X2 … Xk;
OUTPUT OUT=REGOUT R=E;
RUN;
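The residuals in REGOUT are typically plotted against the fitted values to check for nonlinearity or unequal variance. A minimal sketch for the simple regression case follows; it assumes the dataset is named ONE, and the predicted-value name YHAT is a hypothetical name introduced here (requested with P=), not part of the original examples.

```sas
PROC REG DATA=ONE;
  MODEL Y = X;
  /* R= saves residuals as E; P= saves fitted values as YHAT (name chosen for this sketch) */
  OUTPUT OUT=REGOUT R=E P=YHAT;
RUN;
QUIT;

/* Residuals versus fitted values, with a horizontal reference line at zero */
PROC SGPLOT DATA=REGOUT;
  SCATTER X=YHAT Y=E;
  REFLINE 0 / AXIS=Y;
RUN;
```

A patternless band around zero is consistent with the model assumptions; a funnel or curve suggests otherwise.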

Correlation
Pairwise Bivariate Correlations (Data: Variables Y1,…,Yk)
PROC CORR; VAR Y1 … Yk; RUN;
Partial Correlation between Y and Z, Controlling for X
PROC CORR; VAR Y Z; PARTIAL X; RUN;

Generalized Linear Models
Logistic Regression
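Statistical Model: Y is a binary outcome:
Y_{i} ~ Bernoulli(\pi_{i})   logit(\pi_{i}) = log(\pi_{i}/(1-\pi_{i})) = \beta_{0} + \beta_{1}X_{i}   E(Y_{i}) = \pi_{i}   V(Y_{i}) = \pi_{i}(1-\pi_{i})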
PROC GENMOD;
MODEL Y = X / DIST=BIN LINK=LOGIT;
RUN;
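One detail worth checking in the output is which level of Y is being modeled. For a binary response coded 0/1, PROC GENMOD by default models the probability of the lower ordered value (Y=0). If the probability of Y=1 is wanted, the DESCENDING option reverses the ordering. A sketch, assuming Y is coded 0/1 and the dataset is named ONE:

```sas
/* DESCENDING makes GENMOD model P(Y=1) rather than P(Y=0);
   assumes Y is coded 0/1 in dataset ONE */
PROC GENMOD DATA=ONE DESCENDING;
  MODEL Y = X / DIST=BIN LINK=LOGIT;
RUN;
```

The fitted coefficients from the two orderings differ only in sign, so the Response Profile table in the output is the place to confirm which event the estimates refer to.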
Poisson Regression
Statistical Model: Y is a count outcome:
Y_{i} ~ Poisson(\mu_{i})   log(\mu_{i}) = \beta_{0} + \beta_{1}X_{i}   E(Y_{i}) = \mu_{i}   V(Y_{i}) = \mu_{i}
PROC GENMOD;
MODEL Y = X / DIST=POI LINK=LOG;
RUN;
Negative Binomial Regression
Statistical Model: Y is a count outcome:
Y_{i} ~ NB(\mu_{i},k)   log(\mu_{i}) = \beta_{0} + \beta_{1}X_{i}   E(Y_{i}) = \mu_{i}   V(Y_{i}) = \mu_{i} + (\mu_{i}^{2}/k)
PROC GENMOD;
MODEL Y = X / DIST=NB LINK=LOG;
RUN; 