[SAS] Proc GLM Explained

I found this very detailed explanation of Proc GLM in SAS by Julio Peixoto from the Boeing company. The article is posted on the University of Houston website. Since Mr. Peixoto hasn’t worked at UH since 2003, I am afraid one day this nice article will get deleted. So I copied it to my blog.

The original article is at: http://www.bauer.uh.edu/peixoto/SAS/sasGLM.HTM

The article shows all the statements of the procedure and the statements are clickable and linked to the detailed explanation.

 

PROC GLM analyzes data within the framework of General Linear Models, hence the name GLM. GLM handles classification variables, which have discrete levels, as well as continuous variables, which measure quantities. Thus GLM can be used for many different analyses including:

  • simple regression
  • multiple regression
  • analysis of variance (ANOVA), especially for unbalanced data
  • analysis of covariance
  • response-surface models
  • weighted regression
  • polynomial regression
  • partial correlation
  • multivariate analysis of variance (MANOVA)
  • repeated measures analysis of variance.

PROC GLM options ; 
CLASS variable-list; 
MODEL dependents= independents / options; /* required */ 
ABSORB variable-list; 
BY variable-list; 
FREQ variable; 
ID variable-list; 
WEIGHT variable; 
CONTRAST 'label' effect values... / options; 
ESTIMATE 'label' effect values... / options; 
LSMEANS effects / options; 
MANOVA H= effects E= effect M= equations... MNAMES= names PREFIX= name / options; 
MEANS effects / options; 
OUTPUT OUT= SAS-data-set keywords= names... ; 
RANDOM effects / options; 
REPEATED factorname levels (levelvalues) transformation<,...> / options; 
TEST H= effects E= effect / options;

PROC GLM options;

The following options can be used in the PROC GLM statement:

DATA= SAS-data-set
  • The DATA= option names the SAS data set to be used by GLM.
  • If the DATA= option is omitted, GLM uses the most recently created SAS data set.
ORDER= FREQ|DATA|INTERNAL|FORMATTED
  • The ORDER= option specifies the order in which you want the levels of the classification variables (specified in the CLASS statement) to be sorted.
  • If you specify ORDER= FREQ, levels are sorted by descending frequency count. If you specify ORDER= DATA, levels are sorted in the order in which they first occur in the input data.
  • If you specify ORDER= INTERNAL, levels are sorted by the internal value. If you specify ORDER= FORMATTED, levels are ordered by the external formatted value. Default: FORMATTED
OUTSTAT= SAS-data-set
  • The OUTSTAT= option names an output SAS data set that will contain sums of squares, F statistics, and probability levels for each effect in the model, as well as for each CONTRAST statement used.
MANOVA
  • The MANOVA option requests that PROC GLM use the multivariate mode of eliminating observations with missing values, that is, to eliminate an observation from the analysis if any of the dependent variables have missing values.
MULTIPASS
  • The MULTIPASS option requests that PROC GLM reread the input data set when necessary, instead of writing the necessary values of dependent variables to a utility file.
NOPRINT
  • The NOPRINT option suppresses the normal printout of results. When NOPRINT is used, no printed output is produced.
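As a rough illustration of the PROC GLM statement options above, here is a minimal sketch of a one-way analysis. The data set names (work.trial, work.glm_stats), the variable names, and the data values are all made up for this example and are not from the original article.

```sas
/* Hypothetical data: three treatments, four responses each */
data work.trial;
   input treatment $ response @@;
   datalines;
B 12.1 B 11.4 B 13.0 B 12.6
A 14.2 A 15.1 A 13.8 A 14.9
C 10.3 C  9.8 C 11.1 C 10.6
;
run;

/* ORDER=DATA keeps the levels in the order they first appear (B, A, C). */
/* OUTSTAT= saves sums of squares, F statistics, and p-values.           */
proc glm data=work.trial order=data outstat=work.glm_stats;
   class treatment;
   model response = treatment;
run;
quit;

/* Look at what OUTSTAT= stored */
proc print data=work.glm_stats;
run;
```

The QUIT statement is there because PROC GLM keeps running interactively after RUN until it is ended.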

CLASS Statement

CLASS variable-list;
  • The CLASS statement names the classification variables to be used in the analysis. If the CLASS statement is used, it must appear before the MODEL statement.
  • Classification variables can be either character or numeric. Only the first sixteen characters of a character variable are used.
  • Class levels are determined from the formatted values of the CLASS variables.

MODEL Statement

MODEL dependents= independents / options;
  • The MODEL statement names the dependent variables and independent effects.
  • If no independent effects are specified, only an intercept term is fit. These options can be specified in the MODEL statement after a slash (/):
   NOINT          INTERCEPT      NOUNI          SOLUTION     TOLERANCE
   E              E1             E2             E3           E4
   SS1            SS2            SS3            SS4          ALPHA= p
   CLM            CLI            P              XPX          INVERSE
   SINGULAR= value               ZETA= value    *|@||

 

MODEL dependents= independents / NOINT ;
  • The NOINT option of the MODEL statement requests that the intercept parameter not be included in the model.
MODEL dependents= independents / INTERCEPT ;
  • The INTERCEPT or INT option of the MODEL statement requests that GLM print the hypothesis tests associated with the intercept as an effect in the model.
  • By default, the intercept is included in the model, but no tests of hypotheses associated with it are printed. When the INT option is specified, these tests are printed.
MODEL dependents= independents / NOUNI ;
  • The NOUNI option of the MODEL statement requests that no univariate statistics be printed.
MODEL dependents= independents / SOLUTION ;
  • The SOLUTION option of the MODEL statement requests that GLM print a solution to the normal equations (parameter estimates). The procedure always prints a solution when no CLASS statement appears.
MODEL dependents= independents / TOLERANCE ;
  • The TOLERANCE option of the MODEL statement requests that the tolerances used in the SWEEP routine be printed.
MODEL dependents= independents / E ;
  • The E option of the MODEL statement requests that the general form of all estimable functions be printed.
MODEL dependents= independents / E1 ;
  • The E1 option of the MODEL statement requests that the Type I estimable functions for each effect in the model be printed.
MODEL dependents= independents / E2 ;
  • The E2 option of the MODEL statement requests that the Type II estimable functions for each effect in the model be printed.
MODEL dependents= independents / E3 ;
  • The E3 option of the MODEL statement requests that the Type III estimable functions for each effect in the model be printed.
MODEL dependents= independents / E4 ;
  • The E4 option of the MODEL statement requests that the Type IV estimable functions for each effect in the model be printed.
MODEL dependents= independents / SS1 ;
  • The SS1 option of the MODEL statement requests that the Sum of Squares (SS) associated with Type I estimable functions for each effect be printed.
MODEL dependents= independents / SS2 ;
  • The SS2 option of the MODEL statement requests that the Sum of Squares (SS) associated with Type II estimable functions for each effect be printed.
MODEL dependents= independents / SS3 ;
  • The SS3 option of the MODEL statement requests that the Sum of Squares (SS) associated with Type III estimable functions for each effect be printed.
MODEL dependents= independents / SS4 ;
  • The SS4 option of the MODEL statement requests that the Sum of Squares (SS) associated with Type IV estimable functions for each effect be printed.
MODEL dependents= independents / ALPHA= p ;
  • The ALPHA= option of the MODEL statement specifies the alpha level for confidence intervals. The only acceptable values for ALPHA are 0.01, 0.05, and 0.10. If no ALPHA level is given, GLM uses 0.05.
MODEL dependents= independents / CLM ;
  • The CLM option of the MODEL statement prints confidence limits for a mean predicted value for each observation.
MODEL dependents= independents / CLI ;
  • The CLI option of the MODEL statement prints confidence limits for individual predicted values for each observation.
  • The CLI and CLM options should not be used together; CLI is ignored if CLM is also specified.
MODEL dependents= independents / P ;
  • The P option of the MODEL statement prints observed, predicted, and residual values for each observation that does not contain missing values for independent variables.
  • The Durbin-Watson statistic is also printed when P is specified. The PRESS statistic is also printed if either CLM or CLI is specified.
MODEL dependents= independents / XPX ;
  • The XPX option of the MODEL statement prints the X’X crossproducts matrix.
MODEL dependents= independents / INVERSE ;
  • The INVERSE or I option of the MODEL statement prints the inverse or the generalized inverse of the X’X matrix.
MODEL dependents= independents / SINGULAR= value ;
  • The SINGULAR= option of the MODEL statement tunes the sensitivity of the regression routine to linear dependencies in the design.
  • Note that the default value of SINGULAR, 1E-7, may be too small, but this value is necessary in order to handle the high-degree polynomials used in the literature to compare regression routines.
MODEL dependents= independents / ZETA= value ;
  • The ZETA= option of the MODEL statement tunes the sensitivity of the check for estimability for Type III and Type IV functions.
  • Any element in the estimable function basis with an absolute value less than ZETA is set to zero. Default: 1E-8
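To show how several MODEL options combine after the slash, here is a small sketch. The data set work.growth and its variables (diet, dose, weight) are invented for illustration only.

```sas
/* Hypothetical data: two diets, four dose levels each */
data work.growth;
   input diet $ dose weight @@;
   datalines;
A 1 20.1 A 2 22.4 A 3 25.0 A 4 26.8
B 1 18.3 B 2 21.0 B 3 22.9 B 4 25.5
;
run;

proc glm data=work.growth;
   class diet;
   /* SOLUTION : print parameter estimates (not automatic with CLASS)  */
   /* SS3      : print Type III sums of squares                        */
   /* P, CLM   : observed/predicted/residual values plus confidence    */
   /*            limits for the mean predicted value                   */
   /* ALPHA=   : level used for those confidence limits                */
   model weight = diet dose / solution ss3 p clm alpha=0.05;
run;
quit;
```

Since a CLASS statement is present, SOLUTION is what makes the parameter estimates appear; without a CLASS statement they would be printed anyway.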

*|@|| Operators

  • When the bar ( | ) is used, the right- and left-hand sides become effects, and the cross of them becomes an effect.
  • Multiple bars are permitted. A | B | C
  • You can also specify the maximum number of variables involved in any effect that results from bar evaluation by specifying that maximum number, preceded by an @ sign, at the end of the bar effect. A | B(A) | C@2
  • Crossed effects (interactions) are specified by joining class variables with asterisk: A*B B*C A*B*C
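The bar and @ notation is easiest to see by writing the same model both ways. The sketch below assumes a simulated data set (work.factorial) with made-up factors A, B, C and response y; the two PROC GLM steps should fit the same model, since A|B|C@2 stops at two-way interactions.

```sas
/* Hypothetical balanced 2x2x2 design with three replicates */
data work.factorial;
   call streaminit(2024);
   do A = 1 to 2;
      do B = 1 to 2;
         do C = 1 to 2;
            do rep = 1 to 3;
               y = 10 + A + 2*B + rand('normal');
               output;
            end;
         end;
      end;
   end;
run;

/* Bar notation, limited to effects with at most two variables */
proc glm data=work.factorial;
   class A B C;
   model y = A|B|C@2;
run;
quit;

/* The same model with every effect written out */
proc glm data=work.factorial;
   class A B C;
   model y = A B A*B C A*C B*C;
run;
quit;
```

Dropping the @2 would add the three-way effect A*B*C to the expansion.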

ABSORB Statement

ABSORB variable-list;
  • For a main effect variable that does not participate in interactions, you can absorb the effect by naming it in an ABSORB statement. This means that the effect can be adjusted out before the construction and solution of the rest of the model.
  • When the ABSORB statement is used, the data set (or each BY group) must be sorted by the variables in the ABSORB statement.
  • GLM cannot produce predicted values or create an output data set of diagnostic values if ABSORB is used.
  • If the ABSORB statement is used, it must appear before the first RUN statement or it is ignored.
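A minimal sketch of ABSORB, assuming a hypothetical data set work.herds in which herd is a main effect that is only being adjusted for. Note the PROC SORT first, because the data must be sorted by the ABSORB variable.

```sas
/* Hypothetical data: herd is a nuisance main effect, trt is of interest */
data work.herds;
   input herd trt $ yield @@;
   datalines;
3 A 31.2 3 B 29.8 1 A 27.5 1 B 26.1
2 A 30.0 2 B 28.4 1 A 27.9 2 B 28.9
3 A 30.7 3 B 30.1 2 A 29.6 1 B 25.8
;
run;

/* ABSORB requires the data to be sorted by the absorbed variable */
proc sort data=work.herds;
   by herd;
run;

proc glm data=work.herds;
   absorb herd;          /* adjusted out before the rest of the model is built */
   class trt;
   model yield = trt;    /* herd does not appear in CLASS or MODEL */
run;
quit;
```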

BY Statement

BY <DESCENDING> variables ... <NOTSORTED>;
  • A BY statement is used with a procedure to obtain separate analyses on observations in groups defined by the BY variables. The data set being processed need not have been previously sorted by the SORT procedure. However, the data set must be in the same order as though PROC SORT had sorted it unless NOTSORTED is specified.
  • If you have used a FORMAT or ATTRIB statement to group a continuous variable into discrete groups, the BY statement creates BY groups based on the formatted values.
  • You can also ensure that variables are processed in ascending order by creating an index for one or more variables in the SAS data set. The usages of the BY statement differ in each procedure. Please refer to the Users’ Guide for the details.

FREQ Statement

FREQ variable;
  • When a FREQ statement appears, each observation in the input data set is assumed to represent n observations in the experiment. For each observation, n is the value of the variable specified in the FREQ statement.
  • If the value of the FREQ statement variable is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.
  • If the FREQ statement is used, it must appear before the first RUN statement or it is ignored.
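Here is a small sketch of the FREQ statement with invented data (work.counts, with n as the frequency variable). Each row stands for n observations, and the row with n = 0 should be left out of the analysis according to the rule above.

```sas
/* Hypothetical data: each row represents n identical observations */
data work.counts;
   input group $ score n @@;
   datalines;
A 10 3 A 12 2 A 11 1
B 14 2 B 15 4 B 13 0
;
run;

proc glm data=work.counts;
   class group;
   freq n;               /* rows with n missing or less than 1 are not used */
   model score = group;
run;
quit;
```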

ID Statement

ID variable-list;
  • When predicted values are requested as a MODEL statement option, values of the variables given in the ID statement are printed beside each observed, predicted, and residual value for identification.
  • Although there are no restrictions on the number or length of ID variables, GLM may truncate the number of values printed in order to print on one line.
  • If the ID statement is used, it must appear before the first RUN statement or it is ignored.

WEIGHT Statement

WEIGHT variable;
  • When a WEIGHT statement is used, a weighted residual sum of squares is minimized. The observation is used in the analysis only if the value of the WEIGHT statement variable is greater than zero.
  • The WEIGHT statement has no effect on degrees of freedom or number of observations, but is used by the MEANS statement when calculating means and performing multiple range tests.
  • If the WEIGHT statement is used, it must appear before the first RUN statement or it is ignored.

CONTRAST Statement

CONTRAST 'label' effect values... / options;
  • The CONTRAST statement provides a mechanism for obtaining custom hypothesis tests. There is no limit to the number of CONTRAST statements, but they must come after the MODEL statement. In the CONTRAST statement,
label
is twenty characters or less and is used on the printout to identify the contrast.
effect
is the name of an effect that appears in the MODEL statement.
values
are constants that are elements of the L vector associated with the effect.

These options can appear in the CONTRAST statement after a slash (/):

E E= effect ETYPE= n SINGULAR= number
  • The E option of the CONTRAST statement requests that the entire L vector be printed.
  • The E= option of the CONTRAST statement specifies an effect in the model to use as an error term. If none is specified, the error MS is used.
    • If you specify an effect, it is used as the denominator in F tests in univariate analysis.
    • If you specify an effect, and a MANOVA or REPEATED statement is also present, the effect is used as the basis of the E matrix.
  • The ETYPE= option of the CONTRAST statement specifies the type (1, 2, 3, or 4) of the E= effect. If E= is specified and ETYPE= is not, the highest type computed in the analysis is used.
  • The SINGULAR= option of the CONTRAST statement tunes the estimability checking. Default: 1E-4
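A short sketch of CONTRAST statements for a three-level factor. The data set work.drug and its values are made up; the coefficients follow the level order A, B, C, which is what the default ORDER=FORMATTED gives for these values.

```sas
/* Hypothetical data: one factor with levels A, B, C */
data work.drug;
   input trt $ y @@;
   datalines;
A 5.1 A 4.8 A 5.5 B 6.2 B 6.0 B 6.6 C 7.9 C 8.3 C 7.7
;
run;

proc glm data=work.drug;
   class trt;
   model y = trt;
   /* Coefficients line up with the levels A, B, C in that order */
   contrast 'A vs B'       trt 1 -1  0;
   /* The E option prints the entire L vector for this contrast  */
   contrast 'A vs B and C' trt 2 -1 -1 / e;
run;
quit;
```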

ESTIMATE Statement

ESTIMATE 'label' effect values... / options;
  • The ESTIMATE statement can be used to estimate linear functions of the parameters by multiplying the vector L by the parameter estimate vector b resulting in Lb. There is no limit to the number of ESTIMATE statements, but they must come after the MODEL statement. In the ESTIMATE statement,
label
is twenty characters or less and is used on the printout to identify the estimate.
effect
is the name of an effect that appears in the MODEL statement.
values
are constants that are elements of the L vector associated with the preceding effect.

These options can appear in the ESTIMATE statement after a slash (/):

E DIVISOR= number SINGULAR= number
  • The E option of the ESTIMATE statement requests that the entire L vector be printed.
  • The DIVISOR= option of the ESTIMATE statement specifies a value by which to divide all coefficients so that fractional coefficients can be entered as integer numerators.
  • The SINGULAR= option of the ESTIMATE statement tunes the estimability checking. Default: 1E-4
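The DIVISOR= option is handy when the coefficients you want are fractions. In this sketch (hypothetical data set work.drug, invented values) the second ESTIMATE statement asks for A minus the average of B and C, that is, coefficients 1, -1/2, -1/2, entered as integers divided by 2.

```sas
/* Hypothetical data: one factor with levels A, B, C */
data work.drug;
   input trt $ y @@;
   datalines;
A 5.1 A 4.8 A 5.5 B 6.2 B 6.0 B 6.6 C 7.9 C 8.3 C 7.7
;
run;

proc glm data=work.drug;
   class trt;
   model y = trt;
   estimate 'A minus B'         trt 1 -1  0;
   /* 2 -1 -1 divided by 2 gives coefficients 1, -0.5, -0.5 */
   estimate 'A minus mean(B,C)' trt 2 -1 -1 / divisor=2;
run;
quit;
```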

LSMEANS Statement

LSMEANS effects / options;
  • Least-squares means are computed for each effect listed in the LSMEANS statement. Any number of LSMEANS statements can be used. They must be given after the MODEL statement.

These options can appear in the LSMEANS statement after a slash (/):

ADJUST= ALPHA=
AT BYLEVEL
CL COV
E E= effect
ETYPE= NOPRINT
OBSMARGINS OUT=
PDIFF SINGULAR=
SLICE= STDERR
TDIFF
  • The ADJUST= option requests a multiple comparison adjustment for the p-values and confidence limits for the differences of least squares means. The available keywords are BON, DUNNETT, SCHEFFE, SIDAK, SIMULATE, SMM | GT2, TUKEY, and T. The default is ADJUST=T, which specifies no adjustment for multiple comparisons, but note that the default value may change if you specify the PDIFF option.
  • ALPHA= value
    • The ALPHA= option specifies the level of significance for confidence intervals. It is applicable only if you specify the CL option, and, optionally, the PDIFF option. The specified value must be between 0 and 1, and the default is 0.05, corresponding to 95% confidence intervals.
  • AT variable=value AT (variable-list)=(value-list) AT MEANS
    • The AT option allows you to modify the values of the covariates used in computing least squares means. By default, all covariate effects are set equal to their mean values for computation of standard least squares means. For effects containing two or more covariates, the AT option sets the effect equal to the product of the individual means rather than the mean of the product.
    • The AT MEANS option sets covariates equal to their mean values and incorporates this adjustment to cross products of covariates.
    • The AT option is disabled if you specify the BYLEVEL option.
  • The BYLEVEL option requests that the OM data set be processed by each level of the least squares mean effect in question.
  • The CL option requests that confidence limits be constructed for each of the least-squares means. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.
  • LSMEANS effects / COV ;
    • The COV option of the LSMEANS statement requests that covariances be included in the output data set specified in the OUT= option of the LSMEANS statement.
    • If no OUT= option is specified in the LSMEANS statement, the COV option has no effect.
    • When you specify the COV option, you can specify only one effect in the LSMEANS statement.
  • LSMEANS effects / E ;
    • The E option of the LSMEANS statement prints the estimable functions used to compute the least-squares means.
  • LSMEANS effects / E= effect ;
    • The E= option of the LSMEANS statement specifies an effect in the model to use as an error term.
    • If neither STDERR nor PDIFF nor TDIFF is specified, the E= option is ignored. If STDERR, PDIFF, or TDIFF is specified and E= is not, the error MS is used for calculating standard errors and probabilities.
  • LSMEANS effects / ETYPE= n ;
    • The ETYPE= option of the LSMEANS statement specifies the type (1, 2, 3, or 4) of the E= effect. If E= is specified and ETYPE= is not, the highest type computed in the analysis is used.
  • LSMEANS effects / NOPRINT ;
    • The NOPRINT option of the LSMEANS statement requests that the normal printed output from the LSMEANS statement be suppressed. This option is useful when an output data set is requested using the OUT= option of the LSMEANS statement.
  • OBSMARGINS OM
    • The OM option specifies a potentially different weighting scheme for the computation of least squares means coefficients. The standard least squares means have equal coefficients across classification effects, but the OM option changes these coefficients to be proportional to those found in the input data set. This adjustment is reasonable when you want your inferences to apply to a population that is not necessarily balanced but has the margins observed in the original data set.
  • LSMEANS effects / OUT= SAS-data-set ;
    • The OUT= option of the LSMEANS statement specifies the name of an output data set to contain the values, standard errors, and, optionally, the covariances of the least-squares means.
  • LSMEANS effects / PDIFF < =difftype > ;
    • The PDIFF option requests that p-values for differences of the least squares means be printed. The optional difftype specifies which differences to print, with possible values of ALL, CONTROL, CONTROLL, and CONTROLU.
    • ALL requests all pairwise differences, and it is the default. CONTROL requests the differences with a control, which by default is the first level of each of the specified LSMEAN effects. To specify which levels of the effects are the controls, enclose the formatted values in quotes and in parentheses after the CONTROL keyword. Two-sided tests are associated with the CONTROL difftype. CONTROLL and CONTROLU test whether the noncontrol levels are significantly smaller than or larger than the control, respectively. Note that the default value of the ADJUST= option depends upon the specified difftype. Refer to the documentation for details.
  • LSMEANS effects / SINGULAR= number ;
    • The SINGULAR= option of the LSMEANS statement tunes the estimability checking. Default: 1E-4
  • SLICE= fixed-effect SLICE= (fixed-effects)
    • The SLICE= option specifies the effects by which to divide interaction LSMEAN effects. This can produce what are known as tests of simple effects.
  • LSMEANS effects / STDERR ;
    • The STDERR option of the LSMEANS statement prints the standard error of the LSM and the probability level for the hypothesis which tests LSM= 0.
  • LSMEANS effects / TDIFF ;
    • The TDIFF option of the LSMEANS statement requests that the t values for the hypotheses which test LSM(i)= LSM(j) be printed along with the corresponding probabilities.
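To tie a few of the LSMEANS options together, here is a sketch for an analysis of covariance. The data set work.ancova, the covariate x, and all values are invented for the example.

```sas
/* Hypothetical data: treatment factor plus one covariate */
data work.ancova;
   input trt $ x y @@;
   datalines;
A 10 21.5 A 12 23.0 A 15 26.1 A 11 22.4
B  9 24.8 B 13 28.2 B 14 29.5 B 16 31.0
C 11 19.0 C 12 20.2 C 15 22.7 C 17 24.9
;
run;

proc glm data=work.ancova;
   class trt;
   model y = trt x;
   /* LS-means at the mean of x, with standard errors, confidence */
   /* limits, and Tukey-adjusted p-values for all pairwise        */
   /* differences                                                 */
   lsmeans trt / stderr pdiff cl adjust=tukey;
   /* The same LS-means evaluated at x = 12 instead of the mean   */
   lsmeans trt / at x=12;
run;
quit;
```

For a model with an interaction such as A*B, an LSMEANS statement of the form `lsmeans A*B / slice=A;` is the usual way to ask for the simple-effect tests described under SLICE=.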

MANOVA Statement

MANOVA H= effects E= effect M= equations... MNAMES= names PREFIX= name / options ;
  • If the MODEL statement includes more than one dependent variable, additional multivariate statistics can be requested with the MANOVA statement.
  • When a MANOVA statement appears before the first RUN statement, GLM enters a multivariate mode with respect to the handling of missing values; observations with missing independent or dependent variables are excluded from the analysis.
  • If you use a CONTRAST statement with a MANOVA statement, the CONTRAST statement must appear before the MANOVA statement.
  • The H= option of the MANOVA statement specifies effects in the preceding model to use as hypothesis matrices.
  • The E= option of the MANOVA statement specifies the error effect. If you omit the E= specification, the error SSCP (residual) matrix from the analysis is used.
  • The M= option of the MANOVA statement specifies a transformation matrix and can be written in either of these two forms:
M= equation1,equation2,...
M= (listofnumbers,...)
    • When you include an M= specification, the analysis requested in the MANOVA statement is carried out for the variables defined by the equations in the specification, not the original dependent variables.
    • If M= is omitted, the analysis is performed for the original dependent variables in the MODEL statement.
  • The MNAMES= option of the MANOVA statement provides names for the variables defined by the equations in the M= specification. Names in the list correspond to the M= equations or the rows of the M matrix (as it is entered).
  • The PREFIX= option of the MANOVA statement is an alternative means of identifying the transformed variables defined by the M= option.
    • For example, if you specify PREFIX=DIFF, the transformed variables are labeled DIFF1, DIFF2, and so forth.

The following options can appear in the MANOVA statement after a slash (/):

CANONICAL
ETYPE= n
HTYPE= n
ORTH
PRINTE
PRINTH
SUMMARY
  • The CANONICAL option of the MANOVA statement requests that a canonical analysis of the H and E matrices (transformed by the M matrix, if specified) be printed instead of the default printout of characteristic roots and vectors.
  • The ETYPE= option of the MANOVA statement specifies the type (1,2,3, or 4) of the E matrix.
    • If no ETYPE= option appears in the MANOVA statement, the ETYPE= value defaults to the highest type (largest n) used in the analysis.
  • The HTYPE= option of the MANOVA statement specifies the type (1,2,3, or 4) of the H matrix.
  • The ORTH option of the MANOVA statement requests that the transformation matrix in the M= specification of the MANOVA statement be orthonormalized by rows before the analysis.
  • The PRINTE option of the MANOVA statement requests printing of the E matrix.
    • If the E matrix is the error SSCP (residual) matrix from the analysis, the partial correlations of the dependent variables given the independent variables are also printed.
  • The PRINTH option of the MANOVA statement requests that the H matrix (the SSCP matrix) associated with each effect specified by the H= specification be printed.
  • The SUMMARY option of the MANOVA statement produces analysis-of-variance tables for each dependent variable.
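A minimal MANOVA sketch with two dependent variables. The data set work.scores and its values are hypothetical. The second MANOVA statement uses M= to analyze the difference y1 - y2 rather than the original variables, with PREFIX= supplying the label DIFF1.

```sas
/* Hypothetical data: two responses measured under three treatments */
data work.scores;
   input trt $ y1 y2 @@;
   datalines;
A 5.2 7.1 A 4.9 6.8 A 5.6 7.4 A 5.0 7.0
B 6.1 8.3 B 6.4 8.0 B 5.9 8.6 B 6.3 8.1
C 7.2 8.9 C 6.8 9.4 C 7.5 9.1 C 7.0 9.3
;
run;

proc glm data=work.scores;
   class trt;
   model y1 y2 = trt / nouni;          /* skip the univariate output */
   /* Multivariate test of trt, printing the E and H matrices        */
   manova h=trt / printe printh;
   /* Same hypothesis, applied to the transformed variable y1 - y2   */
   manova h=trt m=y1-y2 prefix=diff / summary;
run;
quit;
```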

MEANS Statement

MEANS effects / options;
  • For any effect that appears on the right-hand side of the model and that does not contain any continuous variables, GLM can compute means of all continuous variables in the model.
  • You can use any number of MEANS statements, provided they appear after the MODEL statement.

These options can appear in the MEANS statement after a slash (/):

ALPHA=
BON
CLDIFF
CLM
DEPONLY
DUNCAN
DUNNETT
DUNNETTL
DUNNETTU
E=
ETYPE=
GABRIEL
HOVTEST=
HTYPE=
KRATIO=
LINES
NOSORT
REGWF
REGWQ
SCHEFFE
SIDAK
SMM | GT2
SNK
T | LSD
TUKEY
WALLER
WELCH
  • MEANS effects / ALPHA= p ;
    • The ALPHA= option of the MEANS statement gives the level of significance for comparisons among the means. The default ALPHA= value is 0.05.
    • With the DUNCAN option, you may only specify values of 0.01, 0.05, or 0.1. For other options, you may use values between 0.0001 and 0.9999.
  • MEANS effects / BON ;
    • The BON option of the MEANS statement performs Bonferroni t tests of differences between means for all main effect means in the MEANS statement.
  • MEANS effects / CLDIFF ;
    • The CLDIFF option of the MEANS statement requests that the results of the following options be presented as confidence intervals for all pairwise differences between means: BON GABRIEL SCHEFFE SIDAK SMM GT2 T LSD TUKEY.
    • CLDIFF is the default for unequal cell sizes unless any of these options are specified: DUNCAN REGWF REGWQ SNK WALLER.
  • MEANS effects / CLM ;
    • The CLM option of the MEANS statement requests that the results of the following options be presented as confidence intervals for the mean of each level of the variables specified in the MEANS statement: BON GABRIEL SCHEFFE SIDAK SMM T LSD.
  • MEANS effects / DEPONLY ;
    • The DEPONLY option of the MEANS statement indicates that only the dependent variable means are to be printed. By default, GLM prints means for all continuous variables, including independent variables.
  • MEANS effects / DUNCAN ;
    • The DUNCAN option of the MEANS statement performs Duncan’s multiple-range test on all main effect means given in the MEANS statement.
  • MEANS effects / DUNNETT <(formattedcontrolvalues)> ;
    • The DUNNETT option of the MEANS statement performs Dunnett’s two-tailed t test, testing if any treatments are significantly different from a single control for all main effects means in the MEANS statement.
    • To specify which level of the effect is the control, enclose its quoted formatted value in parentheses after the keyword.
    • If more than one effect is specified in the MEANS statement, you can use a list of control values within the parentheses.
    • By default, the first level of the effect is used as the control.
  • MEANS effects / DUNNETTL <(formattedcontrolvalue)> ;
    • The DUNNETTL option of the MEANS statement performs Dunnett’s one-tailed t test, testing if any treatment is significantly smaller than the control.
  • MEANS effects / DUNNETTU <(formattedcontrolvalue)> ;
    • The DUNNETTU option of the MEANS statement performs Dunnett’s one-tailed t test, testing if any treatment is significantly larger than the control.
  • MEANS effects / E= effect ;
    • The E= option of the MEANS statement specifies the error mean square to use in the multiple comparisons. If the E= option is omitted, GLM uses the residual Mean Square (MS).
    • The effect specified with the E= option must be a term in the model; otherwise, the procedure uses the residual MS.
  • MEANS effects / ETYPE= n ;
    • The ETYPE= option of the MEANS statement specifies the type of mean square for the error effect. When E= effect is specified, you may need to indicate which type (1,2,3, or 4) of MS is to be used.
    • The n value must be one of the types specified or implied by the MODEL statement. The default MS type is the highest type used in the analysis.
  • MEANS effects / GABRIEL ;
    • The GABRIEL option of the MEANS statement performs Gabriel’s multiple-comparison procedure on all main effect means in the MEANS statement.
  • MEANS effects / HOVTEST <=keyword>;
    • The HOVTEST= option requests a homogeneity of variance test for the groups defined by the MEANS effect.
      • HOVTEST=BARTLETT requests Bartlett’s test. This test is not very accurate when the distribution is even slightly nonnormal, and it is not recommended for routine use.
      • HOVTEST=BF requests the Brown-Forsythe test. Simulation results indicate this test is best at providing power to detect variance differences while protecting the Type I error probability, but it can be resource-intensive.
      • HOVTEST=LEVENE <(TYPE=ABS|SQUARE)> requests Levene’s test with either absolute or squared differences. Squared differences are the default.
      • HOVTEST=OBRIEN <(W=number)> specifies O’Brien’s test, which is a modification of Levene’s test.
    • If you specify the HOVTEST option without specifying a test, Levene’s test is computed with TYPE=SQUARE. The HOVTEST= option is ignored unless the MODEL statement specifies a simple one-way model.
  • MEANS effects / HTYPE= n ;
    • The HTYPE= option of the MEANS statement gives the MS type for the hypothesis MS. The HTYPE= option is needed only when the WALLER option is specified.
    • Default: the highest type used in the model
  • MEANS effects / KRATIO= value ;
    • The KRATIO= option of the MEANS statement gives the Type 1/Type 2 error seriousness ratio for the Waller-Duncan test.
    • Reasonable values for KRATIO are 50, 100, 500, which roughly correspond for the two-level case to ALPHA levels of 0.1, 0.05, and 0.01.
    • If the KRATIO= option is omitted, the procedure uses the default value of 100.
  • MEANS effects / LINES ;
    • The LINES option of the MEANS statement requests that the results of the BON, DUNCAN, GABRIEL, REGWF, REGWQ, SCHEFFE, SIDAK, GT2, T, LSD, TUKEY, and WALLER options be presented by listing the means in descending order and indicating nonsignificant subsets by line segments beside the corresponding means.
  • MEANS effects / NOSORT ;
    • The NOSORT option of the MEANS statement prevents the means from being sorted into descending order when CLDIFF or CLM is specified.
  • MEANS effects / REGWF ;
    • The REGWF option of the MEANS statement performs the Ryan-Einot-Gabriel-Welsch multiple F test on all main effect means in the MEANS statement.
  • MEANS effects / REGWQ ;
    • The REGWQ option of the MEANS statement performs the Ryan-Einot-Gabriel-Welsch multiple-range test on all main effect means in the MEANS statement.
  • MEANS effects / SCHEFFE ;
    • The SCHEFFE option of the MEANS statement performs Scheffe’s multiple-comparison procedure on all main effect means in the MEANS statement.
  • MEANS effects / SIDAK ;
    • The SIDAK option of the MEANS statement performs pairwise t tests on differences between means with levels adjusted according to Sidak’s inequality for all main effect means in the MEANS statement.
  • MEANS effects / SMM ; MEANS effects / GT2 ;
    • The SMM or GT2 option of the MEANS statement performs pairwise comparisons based on the studentized maximum modulus and Sidak’s uncorrelated-t inequality, yielding Hochberg’s GT2 method when sample sizes are unequal, for all main effect means in the MEANS statement.
  • MEANS effects / SNK ;
    • The SNK option of the MEANS statement performs the Student-Newman-Keuls multiple range test on all main effect means in the MEANS statement.
  • MEANS effects / T ; MEANS effects / LSD ;
    • The T or LSD option of the MEANS statement performs pairwise t tests, equivalent to Fisher’s least-significant-difference test in the case of equal cell sizes, for all main effect means in the MEANS statement.
  • MEANS effects / TUKEY ;
    • The TUKEY option of the MEANS statement performs Tukey’s studentized range test (HSD) on all main effect means in the MEANS statement.
  • MEANS effects / WALLER ;
    • The WALLER option of the MEANS statement requests that the Waller-Duncan k-ratio t test be performed on all main effect means in the MEANS statement.
  • MEANS effects / WELCH;
    • The WELCH option of the MEANS statement requests Welch’s variance-weighted one-way ANOVA. If you also specify the HOVTEST= option and the test rejects the homogeneity-of-variance assumption, use Welch’s ANOVA rather than the usual ANOVA to test for differences between group means. This guidance holds only for the robust homogeneity tests, that is, not for HOVTEST=BARTLETT. Even then, homogeneity-of-variance tests do not always have enough power to detect when Welch’s ANOVA is appropriate.
    • This option is ignored unless the MODEL statement specifies a simple one-way model.
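
As a minimal sketch of how several of the MEANS options above might be combined, the following one-way analysis requests Tukey comparisons and Welch’s ANOVA. The data set name (trial), the variable names (treatment, yield), and all data values are made up for illustration; they do not come from the original article.

```sas
/* Hypothetical, unbalanced one-way layout: yield under three treatments */
data trial;
   input treatment $ yield @@;
   datalines;
A 12.1 A 13.4 A 11.8 A 12.9
B 15.2 B 14.7 B 16.1
C 11.0 C 10.4 C 12.2 C 10.9 C 11.5
;
run;

proc glm data=trial;
   class treatment;
   model yield = treatment;   /* simple one-way model, as WELCH requires */
   /* Tukey's studentized range test, reported as confidence limits
      for the pairwise differences between treatment means */
   means treatment / tukey cldiff;
   /* Levene's homogeneity-of-variance test plus Welch's ANOVA */
   means treatment / hovtest=levene welch;
run;
quit;
```

The two MEANS statements are kept separate only for readability; the options could also be listed on a single statement.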

OUTPUT Statement

OUTPUT OUT= SAS-data-set keywords= names... ;
  • The OUTPUT statement creates a new SAS data set. All the variables in the original data set are included in the new data set, along with variables named in the OUTPUT statement.
  • The OUT= option gives the name of the new data set. If the OUT= option is omitted, the new data set is named using the DATAn convention.

The following keywords specify values to be calculated and output to the new data set:

PREDICTED|P= RESIDUAL|R= L95M= U95M=
L95= U95= STDP= STDR=
STDI= STUDENT= COOKD= H=
PRESS= RSTUDENT= DFFITS= COVRATIO=
  • The PREDICTED= option of the OUTPUT statement calculates predicted values.
  • The RESIDUAL= option of the OUTPUT statement calculates residuals, defined as ACTUAL minus PREDICTED.
  • The L95M= option of the OUTPUT statement calculates the lower bound of a 95% confidence interval for the expected value (mean) of the dependent variable.
  • The U95M= option of the OUTPUT statement calculates the upper bound of a 95% confidence interval for the expected value (mean) of the dependent variable.
  • The L95= option of the OUTPUT statement calculates the lower bound of a 95% confidence interval for an individual prediction.
  • The U95= option of the OUTPUT statement calculates the upper bound of a 95% confidence interval for an individual prediction.
  • The STDP= option of the OUTPUT statement calculates the standard error of the mean predicted value.
  • The STDR= option of the OUTPUT statement calculates the standard error of the residual.
  • The STDI= option of the OUTPUT statement calculates the standard error of the individual predicted value.
  • The STUDENT= option of the OUTPUT statement calculates the studentized residuals, that is, each residual divided by its standard error.
  • The COOKD= option of the OUTPUT statement calculates Cook’s D influence statistic.
  • The H= option of the OUTPUT statement calculates the leverage.
  • The PRESS= option of the OUTPUT statement calculates the residual for the ith observation that results from dropping the ith observation from the parameter estimates. This equals the ordinary residual divided by (1 - h), where h is the leverage.
  • The RSTUDENT= option of the OUTPUT statement calculates a studentized residual with the current observation deleted.
  • The DFFITS= option of the OUTPUT statement calculates the standard influence of the observation on its predicted value.
  • The COVRATIO= option of the OUTPUT statement calculates the standard influence of the observation on the covariance of the parameter estimates (betas).
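
The keywords above can be sketched in use as follows. This is only an illustration: the data set names (dosefit, diag), the variables (dose, response), the output variable names on the right of each keyword, and the data values are all assumptions rather than part of the original article.

```sas
/* Hypothetical regression data: response measured at several doses */
data dosefit;
   input dose response @@;
   datalines;
1 2.3  2 3.1  3 4.8  4 5.2  5 6.9
6 7.4  7 9.1  8 9.6  9 11.8 10 12.1
;
run;

proc glm data=dosefit;
   model response = dose;
   /* Each keyword= names the new variable that holds that statistic */
   output out=diag
          predicted=yhat   residual=resid
          l95m=lo_mean     u95m=hi_mean
          l95=lo_indiv     u95=hi_indiv
          student=stud_res rstudent=rstud_res
          h=leverage       cookd=cooks_d
          press=press_res;
run;
quit;

/* The new data set holds the original variables plus the requested ones */
proc print data=diag;
run;
```

Only the statistics named in the OUTPUT statement are written, so the list can be trimmed to whatever diagnostics are actually needed.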

RANDOM Statement

RANDOM effects / options;
  • The RANDOM statement specifies which effects in the model are random.
  • You can use as many RANDOM statements as you want, provided that they appear after the MODEL statement. If you use a CONTRAST statement with a RANDOM statement, you must enter the CONTRAST statement before the RANDOM statement.
  • The list of effects in the RANDOM statement should contain one or more of the pure classification effects specified in the MODEL statement.
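  • The Q and TEST options can appear in the RANDOM statement after a slash (/). Q displays the quadratic forms in the fixed effects that appear in the expected mean squares. TEST performs hypothesis tests for each effect in the model, using error terms determined by the expected mean squares.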
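
A minimal sketch of a RANDOM statement, assuming a two-factor layout in which operators are treated as a random sample. The data set (machines), the variables (machine, operator, strength), and the values are hypothetical.

```sas
/* Hypothetical data: 2 machines x 3 operators, 2 replicates per cell */
data machines;
   input machine $ operator $ strength @@;
   datalines;
M1 O1 51.2  M1 O1 50.8  M1 O2 53.1  M1 O2 52.4  M1 O3 49.9  M1 O3 50.3
M2 O1 54.0  M2 O1 53.6  M2 O2 55.2  M2 O2 56.1  M2 O3 52.7  M2 O3 51.9
;
run;

proc glm data=machines;
   class machine operator;
   model strength = machine operator machine*operator;
   /* Declare operator and its interaction with machine as random;
      TEST requests F tests built from the expected mean squares */
   random operator machine*operator / test;
run;
quit;
```

The RANDOM statement appears after the MODEL statement, as required above.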

REPEATED Statement

REPEATED factorname levels (levelvalues) transformation<,...> / options ;
  • The REPEATED statement provides multivariate and univariate tests as well as hypothesis tests for a variety of single-degree-of-freedom contrasts. If you use a CONTRAST statement with a REPEATED statement, you must enter the CONTRAST statement before the REPEATED statement.
  • The factorname term names a factor to be associated with the dependent variables. The levels term gives the number of levels associated with the factor being defined. The (levelvalues) term gives values that correspond to levels of a repeated-measures factor.
  • Several different transformation keywords can be used to define single-degree-of-freedom contrasts for the factors specified. If no keyword is specified, REPEATED uses the CONTRAST transformation.
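
As an illustrative sketch of the REPEATED statement, the example below treats three measurements per subject as levels of a repeated factor. The data set (scores), the variables (grp, t1 to t3), the factor name (time), the level values (0 4 8), and the data are all assumptions made for this example.

```sas
/* Hypothetical data: one row per subject, three repeated measurements */
data scores;
   input grp $ t1 t2 t3;
   datalines;
ctl 10 12 13
ctl 11 12 15
ctl  9 11 12
ctl 12 13 13
trt 10 14 18
trt 11 15 19
trt  9 13 16
trt 12 15 20
;
run;

proc glm data=scores;
   class grp;
   /* One dependent variable per measurement occasion;
      NOUNI suppresses the separate univariate analyses */
   model t1 t2 t3 = grp / nouni;
   /* factorname = time, levels = 3, levelvalues = (0 4 8);
      POLYNOMIAL replaces the default CONTRAST transformation,
      and SUMMARY prints an ANOVA table for each contrast */
   repeated time 3 (0 4 8) polynomial / summary;
run;
quit;
```

The level values matter for POLYNOMIAL because they set the spacing used to build the orthogonal polynomial contrasts.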

TEST Statement

TEST H= effects E= effect / options ;
  • Although an F value is computed for all SS in the analysis using the residual MS as an error term, you may request additional F tests using other effects as error terms by using a TEST statement.
  • You may use as many TEST statements as you want, provided they appear after the MODEL statement.
  • The H=effects term specifies which effects in the preceding model are to be used as hypothesis (numerator) effects. The E=effect term specifies one, and only one, effect to use as the error (denominator) term. The E= specification is required.
  • By default, the SS type for all hypothesis SS and error SS is the highest type computed in the model. If the hypothesis type or error type is to be another type that was computed in the model, you should specify one or both of these options after a slash (/):
ETYPE= n HTYPE= n
  • The ETYPE= option of the TEST statement specifies the type of SS to use for the error term. The type must be a type computed in the model (n = 1, 2, 3, or 4).
  • The HTYPE= option of the TEST statement specifies the type of SS to use for the hypothesis. The type must be a type computed in the model (n = 1, 2, 3, or 4).
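
A minimal sketch of the TEST statement, assuming a split-plot style layout in which the whole-plot factor should be tested against the block-by-variety interaction rather than the residual. The data set (splitplot), the variables (block, variety, fert, yield), and the values are hypothetical.

```sas
/* Hypothetical data: 3 blocks x 2 varieties x 2 fertilizer levels */
data splitplot;
   input block variety $ fert $ yield @@;
   datalines;
1 V1 F1 30.1  1 V1 F2 33.4  1 V2 F1 28.7  1 V2 F2 31.0
2 V1 F1 31.5  2 V1 F2 34.8  2 V2 F1 27.9  2 V2 F2 30.2
3 V1 F1 29.6  3 V1 F2 32.9  3 V2 F1 28.1  3 V2 F2 31.6
;
run;

proc glm data=splitplot;
   class block variety fert;
   model yield = block variety block*variety fert variety*fert;
   /* Test variety (numerator) against block*variety (denominator),
      using Type III SS for both hypothesis and error terms */
   test h=variety e=block*variety / htype=3 etype=3;
run;
quit;
```

Type III SS is among the types PROC GLM computes by default, so HTYPE=3 and ETYPE=3 are valid here without extra MODEL options.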