Can a Random Effect Be a Continuous Variable

Concept: Random Effects Models - Continuous Data

Concept Description

Last Updated: 2009-08-25

Introduction

This discussion will focus on methods for the analysis of continuous, normally distributed data. Continuous outcome measures used by health services researchers could include measures of health-care costs (particularly if a logarithmic transformation is applied to the data), indices of continuity of care, or measures of severity of illness.

A. Longitudinal Designs

Longitudinal data arise when repeated measurements are obtained for an individual (or unit of analysis) on one or more outcome variables at successive points in time. The analyst is interested in describing the trend over time (i.e., is it linear or curvilinear; is it increasing or decreasing), as well as whether there are significant differences in the trend across groups of subjects defined by such characteristics as income quintile, sex, region of residence, or severity of illness. One advantage of using a longitudinal design is that it is possible to separate age and cohort effects. For example, health care use may increase over time because people consume more health care resources as they get older. However, there may be differences in the rates of increase in use for individuals from different birth cohorts.

B. Clustered Designs

Examples from Education:

Level 1 corresponds to individual-level data. Levels 2 and above identify the clusters within which individuals are nested.

Level #: Unit of Analysis (Possible model covariates)

Level 1: Student (Sex, Parental Marital Status, Parental Education Attainment, or Number of Siblings)
Level 2: Teacher or Classroom (Classroom Size, Sex-Composition of Classroom, Teacher's Level of Education or Teacher's Years of Experience)
Level 3: School (Sex-Composition of School, Type - Public vs. Private)
Level 4: Division (Income Level of Division, Rural/Urban Status)
Dependent Variable: Standardized test score

C. Why Use Random Effects Models

Within-individual or within-cluster component: an individual's change over time or cluster-specific response is described by a regression model with a population-level intercept and slope.
Between-individual or between-cluster component: variation in individual or cluster-intercepts and slopes is captured.

D. Advantages of Random Effects Models for Longitudinal Data Analysis

There are a number of techniques for analyzing longitudinal data, including univariate and multivariate analysis of variance (ANOVA) and generalized linear models with generalized estimating equations (i.e., GEE models).

Random effects models offer several advantages:

Subjects are not assumed to be measured on the same number of time points, and the time points do not need to be equally spaced;
Analyses can be conducted for subjects who may miss one or more of the measurement occasions, or who may be lost to follow-up at some point during the study.

Both random effects and GEE models allow the analyst to model the correlation structure of the data. Thus, the analyst does not need to assume that measurements taken at successive points in time are equally correlated, which is the correlation structure that underlies the ANOVA model. The analyst also does not need to assume measurements taken at successive points in time have an unstructured pattern of correlations, which is the structure that underlies the multivariate analysis of variance model. The former pattern is generally too restrictive, while the latter is too generic. With both random effects and GEE models, the analyst can fit a specific correlation structure to the data, such as an autoregressive structure, which assumes a decreasing correlation between successive measurements over time. This can result in a more efficient analysis, with improved power to detect significant changes over time.

With administrative health data, varying numbers of measurement occasions and missing observations are typically not of great concern. Very few individuals are lost to follow-up in population-based studies. Loss to follow-up will occur when individuals leave the province, or when they die. Moreover, time points of measurement will typically be equally spaced because time is often defined in terms of fiscal or calendar years, months or weeks. However, analyses of administrative data frequently include both time-varying and time-invariant covariates.

E. Statistical Model for Longitudinal Data

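The simplest regression model for longitudinal data is one in which measurements are obtained for a single dependent variable at successive time points. Let Y_it represent the measurement for the i-th individual at the t-th point in time. The model is

$$Y_{it} = \beta_0 + \beta_1 t + \varepsilon_{it}$$

where β₀ is the intercept, β₁ is the slope, and ε_it is the error term. In this model, β₀ and β₁ are fixed effects.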

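The simplest random effects model is one where the intercept is allowed to vary across individuals:

$$Y_{it} = \beta_0 + \beta_1 t + \upsilon_{0i} + \varepsilon_{it}$$

$$\varepsilon_{it} \sim N(0, \sigma^2)$$

$$\upsilon_{0i} \sim N(0, \sigma^2_{\upsilon 0})$$

where υ_0i is the random intercept for the i-th individual. The slope β₁ remains a fixed effect.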
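One consequence of the random intercept model is worth writing out. The short derivation below is a sketch in assumed notation: υ_0i denotes the random intercept with variance σ²_υ0, ε_it the residual with variance σ², and the two are taken to be independent of each other and across occasions.

```latex
\begin{aligned}
\operatorname{Var}(Y_{it}) &= \sigma^2_{\upsilon 0} + \sigma^2 \\
\operatorname{Cov}(Y_{it}, Y_{is}) &= \sigma^2_{\upsilon 0} \qquad (t \neq s) \\
\operatorname{Corr}(Y_{it}, Y_{is}) &= \frac{\sigma^2_{\upsilon 0}}{\sigma^2_{\upsilon 0} + \sigma^2}
\end{aligned}
```

Under these assumptions the correlation is the same for every pair of occasions, so a random intercept on its own implies an exchangeable (compound symmetric) pattern among an individual's measurements.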

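When both the slope and the intercept are allowed to vary across individuals, the model is:

$$Y_{it} = \beta_0 + \beta_1 t + \upsilon_{0i} + \upsilon_{1i} t + \varepsilon_{it}$$

$$\varepsilon_{it} \sim N(0, \sigma^2)$$

$$\begin{pmatrix} \upsilon_{0i} \\ \upsilon_{1i} \end{pmatrix} \sim N(\mathbf{0}, \Sigma_\upsilon), \qquad \Sigma_\upsilon = \begin{pmatrix} \sigma^2_{\upsilon 0} & \sigma_{\upsilon 01} \\ \sigma_{\upsilon 01} & \sigma^2_{\upsilon 1} \end{pmatrix}$$

where υ_0i and υ_1i are the random intercept and random slope for the i-th individual, and Σ_υ is the variance-covariance matrix of the random effects. The random slope and the random intercept are allowed to be correlated, so that individuals who have higher or lower values for the intercept (i.e., higher or lower values on the dependent variable at the baseline time point) may also have higher or lower values for the slope.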

F. Steps in Conducting a Random Effects Analysis

Step 1: Exploratory Data Analysis

Before deciding whether a random effects model is an appropriate choice for the data, the analyst should begin by conducting a thorough exploratory analysis of the data. Exploratory data analysis (EDA) techniques are used to examine:

Correlations among measurements - this is useful for selecting a covariance structure for the data. The analyst might ask the following questions: Is there equal correlation between successive measurements? Does the correlation appear to decrease over time?
Nature of trend over time - is it linear or non-linear (i.e., curvilinear) in form? If the latter, the analyst may need to include a higher-order time effect in the model, such as time².
Heterogeneity - is variability in the measurements increasing or decreasing over time? Increasing variability suggests that the analyst will need to consider including a random slope in the model.
Presence of outliers - are extreme observations or influential observations present on either a cross-sectional or longitudinal basis? If the data are non-normal, then the analyst may want to consider adopting a non-linear random effects model. For example, for non-normal data, the analyst might need to consider a binomial, negative binomial, Poisson, or gamma distribution to fit the data.

The following SAS procedures are useful for exploratory data analysis:

PROC GPLOT - to produce plots of the trends over time for individual subjects, or for groups of subjects defined by time-invariant covariates such as gender.
PROC CORR - to characterize the correlation between measurements.
PROC UNIVARIATE - to examine means, variances, skewness, kurtosis, and to check for extreme values at each time point.
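A minimal sketch of these exploratory checks follows. It assumes a long-format data set with the hypothetical variable names id, time and y. The data are simulated purely so that the code runs on its own, and PROC SGPLOT is used here in place of PROC GPLOT for the trajectory plot.

```sas
/* Simulated long-format data (hypothetical): 50 subjects, 5 occasions */
data long;
  call streaminit(1058);
  do id = 1 to 50;
    u0 = rand('normal', 0, 2);              /* subject-specific intercept shift */
    do time = 0 to 4;
      y = 10 + 1.5*time + u0 + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u0;
run;

/* Trends over time: one line per subject */
proc sgplot data=long noautolegend;
  series x=time y=y / group=id;
run;

/* Moments and extreme values of the outcome at each time point */
proc sort data=long;
  by time id;
run;

proc univariate data=long;
  by time;
  var y;
run;

/* PROC CORR needs one column per time point, so reshape to wide first */
proc sort data=long;
  by id time;
run;

proc transpose data=long out=wide(drop=_name_) prefix=y_t;
  by id;
  id time;
  var y;
run;

/* Correlations (and covariances) among y_t0, ..., y_t4 */
proc corr data=wide cov;
  var y_t0-y_t4;
run;
```

Correlations that stay roughly constant across lags point toward a compound symmetric structure, while correlations that fall as the lag grows point toward an autoregressive one.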

Step 2: Fitting the Model

For continuous, normal data, SAS PROC MIXED can be used to do one or more of the following:

Fit the fixed effects
Select a correlation structure for the measurements
Fit the random effects
Select a correlation structure for the random effects

Step 3: Checking the Fit of the Model

The analyst will wish to determine whether the initial model that is fit to the data is an appropriate choice. Often the analyst will cycle between steps 2 and 3 to select the best model given the characteristics of the data, and the research questions of interest. Model goodness of fit statistics can be used to compare models and determine:

which correlation structure should be fit to the data;
whether random intercepts and/or random slopes are necessary in the model;
whether all of the predictor variables and one or more interaction terms should be included in the model.
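One way to compare candidate correlation structures is to capture the fit statistics that PROC MIXED prints for each model. The sketch below does this for two structures. The data set and variable names (long, id, time, y) are hypothetical and the data are simulated only to make the example self-contained.

```sas
/* Simulated long-format data (hypothetical): 50 subjects, 5 occasions */
data long;
  call streaminit(1058);
  do id = 1 to 50;
    u0 = rand('normal', 0, 2);
    do time = 0 to 4;
      y = 10 + 1.5*time + u0 + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u0;
run;

/* Candidate 1: compound symmetric correlation structure */
ods output FitStatistics=fit_cs;
proc mixed data=long method=reml;
  class id;
  model y = time / solution;
  repeated / type=cs subject=id;
run;

/* Candidate 2: first-order autoregressive correlation structure */
ods output FitStatistics=fit_ar1;
proc mixed data=long method=reml;
  class id;
  model y = time / solution;
  repeated / type=ar(1) subject=id;
run;

/* Place the two sets of fit statistics side by side */
data fit_compare;
  merge fit_cs (rename=(Value=cs))
        fit_ar1(rename=(Value=ar1));
run;

proc print data=fit_compare noobs;
run;
```

Smaller values of AIC and BIC indicate a better-fitting model. REML-based fit statistics are comparable only between models with the same fixed effects, so METHOD=ML is the safer choice when the models being compared differ in their predictor variables.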

Step 4: Testing Hypotheses on the Data

Once the analyst has chosen a good model for the data, one or more focused hypotheses may be tested on the data. This can be accomplished via CONTRAST and ESTIMATE statements in PROC MIXED.
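As an illustration, the sketch below tests whether the time trend differs between two groups. Everything specific to it is an assumption: the variable names (id, group, time, y), the two group labels, and the simulated data, which exist only so that the code runs as written.

```sas
/* Simulated data (hypothetical): two groups with different slopes */
data long;
  call streaminit(1058);
  length group $1;
  do id = 1 to 60;
    if mod(id, 2) = 0 then group = 'A';
    else group = 'B';
    u0 = rand('normal', 0, 2);
    do time = 0 to 4;
      y = 10 + (1.5 + 0.5*(group = 'A'))*time + u0 + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u0;
run;

proc mixed data=long method=reml covtest;
  class id group;
  model y = group time group*time / solution;
  random intercept / subject=id;
  /* Group-specific slopes: common time effect plus the group*time term */
  estimate 'slope, group A' time 1 group*time 1 0;
  estimate 'slope, group B' time 1 group*time 0 1;
  /* Difference in slopes, and the corresponding F-test */
  estimate 'slope difference, A - B' group*time 1 -1;
  contrast 'equal slopes' group*time 1 -1;
run;
```

The coefficients follow the sorted order of the CLASS levels (A, then B). ESTIMATE reports the estimate with its standard error and t-test, whereas CONTRAST reports an F-test of the same hypothesis.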

G. An Important Note: Coding Time in the Model

The time variable can be coded in several ways, and the choice affects the interpretation of the intercept:

Code t, the time variable, so that the baseline measure has a value of zero and successive measurements are incremented accordingly. Using this format, the intercept represents the mean value of the dependent variable at the baseline time.
Code t by centering the time values. For example, if t = 6, 12, 18, 24, 30, then subtracting the midpoint (18) gives centred values of -12, -6, 0, 6, 12; dividing these by 18 rescales them to -.667, -.333, 0, .333, .667. Using this format, the intercept represents the mean value of the dependent variable at the midpoint of the time period.
Code t so that the endpoint measure has a value of zero and preceding measurements are decremented accordingly. Using this format, the intercept represents the mean value of the dependent variable at the endpoint.

The choice of which coding scheme to adopt is determined by the analyst and the researcher based on the hypotheses of interest and the interpretation of the intercept (and its variance) that is of interest to the researcher.
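The three coding schemes can be produced in a single DATA step. This is a small sketch using the time values 6, 12, 18, 24, 30 from the example; the data set and variable names are hypothetical.

```sas
data timecodes;
  input t @@;
  t_baseline = t - 6;    /* baseline coded as zero:  0, 6, 12, 18, 24 */
  t_centred  = t - 18;   /* midpoint coded as zero: -12, -6, 0, 6, 12 */
  t_endpoint = t - 30;   /* endpoint coded as zero: -24, -18, -12, -6, 0 */
  datalines;
6 12 18 24 30
;
run;

proc print data=timecodes noobs;
run;
```

All three variables differ from t only by a constant, so the estimated slope is the same whichever one enters the MODEL statement. Only the intercept, and the variance of a random intercept, change meaning.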

H. Selecting a Correlation Structure

Exchangeable or compound symmetric - assumes that the correlation between all pairs of measurements is equal, irrespective of the length of the time interval.

Exchangeable	t₁	t₂	t₃	t₄
t₁	1	p	p	p
t₂	.	1	p	p
t₃	.	.	1	p
t₄	.	.	.	1

Autoregressive (first order) - with this structure, the correlations decrease over time. Observations that are one measurement occasion apart are assumed to have a correlation equal to p, observations two measurements apart are assumed to have a correlation equal to p², and so on. In general, observations t measurements apart are assumed to have a correlation equal to p^t.

Autoregressive	t₁	t₂	t₃	t₄
t₁	1	p	p²	p³
t₂	.	1	p	p²
t₃	.	.	1	p
t₄	.	.	.	1

Unstructured - with this structure, all correlations are assumed to be different.

Unstructured	t₁	t₂	t₃	t₄
t₁	1	p₁	p₂	p₃
t₂	.	1	p₄	p₅
t₃	.	.	1	p₆
t₄	.	.	.	1
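A small numeric illustration may help when comparing the three structures. Assuming four time points and a value of 0.5 for the correlation parameter (a number chosen only for illustration, written ρ below), the exchangeable and first-order autoregressive correlation matrices are:

```latex
R_{\text{exchangeable}} =
\begin{pmatrix}
1 & 0.5 & 0.5 & 0.5 \\
0.5 & 1 & 0.5 & 0.5 \\
0.5 & 0.5 & 1 & 0.5 \\
0.5 & 0.5 & 0.5 & 1
\end{pmatrix},
\qquad
R_{\text{AR(1)}} =
\begin{pmatrix}
1 & 0.5 & 0.25 & 0.125 \\
0.5 & 1 & 0.5 & 0.25 \\
0.25 & 0.5 & 1 & 0.5 \\
0.125 & 0.25 & 0.5 & 1
\end{pmatrix}
```

Both structures use a single correlation parameter, whereas the unstructured pattern requires T(T-1)/2 of them for T time points, which is 6 when T = 4. This is why the unstructured form becomes costly to estimate as the number of measurement occasions grows.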

For the other covariance structures available in PROC MIXED, see the SAS online documentation:
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect19.htm#stat_mixed_mixedecovstruct

I. Structure for Longitudinal Data

As defined previously, let Y_it represent the dependent variable value for the i-th individual at the t-th point in time and let X_it be the vector of predictor variable values for the i-th individual at the t-th time point. That is, X_it = [X_it1 X_it2 … X_itK]. The ID variable is a unique identifier for each individual in the data set. The univariate data structure is:

ID	Y_it	X_it1	X_it2	…	X_itK
1	Y_11	X_111	X_112	…	X_11K
1	Y_12	X_121	X_122	…	X_12K
…	…	…	…	…	…
1	Y_1T	X_1T1	X_1T2	…	X_1TK
2	Y_21	X_211	X_212	…	X_21K
…	…	…	…	…	…
N	Y_NT	X_NT1	X_NT2	…	X_NTK
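Data are often stored with one row per individual and one column per time point, and need to be rearranged into the structure above before they are passed to PROC MIXED. The sketch below shows one way to do this with an array. The data set names, variable names and values are all made up for illustration.

```sas
/* Hypothetical wide data: one row per id, three measurement occasions */
data wide;
  input id y1 y2 y3 sex $;
  datalines;
1 10.2 11.5 12.9 F
2  8.7  9.1 10.4 M
;
run;

/* One output row per id per time point */
data long;
  set wide;
  array yy{3} y1-y3;
  do time = 1 to 3;
    y = yy{time};
    output;
  end;
  keep id time y sex;
run;

proc print data=long noobs;
run;
```

The time-invariant covariate (sex) is simply repeated on every row for an individual, while a time-varying covariate would be unpacked from its own array in the same loop.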

J. SAS CODE

For a model which contains only fixed effects, that is,

$$Y_{it} = \beta_0 + \beta_1 t + \varepsilon_{it}$$

PROC MIXED DATA=data-set-name METHOD=method-of-estimation covtest;
CLASS id;
MODEL dependent-variable = time-variable / solution;
REPEATED / TYPE=correlation-structure SUBJECT=id r rcorr;
RUN;

Notes about this syntax:

TYPE= option
- Specifies the correlation structure. Common choices are compound symmetric (TYPE=CS), first-order autoregressive (TYPE=AR(1)), and unstructured (TYPE=UN)
COVTEST option
- Produces asymptotic standard errors and Z-tests for each of the covariance parameter estimates
METHOD= option
- Specifies the method of estimation. The two most common methods are METHOD=REML (restricted maximum likelihood, the default) and METHOD=ML (maximum likelihood)
MODEL statement
- All fixed effects are listed after the equals sign
SOLUTION option
- Requests the printing of the parameter estimates for all fixed effects in the model, together with standard errors, t statistics, and p values
REPEATED statement
- Used to specify that the data for each id are from the same subject, and that the specified correlation structure should be fit to the repeated measurements. Note that the id variable must also be listed in the CLASS statement.
R, RCORR options
- Produce the variance-covariance and correlation matrices for the repeated measurements
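When some subjects are missing one or more occasions, a REPEATED statement with no repeated effect can misalign the measurements, because PROC MIXED then relies on the order of the records within each subject. A common remedy, sketched below, is to name a CLASS copy of the time variable as the repeated effect. The names long, id, time, timec and y are hypothetical, and the data are simulated only so that the example is self-contained.

```sas
/* Simulated long-format data (hypothetical), with a copy of time for CLASS use */
data long;
  call streaminit(1058);
  do id = 1 to 50;
    u0 = rand('normal', 0, 2);
    do time = 0 to 4;
      y = 10 + 1.5*time + u0 + rand('normal', 0, 1);
      timec = time;                    /* classification copy of time */
      output;
    end;
  end;
  drop u0;
run;

proc mixed data=long method=reml covtest;
  class id timec;
  model y = time / solution;           /* time stays continuous in the model */
  repeated timec / type=ar(1) subject=id r rcorr;
run;
```

Keeping time continuous in the MODEL statement preserves the linear trend, while timec tells PROC MIXED which occasion each record belongs to.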

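For the model that contains a random intercept,

$$Y_{it} = \beta_0 + \beta_1 t + \upsilon_{0i} + \varepsilon_{it}$$

PROC MIXED DATA=data-set-name METHOD=method-of-estimation covtest;
CLASS id;
MODEL dependent-variable = time-variable / solution;
RANDOM intercept / SUBJECT=id g gcorr;
RUN;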

Notes about this syntax:

G, GCORR options
- Produces the variance-covariance matrix and correlation matrix for the random effects
RANDOM statement
- Identifies which parameters in the model are allowed to vary across subjects
- SUBJECT=id means that all records with the same value of id are assumed to be from the same subject, whereas records with different values of id are assumed to come from independent subjects. The RANDOM statement with this option produces a block-diagonal structure in G, with identical blocks.

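For the model that contains both a random intercept and a random slope,

$$Y_{it} = \beta_0 + \beta_1 t + \upsilon_{0i} + \upsilon_{1i} t + \varepsilon_{it}$$

PROC MIXED DATA=data-set-name METHOD=method-of-estimation covtest;
CLASS id;
MODEL dependent-variable = time-variable / solution;
RANDOM intercept time-variable / SUBJECT=id TYPE=UN g gcorr;
RUN;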

See the document Numeric Example of Random Effects Models for Longitudinal Data - Continuous Data for a numeric example.

K. Reducing Computing Time for PROC MIXED

Computing time can be long with many clusters or subjects.
Possible solutions:

Set initial values for variance-covariance estimates.
Use explicit nesting for hierarchical data with three or more levels (when appropriate).
Use the DDFM=BW option.

Also see the SAS online documentation: MIXED --> Details --> Computational Issues --> Computing Time
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect46.htm

1. Finding and Setting Initial Values

Take a random sub-sample using PROC SURVEYSELECT. There are various methods of selecting a random sample (stratified-sampling, cluster-sampling, simple random sampling, etc.), but for the purpose of setting initial values, the type may not be important.

See the SAS online documentation for further details of the various methods.
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/surveyselect_sect7.htm

Example of SAS code for simple random sampling (SRS) without replacement:
PROC SURVEYSELECT DATA=indata OUT=outdata
NOPRINT METHOD=SRS RATE=## SEED=##;
RUN;

Run PROC MIXED using the random sample and look at the variance-covariance output.

Run PROC MIXED using the full dataset, adding a PARMS statement to set the initial values.

There are two methods: (i) manually enter the variance-covariance estimates, or (ii) point to the SAS dataset of variance-covariance estimates saved from the PROC MIXED run on the random sub-sample.
(i) PARMS (#) (#) (#);
(ii) PARMS / PARMSDATA=var_cov;
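The whole procedure can be sketched end to end using method (ii). The cluster and variable names follow the examples in this section, while the simulated data, the 25% sampling rate and the seed are arbitrary assumptions made so that the code runs on its own.

```sas
/* Simulated clustered data (hypothetical): 200 clusters of 20 records */
data indata;
  call streaminit(1058);
  do l2_cluster = 1 to 200;
    u = rand('normal', 0, 2);               /* cluster-level random intercept */
    do i = 1 to 20;
      v1 = rand('normal');
      outcome = 5 + 0.8*v1 + u + rand('normal');
      output;
    end;
  end;
  drop u i;
run;

/* Step 1: draw a simple random sub-sample without replacement */
proc surveyselect data=indata out=subsample noprint
                  method=srs rate=0.25 seed=1058;
run;

/* Step 2: fit the model to the sub-sample and save the covariance estimates */
ods output CovParms=var_cov;
proc mixed data=subsample;
  class l2_cluster;
  model outcome = v1 / solution;
  random int / subject=l2_cluster;
run;

/* Step 3: refit on the full data, starting from the saved estimates */
proc mixed data=indata;
  class l2_cluster;
  model outcome = v1 / solution;
  random int / subject=l2_cluster;
  parms / parmsdata=var_cov;
run;
```

The covariance parameters in var_cov must appear in the same order as in the full-data model, which holds here because both PROC MIXED steps use identical RANDOM statements.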

2. Using Explicit Nesting

For data with multiple clustering structures, sometimes clusters are nested within another cluster.

Nested Example: Students --> Class --> School

Non-Nested Example:

Clustering 1 - Students in the same class

Clustering 2 - Kids in the same neighborhood

SAS code for explicit nesting where l2_cluster denotes 2nd level clustering and l3_cluster denotes 3rd level clustering:
RANDOM INT / SUBJECT = l3_cluster;
RANDOM INT / SUBJECT = l2_cluster (l3_cluster);

3. Using the DDFM=BW option

This option requests the between-within method for computing the denominator degrees of freedom for tests of fixed effects, instead of the default method (containment, when a RANDOM statement is used).

Fixed effects parameter estimates and variance-covariance estimates (along with their standard errors) are virtually the same.

Degrees of freedom are much higher, however.

See the SAS online documentation for further details: MIXED --> Syntax --> MODEL --> DDFM=
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect15.htm#stat_mixed_mixedddfm

SAS code:
MODEL outcome = ... / DDFM=BW;

4. SAS Code for all suggestions together (Random Intercept Model):

PROC MIXED DATA=indata;
CLASS l2_cluster l3_cluster;
MODEL outcome = v1 v2 v3 / DDFM=BW;
RANDOM INT / SUBJECT = l3_cluster;
RANDOM INT / SUBJECT = l2_cluster(l3_cluster);
PARMS (##) (##) (##);
RUN;

Related concepts

Generalized Estimating Equations (GEE)
Statistics for Large Databases

Related terms

Analysis of Variance (ANOVA)
Modelling
Random Effects Models


Keywords

statistics


Source: http://mchp-appserv.cpe.umanitoba.ca/viewConcept.php?printer=Y&conceptID=1058
