A Simple Method for Computing the Observed Information Matrix When Using the EM Algorithm with Categorical Data |
| |
Authors: | Stuart G. Baker |
| |
Affiliation: | Mathematical Statistician, Screening Section, Biometry Branch, Division of Cancer Prevention and Control, National Cancer Institute, EPN 344, 9000 Rockville Pike, Bethesda, MD 20892, USA |
| |
Abstract: | A simple matrix formula is given for the observed information matrix when the EM algorithm is applied to categorical data with missing values. The formula requires only the design matrices, a matrix linking the complete and incomplete data, and a few simple derivatives. It can be easily programmed using a computer language with operators for matrix multiplication, element-by-element multiplication and division, matrix concatenation, and creation of diagonal and block diagonal arrays. The formula is applicable whenever the incomplete data can be expressed as a linear function of the complete data, such as when the observed counts represent the sum of latent classes, a supplemental margin, or the number censored. In addition, the formula applies to a wide variety of models for categorical data, including those with linear, logistic, and log-linear components. Examples include a linear model for genetics, a log-linear model for two variables and nonignorable nonresponse, the product of a log-linear model for two variables and a logit model for nonignorable nonresponse, a latent class model for the results of two diagnostic tests, and a product of linear models under double sampling. |
| |
Keywords: | Incomplete data; Missing data; Poisson distribution |
|
|
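The setting described in the abstract, incomplete counts that are a linear function of the complete counts, can be illustrated with a minimal sketch. It assumes the classic textbook genetic-linkage data (observed counts 125, 18, 20, 34), which may or may not be the genetics example used in the paper. The names `C`, `complete_probs`, `e_step`, and `m_step` are hypothetical and chosen only for this illustration. The observed information is obtained here by a central finite difference of the observed-data log-likelihood, not by the paper's matrix formula, so it serves only as a reference value that such a formula would be expected to reproduce.

```python
import numpy as np

# Observed (incomplete) counts from the classic genetic-linkage example
# (assumed data set; not necessarily the one analysed in the paper).
y = np.array([125.0, 18.0, 20.0, 34.0])

# Linking matrix C: observed counts are a linear function of the complete
# counts, y = C z. The first observed cell is the sum of two latent cells;
# the remaining cells map one-to-one.
C = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)


def complete_probs(theta):
    """Complete-data cell probabilities, linear in theta."""
    return np.array([0.5, theta / 4, (1 - theta) / 4, (1 - theta) / 4, theta / 4])


def e_step(theta):
    """Expected complete counts given the observed counts and current theta."""
    p = complete_probs(theta)
    # Each complete cell receives its share of the observed cell it belongs to.
    return p * (C.T @ (y / (C @ p)))


def m_step(z):
    """Closed-form complete-data MLE of theta (the latent cell z[0] carries no theta)."""
    return (z[1] + z[4]) / (z[1] + z[2] + z[3] + z[4])


def observed_loglik(theta):
    """Observed-data log-likelihood, up to an additive constant."""
    return float(y @ np.log(C @ complete_probs(theta)))


# EM iterations from an arbitrary starting value.
theta_hat = 0.5
for _ in range(200):
    theta_new = m_step(e_step(theta_hat))
    converged = abs(theta_new - theta_hat) < 1e-12
    theta_hat = theta_new
    if converged:
        break

# Observed information as the negative second derivative of the observed
# log-likelihood, approximated by a central finite difference.
h = 1e-4
info = -(observed_loglik(theta_hat + h)
         - 2.0 * observed_loglik(theta_hat)
         + observed_loglik(theta_hat - h)) / h**2

print(theta_hat, info, 1.0 / np.sqrt(info))
```

The last value printed is the approximate standard error of the estimate, the quantity the observed information is typically used for. With more than one parameter the same check would require a full numerical Hessian, which is where a closed-form matrix expression becomes attractive.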