Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model
Introduction
HIV viral load is a widespread marker of the evolution of HIV-infected patients (Perelson et al., 1997); the reduction in HIV viral load is frequently used as the primary endpoint in clinical trials to evaluate the efficacy of anti-viral treatments (see for example Wu et al., 1998, Ding and Wu, 1999, Ding and Wu, 2000, Ding and Wu, 2001, Jacqmin-Gadda et al., 2000, Wu and Wu, 2002, Wu and Zhang, 2002). Nonlinear mixed-effects models (NLMEM) can be used in these longitudinal studies to exploit the richness of the dynamics seized by repeated measurements and to account for inter- and intra-patient variability in viral load measurements. In addition, understanding the mechanism of the large inter-patient variability may help in making appropriate clinical decisions and providing individualized treatment. Unfortunately, all available assays of viral load measurements have a low limit of quantification (LOQ), generally between 20 and 400 copies/ml. Besides, the proportion of subjects with a viral load below LOQ has increased with the introduction of highly active antiretroviral treatments. Working with such left-censored data complicates the study of longitudinal viral load data. This issue is common in other longitudinal studies with LOQ, such as pharmacokinetics or pharmacodynamics, which also widely use NLMEM.
This paper aims to develop a reliable inference based on maximum likelihood (ML) theory for HIV dynamics models with left-censored viral load and NLMEM. It is indeed important to obtain reliable estimates of the viral dynamic parameters, that can be used to evaluate antiviral therapies through comparison of treatment groups.
To address the estimation problem in longitudinal data analysis containing censored values, naive procedures such as omitting the censored data or imputing a fixed value (e.g., the quantification limit or half the limit) are combined with usual estimation methods of mixed models (see Beal, 2001, for a comparison of classical procedures in NLMEM). However, the statistical properties of such procedures are unclear. More inventive approaches propose multiple imputations of the censored values, by substituting a reasonable guess for each missing value. For example, in linear mixed models, Hughes (1999) proposes a Monte-Carlo version of the expectation maximization (EM) algorithm (Dempster et al., 1977), taking into account the censored values as missing data. Hughes (1999) shows that his approach significantly reduces the bias associated with naive imputation procedures. Jacqmin-Gadda et al. (2000) propose a direct maximization of the likelihood using an iterative process for linear mixed models as well, including an autoregressive error model. They combine two optimization algorithms, the Simplex and the Marquardt algorithms.
For nonlinear mixed models, the problem is more complex, the estimation of such model parameters being difficult even without censored observations. Indeed, because of the nonlinearity of the regression function in the random effects, the likelihood of NLMEM cannot be expressed in a closed form. Consequently, several authors propose some widely used likelihood approximation methods, such as linearization algorithms, which are implemented in the NONMEM software and in the nlme function of Splus and R software (Beal and Sheiner, 1982, Lindstrom and Bates, 1990), or Laplacian or Gaussian quadrature algorithms, which are implemented in the NLMIXED macro of SAS (Wolfinger, 1993). Wu and Wu (2001) propose a multiple imputation method for missing covariates in NLMEM based on a linearization algorithm. However, none of these algorithms based on likelihood approximation can be considered as fully established theoretically. A different point of view can be taken, the individual parameters and the censored values being considered as non-observed data. The EM algorithm is then the most adapted tool to estimate incomplete data models. Because of the nonlinearity of the model, stochastic versions of the EM algorithm are proposed. Wu, 2002, Wu, 2004 introduces MCEM algorithms, with a Monte-Carlo approximation of the expectation step, adapted to both NLMEM and the censoring problem of observations and covariates. This Monte-Carlo implementation is based on samples independently and identically distributed from the conditional density, requiring Markov chain Monte-Carlo (MCMC) procedures. The replication choice of the Monte-Carlo sample is a central issue to guarantee convergence and this remains an open problem. Wu (2004) proposes an “exact” MCEM but emphasizes that this MCEM algorithm is very slow to converge. Indeed simulations of these large samples at each iteration are time consuming. To address this computational problem, Wu, 2002, Wu, 2004 also proposes an approximate MCEM using a linearization of the model leading to an approximate ML method.
As an alternative to address both the point-wise convergence and the computational problem, stochastic approximation versions of EM (SAEM) are proposed for NLMEM with no censored values (Delyon et al., 1999, Kuhn and Lavielle, 2005b). This algorithm requires a simulation of only one realization of the missing data at each iteration, avoiding the computational difficulty of independent sample simulation occurring in the MCEM and shortening the time to simulate. In addition, point-wise almost sure convergence of the estimate sequence to a local maximum of the likelihood is proved by Delyon et al. (1999) under conditions satisfied by models from the exponential family. Girard and Mentré (2005) propose a comparison of these estimation methods in NLMEM using a blind analysis, showing the accuracy of the SAEM algorithm in comparison with other methods. Especially, the computational convergence of the SAEM algorithm is clearly faster than those of the MCEM algorithm. However, this current SAEM algorithm is only appropriate for NLMEM without censored values.
The first objective of the present paper is thus to extend the SAEM algorithm to handle left-censored data in NLMEM as an exact ML estimation method. We include in the extended SAEM algorithm the simulation of the left-censored data with a right-truncated Gaussian distribution. We prove the convergence of this extended SAEM algorithm under general conditions. The second objective of this paper is to illustrate this algorithm with a simulation study in the HIV dynamics context. Furthermore, we compare the extended SAEM algorithm with more classical approaches to handle left-censored data such as omission or imputation of the censored data, on the same simulation study.
After describing the model and the notations (Section 2), Section 3 describes the extended SAEM algorithm. Section 4 reports the simulation study and its results. We simulate data sets using the bi-exponential model for HIV dynamics proposed by Ding and Wu (2001), and evaluate the statistical properties of the extended SAEM parameter estimates and the classical approaches. Particularly, we evaluate two comparison group tests, the Wald test and the likelihood ratio test, provided by the SAEM algorithm. We then apply the extended SAEM algorithm to the TRIANON-ANRS 81 clinical trial of HIV treatment in Section 5. The aim of this new analysis of the TRIANON data is to show the ability of NLMEM to describe the evolution of the viral load and to test a treatment's effect between the two treatment groups, in the presence of left-censored observations. Section 6 concludes the article with some discussion.
Section snippets
Models and notations
Let us define where is the response value for individual i at time , , , and let . Let us define an NLMEM as follows:where and/or are nonlinear functions of , represents the residual error, is a p-vector of individual parameters, is the -matrix of fixed effects, is the k-vector of known covariates, is a p-vector of random effects independent of ,
The SAEM algorithm
The EM algorithm introduced by Dempster et al. (1977) is a classical approach to estimate parameters of models with non-observed or incomplete data. Let us briefly cover the EM principle. Let be the vector of non-observed data. The complete data of the model is . The EM algorithm maximizes the function in two steps, where is the log-likelihood of the complete data. At the mth iteration, the E step is the evaluation of , whereas the M step
Simulation settings
The first objective of this simulation study is to illustrate the main statistical properties of the extended SAEM algorithm in the context of HIV viral dynamics (bias, root mean square errors (RMSEs), group comparison tests). The second objective is to compare the extended SAEM algorithm to some of the classical approaches proposed to take into account a censoring process.
We use the bi-exponential model for initial HIV dynamics proposed by Ding and Wu (2001) to simulate the data sets
Application to the Trianon (ANRS81) trial
We illustrate the extended SAEM algorithm on viral load data from the clinical trial TRIANON supported by the French Agence National de Recherche sur le Sida (ANRS). In this study, 144 patients infected with HIV-1, who were randomized into two treatment groups, undergo treatment for 72 weeks: 71 patients receive treatment A (lamivudine, d4T and indinavir) and 73 patients treatment B (nevirapine, d4T and indinavir). Viral load is measured at weeks 4 and 8 and every 8 weeks thereafter up through
Discussion
To analyze longitudinal data with left-censored responses, we propose a ML estimation method that may be preferred over methods classically used with NLMEM. We extend the SAEM algorithm developed by Kuhn and Lavielle (2005b) and the monolix 1.1 Matlab function (http://mahery.math.u-psud.fr/~lavielle/monolix) by including in the simulation step of the SAEM algorithm a simulation of the left-censored data with the right-truncated Gaussian distribution using an accept-reject algorithm proposed by
Acknowledgments
The authors thank the scientific committee of the TRIANON-ANRS81 trial for giving us access to their patients’ viral load measurements and especially Dr. O. Launay, main investigator of TRIANON (Hospital Cochin University, Paris, France) and Dr. J.P. Aboulker, methodological coordinator (INSERM SC10, Villejuif, France).
References (28)
- et al.
Relationships between antiviral treatment effects and biphasic viral decay rates in modeling HIV dynamics
Math. Biosci.
(1999) - et al.
Maximum likelihood estimation in nonlinear mixed effects models
Comput. Statist. Data Anal.
(2005) Ways to fit a PK model with some data below the quantification limit
J. Pharmacokinet. Pharmacodyn.
(2001)- et al.
Estimating population kinetics
Crit. Rev. Biomed. Eng.
(1982) - et al.
Convergence of a stochastic approximation version of the EM algorithm
Ann. Statist.
(1999) - et al.
Maximum likelihood from incomplete data via the EM algorithm
J. Roy. Statist. Soc. Ser. B
(1977) - et al.
A comparison study of models and fitting procedures for biphasic viral dynamics in HIV-1 infected patients treated with antiviral therapies
Biometrics
(2000) - et al.
Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models
Biostatistics
(2001) - et al.
Impact of omission or replacement of data below the limit of quantification on parameter estimates in a two-compartment model
Pharm. Res.
(2002) - et al.
Sampling-based approaches to calculating marginal densities
J. Amer. Statist. Assoc.
(1990)