Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model

https://doi.org/10.1016/j.csda.2006.05.007Get rights and content

Abstract

The reduction of viral load is frequently used as a primary endpoint in HIV clinical trials. Nonlinear mixed-effects models are thus proposed to model this decrease of the viral load after initiation of treatment and to evaluate the intra- and inter-patient variability. However, left censoring due to quantification limits in the viral load measurement is an additional challenge in the analysis of longitudinal HIV data. An extension of the stochastic approximation expectation-maximization (SAEM) algorithm is proposed to estimate parameters of these models. This algorithm includes the simulation of the left-censored data in a right-truncated Gaussian distribution. Simulation results show that the proposed estimates are less biased than the usual naive methods of handling such data: omission of all censored data points, or imputation of half the quantification limit to the first point below the limit and omission of the following points. The viral load measurements obtained in the TRIANON-ANRS81 clinical trial are analyzed with this method and a significant difference is found between the two treatment groups of this trial.

Introduction

HIV viral load is a widespread marker of the evolution of HIV-infected patients (Perelson et al., 1997); the reduction in HIV viral load is frequently used as the primary endpoint in clinical trials to evaluate the efficacy of anti-viral treatments (see for example Wu et al., 1998, Ding and Wu, 1999, Ding and Wu, 2000, Ding and Wu, 2001, Jacqmin-Gadda et al., 2000, Wu and Wu, 2002, Wu and Zhang, 2002). Nonlinear mixed-effects models (NLMEM) can be used in these longitudinal studies to exploit the richness of the dynamics seized by repeated measurements and to account for inter- and intra-patient variability in viral load measurements. In addition, understanding the mechanism of the large inter-patient variability may help in making appropriate clinical decisions and providing individualized treatment. Unfortunately, all available assays of viral load measurements have a low limit of quantification (LOQ), generally between 20 and 400 copies/ml. Besides, the proportion of subjects with a viral load below LOQ has increased with the introduction of highly active antiretroviral treatments. Working with such left-censored data complicates the study of longitudinal viral load data. This issue is common in other longitudinal studies with LOQ, such as pharmacokinetics or pharmacodynamics, which also widely use NLMEM.

This paper aims to develop a reliable inference based on maximum likelihood (ML) theory for HIV dynamics models with left-censored viral load and NLMEM. It is indeed important to obtain reliable estimates of the viral dynamic parameters, that can be used to evaluate antiviral therapies through comparison of treatment groups.

To address the estimation problem in longitudinal data analysis containing censored values, naive procedures such as omitting the censored data or imputing a fixed value (e.g., the quantification limit or half the limit) are combined with usual estimation methods of mixed models (see Beal, 2001, for a comparison of classical procedures in NLMEM). However, the statistical properties of such procedures are unclear. More inventive approaches propose multiple imputations of the censored values, by substituting a reasonable guess for each missing value. For example, in linear mixed models, Hughes (1999) proposes a Monte-Carlo version of the expectation maximization (EM) algorithm (Dempster et al., 1977), taking into account the censored values as missing data. Hughes (1999) shows that his approach significantly reduces the bias associated with naive imputation procedures. Jacqmin-Gadda et al. (2000) propose a direct maximization of the likelihood using an iterative process for linear mixed models as well, including an autoregressive error model. They combine two optimization algorithms, the Simplex and the Marquardt algorithms.

For nonlinear mixed models, the problem is more complex, the estimation of such model parameters being difficult even without censored observations. Indeed, because of the nonlinearity of the regression function in the random effects, the likelihood of NLMEM cannot be expressed in a closed form. Consequently, several authors propose some widely used likelihood approximation methods, such as linearization algorithms, which are implemented in the NONMEM software and in the nlme function of Splus and R software (Beal and Sheiner, 1982, Lindstrom and Bates, 1990), or Laplacian or Gaussian quadrature algorithms, which are implemented in the NLMIXED macro of SAS (Wolfinger, 1993). Wu and Wu (2001) propose a multiple imputation method for missing covariates in NLMEM based on a linearization algorithm. However, none of these algorithms based on likelihood approximation can be considered as fully established theoretically. A different point of view can be taken, the individual parameters and the censored values being considered as non-observed data. The EM algorithm is then the most adapted tool to estimate incomplete data models. Because of the nonlinearity of the model, stochastic versions of the EM algorithm are proposed. Wu, 2002, Wu, 2004 introduces MCEM algorithms, with a Monte-Carlo approximation of the expectation step, adapted to both NLMEM and the censoring problem of observations and covariates. This Monte-Carlo implementation is based on samples independently and identically distributed from the conditional density, requiring Markov chain Monte-Carlo (MCMC) procedures. The replication choice of the Monte-Carlo sample is a central issue to guarantee convergence and this remains an open problem. Wu (2004) proposes an “exact” MCEM but emphasizes that this MCEM algorithm is very slow to converge. Indeed simulations of these large samples at each iteration are time consuming. To address this computational problem, Wu, 2002, Wu, 2004 also proposes an approximate MCEM using a linearization of the model leading to an approximate ML method.

As an alternative to address both the point-wise convergence and the computational problem, stochastic approximation versions of EM (SAEM) are proposed for NLMEM with no censored values (Delyon et al., 1999, Kuhn and Lavielle, 2005b). This algorithm requires a simulation of only one realization of the missing data at each iteration, avoiding the computational difficulty of independent sample simulation occurring in the MCEM and shortening the time to simulate. In addition, point-wise almost sure convergence of the estimate sequence to a local maximum of the likelihood is proved by Delyon et al. (1999) under conditions satisfied by models from the exponential family. Girard and Mentré (2005) propose a comparison of these estimation methods in NLMEM using a blind analysis, showing the accuracy of the SAEM algorithm in comparison with other methods. Especially, the computational convergence of the SAEM algorithm is clearly faster than those of the MCEM algorithm. However, this current SAEM algorithm is only appropriate for NLMEM without censored values.

The first objective of the present paper is thus to extend the SAEM algorithm to handle left-censored data in NLMEM as an exact ML estimation method. We include in the extended SAEM algorithm the simulation of the left-censored data with a right-truncated Gaussian distribution. We prove the convergence of this extended SAEM algorithm under general conditions. The second objective of this paper is to illustrate this algorithm with a simulation study in the HIV dynamics context. Furthermore, we compare the extended SAEM algorithm with more classical approaches to handle left-censored data such as omission or imputation of the censored data, on the same simulation study.

After describing the model and the notations (Section 2), Section 3 describes the extended SAEM algorithm. Section 4 reports the simulation study and its results. We simulate data sets using the bi-exponential model for HIV dynamics proposed by Ding and Wu (2001), and evaluate the statistical properties of the extended SAEM parameter estimates and the classical approaches. Particularly, we evaluate two comparison group tests, the Wald test and the likelihood ratio test, provided by the SAEM algorithm. We then apply the extended SAEM algorithm to the TRIANON-ANRS 81 clinical trial of HIV treatment in Section 5. The aim of this new analysis of the TRIANON data is to show the ability of NLMEM to describe the evolution of the viral load and to test a treatment's effect between the two treatment groups, in the presence of left-censored observations. Section 6 concludes the article with some discussion.

Section snippets

Models and notations

Let us define yi=yi1,,yinit where yij is the response value for individual i at time tij, i=1,,N, j=1,,ni, and let y=y1,,yN. Let us define an NLMEM as follows:yij=fφi,tij+gφi,tijεij,εiN0,σ2Ini,andφi=Xiμ+biwithbiN(0,Ω),where f(·) and/or g(·) are nonlinear functions of φi, εi=εi1,,εinit represents the residual error, φi is a p-vector of individual parameters, μ is the k×p-matrix of fixed effects, Xi is the k-vector of known covariates, bi is a p-vector of random effects independent of εi, σ

The SAEM algorithm

The EM algorithm introduced by Dempster et al. (1977) is a classical approach to estimate parameters of models with non-observed or incomplete data. Let us briefly cover the EM principle. Let z be the vector of non-observed data. The complete data of the model is (y,z). The EM algorithm maximizes the Qθ|θ=ELc(y,z;θ)|y;θ function in two steps, where Lc(y,z;θ) is the log-likelihood of the complete data. At the mth iteration, the E step is the evaluation of Qm(θ)=Qθ|θ^m-1, whereas the M step

Simulation settings

The first objective of this simulation study is to illustrate the main statistical properties of the extended SAEM algorithm in the context of HIV viral dynamics (bias, root mean square errors (RMSEs), group comparison tests). The second objective is to compare the extended SAEM algorithm to some of the classical approaches proposed to take into account a censoring process.

We use the bi-exponential model for initial HIV dynamics proposed by Ding and Wu (2001) to simulate the data sets fφi,tij=

Application to the Trianon (ANRS81) trial

We illustrate the extended SAEM algorithm on viral load data from the clinical trial TRIANON supported by the French Agence National de Recherche sur le Sida (ANRS). In this study, 144 patients infected with HIV-1, who were randomized into two treatment groups, undergo treatment for 72 weeks: 71 patients receive treatment A (lamivudine, d4T and indinavir) and 73 patients treatment B (nevirapine, d4T and indinavir). Viral load is measured at weeks 4 and 8 and every 8 weeks thereafter up through

Discussion

To analyze longitudinal data with left-censored responses, we propose a ML estimation method that may be preferred over methods classically used with NLMEM. We extend the SAEM algorithm developed by Kuhn and Lavielle (2005b) and the monolix 1.1 Matlab function (http://mahery.math.u-psud.fr/~lavielle/monolix) by including in the simulation step of the SAEM algorithm a simulation of the left-censored data with the right-truncated Gaussian distribution using an accept-reject algorithm proposed by

Acknowledgments

The authors thank the scientific committee of the TRIANON-ANRS81 trial for giving us access to their patients’ viral load measurements and especially Dr. O. Launay, main investigator of TRIANON (Hospital Cochin University, Paris, France) and Dr. J.P. Aboulker, methodological coordinator (INSERM SC10, Villejuif, France).

References (28)

  • A. Ding et al.

    Relationships between antiviral treatment effects and biphasic viral decay rates in modeling HIV dynamics

    Math. Biosci.

    (1999)
  • E. Kuhn et al.

    Maximum likelihood estimation in nonlinear mixed effects models

    Comput. Statist. Data Anal.

    (2005)
  • S. Beal

    Ways to fit a PK model with some data below the quantification limit

    J. Pharmacokinet. Pharmacodyn.

    (2001)
  • S. Beal et al.

    Estimating population kinetics

    Crit. Rev. Biomed. Eng.

    (1982)
  • B. Delyon et al.

    Convergence of a stochastic approximation version of the EM algorithm

    Ann. Statist.

    (1999)
  • A.P. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    J. Roy. Statist. Soc. Ser. B

    (1977)
  • A. Ding et al.

    A comparison study of models and fitting procedures for biphasic viral dynamics in HIV-1 infected patients treated with antiviral therapies

    Biometrics

    (2000)
  • A. Ding et al.

    Assessing antiviral potency of anti-HIV therapies in vivo by comparing viral decay rates in viral dynamic models

    Biostatistics

    (2001)
  • V. Duval et al.

    Impact of omission or replacement of data below the limit of quantification on parameter estimates in a two-compartment model

    Pharm. Res.

    (2002)
  • A.E. Gelfand et al.

    Sampling-based approaches to calculating marginal densities

    J. Amer. Statist. Assoc.

    (1990)
  • Girard, P., Mentré, F., 2005. A comparison of estimation methods in nonlinear mixed effects models using a blind...
  • J. Hughes

    Mixed effects models with censored data with applications to HIV RNA levels

    Biometrics

    (1999)
  • H. Jacqmin-Gadda et al.

    Analysis of left-censored longitudinal data with application to viral load in HIV infection

    Biostatistics

    (2000)
  • E. Kuhn et al.

    Coupling a stochastic approximation version of EM with a MCMC procedure

    ESAIM: P & S

    (2005)
  • Cited by (0)

    View full text