Performing likelihood ratio tests and computing information criteria for a given model requires computation of the log-likelihood
where is the vector of population parameter estimates for the model being considered. The log-likelihood cannot be computed in closed form for nonlinear mixed effects models. It can however be estimated in a general framework for all kinds of data and models using the importance sampling Monte Carlo method. This method has the advantage of providing an unbiased estimate of the log-likelihood – even for nonlinear models – whose variance can be controlled by the Monte Carlo size.
The estimation of the log-likelihood is performed. Two different algorithms are proposed to estimate the log-likelihood: by linearization and by Importance sampling. The estimated log-likelihoods are appended to pop_parameters.txt.
Log-likelihood by importance sampling
The observed log-likelihood can be estimated without requiring approximation of the model, using a Monte Carlo approach. Since
we can estimate for each individual and derive an estimate of the log-likelihood as the sum of these individual log-likelihoods. We will now explain how to estimate for any individual i. Using the -representation of the model (the individual parameters are transformed to be Gaussian), notice first that can be decomposed as follows:
Thus, is expressed as a mean. It can therefore be approximated by an empirical mean using a Monte Carlo procedure:
- Draw M independent values , , …, from the marginal distribution .
- Estimate with
By construction, this estimator is unbiased, and consistent since its variance decreases as 1/M:
We could consider ourselves satisfied with this estimator since we “only” have to select M large enough to get an estimator with a small variance. Nevertheless, it is possible to improve the statistical properties of this estimator.
The problem is that it is not possible to generate the with this conditional distribution, since that would require to compute a normalizing constant, which here is precisely .
Nevertheless, this conditional distribution can be estimated using the Metropolis-Hastings algorithm described in the Metropolis-Hastings algorithm for simulating the individual parameters and a practical proposal “close” to the optimal proposal can be derived. We can then expect to get a very accurate estimate with a relatively small Monte Carlo size M.
The mean and variance of the conditional distribution are estimated by Metropolis-Hastings for each individual i. Then, the are drawn with a noncentral student t-distribution:
where and are estimates of and , and is a sequence of i.i.d. random variables distributed with a Student’s t-distribution with degrees of freedom.
Remark: Even if is an unbiased estimator of , is a biased estimator of . Indeed, by Jensen’s inequality, we have :
However, the bias decreases as M increases and also if is close to . It is therefore highly recommended to use a proposal as close as possible to the conditional distribution , which means having to estimate this conditional distribution before estimating the log-likelihood (i.e run task “individual parameter” with “Cond. means and s.d.” option).
Advance settings for the log-likelihood
A t-distribution is used as proposal. The number of degrees of freedom of this distribution can be either fixed or optimized. In such a case, the default possible values are 2, 5, 10 and 20 degree of freedom. A distribution with a small number of degree of freedom (i.e. heavy tails) should be avoided in case of stiff ODE’s defined models. We recommend to set a degree of freedom at 5. It is accessible in Menu/Settings/Log-likelihood.
Log-likelihood by linearization
The likelihood of the nonlinear mixed effects model cannot be computed in a closed-form. An alternative is to approximate this likelihood by the likelihood of the Gaussian model deduced from the nonlinear mixed effects model after linearization of the function f (defining the structural model) around the predictions of the individual parameters .
Notice that the log-likelihood can not be computed by linearization for discrete outputs (categorical, count, etc.) nor for mixture models or when the posterior distribution have been estimated for some parameters with priors.
Best practices: When should I use the linearization and when should I use the importance sampling?
Firstly, it is only possible to use the linearization algorithm for the continuous data. In that case, this method is generally much faster than importance sampling method and also gives good estimates of the LL. The LL calculation by model linearization will generally be able to identify the main features of the model. More precise– and time-consuming – estimation procedures such as stochastic approximation and importance sampling will have very limited impact in terms of decisions for these most obvious features. Selection of the final model should instead use the unbiased estimator obtained by Monte Carlo. In the warfarin example, the evaluation of the log-likelihood (along with the AIC and BIC) is presented with the CPU time.
|Method||-2LL||AIC||BIC||CPU time [s]|
One can see that it the importance sampling is much more costly in computation time but the result is more precise.