Bayesian estimation


Bayesian estimation allows to take into account prior information in the estimation of parameters. It is called in Monolix Maximum A Posteriori estimation, and it corresponds to a penalized maximum likelihood estimation, based on a prior distribution defined for a parameter. The weight of the prior in the estimation is given by the standard deviation of the prior distribution.


Objectives: learn how to combine maximum likelihood estimation and Bayesian estimation of the population parameters.

Projects: theobayes1_project, theobayes2_project,


The Bayesian approach considers the vector of population parameters \(\theta\) as a random vector with a prior distribution \(\pi_\theta\). We can then define the *posterior distribution* of \(\theta\):

\(\begin{aligned} p(\theta | y ) &= \frac{\pi_\theta( \theta )p(y | \theta )}{p(y)} \\ &= \frac{\pi_\theta( \theta ) \int p(y,\psi |\theta) \, d \psi}{p(y)} . \end{aligned} \)

We can estimate this conditional distribution and derive statistics (posterior mean, standard deviation, quantiles, etc.) and the so-called maximum a posteriori (MAP) estimate of \(\theta\):

\(\begin{aligned} \hat{\theta}^{\rm MAP} &=\text{arg~max}_{\theta} p(\theta | y ) \\ &=\text{arg~max}_{\theta} \left\{ {\cal LL}_y(\theta) + \log( \pi_\theta( \theta ) ) \right\} . \end{aligned} \)

The MAP estimate maximizes a penalized version of the observed likelihood. In other words, MAP estimation is the same as penalized maximum likelihood estimation. Suppose for instance that \(\theta\) is a scalar parameter and the prior is a normal distribution with mean \(\theta_0\) and variance \(\gamma^2\). Then, the MAP estimate is the solution of the following minimization problem:

\(\hat{\theta}^{\rm MAP} =\text{arg~min}_{\theta} \left\{ -2{\cal LL}_y(\theta) + \frac{1}{\gamma^2}(\theta – \theta_0)^2 \right\} .\)

This is a trade-off between the MLE which minimizes the deviance, \(-2{\cal LL}_y(\theta)\), and \(\theta_0\) which minimizes \((\theta – \theta_0)^2\). The weight given to the prior directly depends on the variance of the prior distribution: the smaller \(\gamma^2\) is, the closer to \(\theta_0\) the MAP is. In the limiting case, \(\gamma^2=0\); this means that \(\theta\) is fixed at \(\theta_0\) and no longer needs to be estimated. Both the Bayesian and frequentist approaches have their supporters and detractors. But rather than being dogmatic and following the same rule-book every time, we need to be pragmatic and ask the right methodological questions when confronted with a new problem.
All things considered, the problem comes down to knowing whether the data contains sufficient information to answer a given question, and whether some other information may be available to help answer it. This is the essence of the art of modeling: find the right compromise between the confidence we have in the data and our prior knowledge of the problem. Each problem is different and requires a specific approach. For instance, if all the patients in a clinical trial have essentially the same weight, it is pointless to estimate a relationship between weight and the model’s PK parameters using the trial data. A modeler would be better served trying to use prior information based on physiological knowledge rather than just some statistical criterion.
Generally speaking, if prior information is available it should be used, on the condition of course that it is relevant. For continuous data for example, what does putting a prior on the residual error model’s parameters mean in reality? A reasoned statistical approach consists of including prior information only for certain parameters (those for which we have real prior information) and having confidence in the data for the others. Monolix allows this hybrid approach which reconciles the Bayesian and frequentist approaches. A given parameter can be

  • a fixed constant if we have absolute confidence in its value or the data does not allow it to be estimated, essentially due to lack of identifiability.
  • estimated by maximum likelihood, either because we have great confidence in the data or no information on the parameter.
  • estimated by introducing a prior and calculating the MAP estimate or estimating the posterior distribution.

Computing the Maximum a posteriori (MAP) estimate

  • theobayes1_project (data = ‘theophylline_data.txt’ , model = ‘lib:oral1_1cpt_kaVCl.txt’)

We want to introduce a prior distribution for \(ka_{\rm pop}\) in this example. Click on the option button

and select Maximum A Poteriori Estimation

We propose a typical value, here 2 and standard deviation 0.1 for \(ka_{\rm pop}\) and to compute the MAP estimate for \(ka_{\rm pop}\). The distribution of the MAP is inevitably the same as the the one used for the parameter.
The parameter is then colored in purple.

The MAP estimate of \(ka_{\rm pop}\) is a penalized maximum likelihood estimate:

Fixing the value of a parameter

  • theobayes2_project (data = ‘theophylline_data.txt’ , model = ‘lib:oral1_1cpt_kaVCl.txt’)

We can combine different strategies for the population parameters: Bayesian estimation for \(ka_{\rm pop}\), fixed value for \(V_{\rm pop}\) and maximum likelihood estimation for \(Cl_{\rm pop}\), for instance.


  • The parameter \(V_{\rm pop}\) is fixed and then colored in red.
  • \(V_{\rm pop}\) is not estimated (it’s s.e. is not computed) but the standard deviation \(\omega_{V}\) is estimated as usual.