Visual Predictive Check


The VPC (Visual Predictive Check) offers an intuitive assessment of misspecification in structural, variability, and covariate models. The principle is to assess graphically whether simulations from a model of interest are able to reproduce both the central trend and the variability in the observed data, when plotted versus an independent variable (typically time). It summarizes the structural and statistical models in a single graphic by computing several quantiles of the empirical distribution of the data after grouping them into bins over successive intervals.
More precisely, the goal is to compare the two following elements:

  • Empirical percentiles: percentiles of the observed data, calculated either for each unique value of time, or pooled by adjacent time intervals (bins). By default, the 10th, 50th and 90th percentiles are displayed as green lines. These quantiles summarize the distribution of the observations.
  • Theoretical percentiles: percentiles of simulated data, computed from multiple Monte Carlo simulations with the model of interest and the design structure of the original dataset (i.e., dosing, timing, and number of samples). For each simulation, the same percentiles are computed across the same bins as for the empirical percentiles. Prediction intervals for each percentile are then estimated across all simulated data and displayed as colored areas (pink for the 50th percentile, blue for the 10th and 90th percentiles). By default, prediction intervals are computed with a level of 90%.

If the model is correct, the observed percentiles should be close to the predicted percentiles and remain within the corresponding prediction intervals.
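
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Monolix implementation: the function names (percentile, bin_percentiles, vpc_summary), the toy exponential-decay model and the bin edges are all hypothetical, and every bin is assumed to contain at least one observation.

```python
import math
import random

def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def bin_percentiles(times, values, edges, probs=(10, 50, 90)):
    """Percentiles of values in each bin [edges[i], edges[i+1]); last bin is closed."""
    out = []
    for i in range(len(edges) - 1):
        last = i == len(edges) - 2
        in_bin = [v for t, v in zip(times, values)
                  if edges[i] <= t < edges[i + 1] or (last and t == edges[-1])]
        out.append([percentile(in_bin, p) for p in probs])
    return out

def vpc_summary(times, observed, simulate, edges, n_sim=200, level=90,
                probs=(10, 50, 90)):
    """Return empirical percentiles and a prediction interval for each of them."""
    emp = bin_percentiles(times, observed, edges, probs)
    # Each replicate reuses the original design (same times, same bins).
    sims = [bin_percentiles(times, simulate(times), edges, probs)
            for _ in range(n_sim)]
    tail = (100 - level) / 2
    pi = []
    for b in range(len(edges) - 1):
        row = []
        for q in range(len(probs)):
            draws = [s[b][q] for s in sims]
            row.append((percentile(draws, tail), percentile(draws, 100 - tail)))
        pi.append(row)
    return emp, pi

# Hypothetical toy model: exponential decay with log-normal residual noise.
rng = random.Random(0)

def toy_model(times):
    return [10 * math.exp(-0.2 * t) * math.exp(rng.gauss(0, 0.2)) for t in times]

times = [t for t in (0.5, 1, 2, 4, 8, 12) for _ in range(30)]
observed = toy_model(times)
emp, pi = vpc_summary(times, observed, toy_model, edges=[0, 1.5, 6, 12])
```

Here emp[b][q] is the empirical percentile q in bin b, and pi[b][q] is the (lower, upper) bound of its prediction interval; an empirical percentile falling outside its interval would be flagged as an outlier.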




VPCs vary slightly for different types of data. For joint models for multivariate outcomes, VPCs are available for each outcome.

  • Continuous outcomes

warfarinPK_project (data = ‘warfarin_data.txt’, model = ‘lib:oral1_1cpt_TlagkaVCl.txt’)

In the following example, the parameters of a one-compartment model with delayed first-order absorption and linear elimination are estimated on the warfarin dataset, using a constant residual error model. The figure presents the VPC with the prediction intervals for the 10th, 50th and 90th percentiles. Outliers are highlighted with red dots and areas. Here the three quantiles appear closer together than the model would suggest, which indicates that a proportional component should be added to the error model.

For joint models for continuous PK and time-to-event data, VPCs are available for each type of data. However, it is important to note that dropout events are not taken into account in the VPC corresponding to the continuous data. Therefore, if the dataset contains non-random dropout events, discrepancies can appear between observed and simulated data and hamper the diagnostic value of the VPC. Correcting this bias would require including the simulated dropout in the VPC, as well as adapting the design structure to compensate for observed dropouts, an approach that is problematic when the design structure is complex. More details on this approach are given here.

  • Non-continuous outcomes: count data and categorical data

VPCs for count data and categorical data compare the observed and predicted frequencies of the categorized data over time. The predicted frequency is associated with a blue prediction interval.
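
As a rough illustration of the observed side of this comparison, the sketch below computes the frequency of each category within each time bin. The function name category_frequencies, the data and the bin edges are hypothetical. Applying the same function to each simulated replicate would give, per bin and category, the distribution of frequencies from which a prediction interval can be taken.

```python
from collections import Counter

def category_frequencies(times, categories, edges, levels):
    """Proportion of each level within each time bin [edges[i], edges[i+1])."""
    freqs = []
    for i in range(len(edges) - 1):
        last = i == len(edges) - 2  # the last bin also includes its right edge
        in_bin = [c for t, c in zip(times, categories)
                  if edges[i] <= t < edges[i + 1] or (last and t == edges[-1])]
        counts = Counter(in_bin)
        n = len(in_bin)
        # An empty bin has no defined frequency, so report NaN.
        freqs.append({lvl: (counts[lvl] / n if n else float("nan"))
                      for lvl in levels})
    return freqs

# Hypothetical observations: category (1, 2 or 3) recorded at each time.
times = [1, 1, 2, 2, 6, 6, 7, 8]
cats = [1, 2, 1, 1, 2, 3, 3, 3]
obs = category_frequencies(times, cats, edges=[0, 5, 10], levels=(1, 2, 3))
# obs[0] -> {1: 0.75, 2: 0.25, 3: 0.0}
# obs[1] -> {1: 0.0, 2: 0.25, 3: 0.75}
```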

The following figure shows the VPC for a project with a continuous time Markov chain model and time varying transition rates.

  • markov3b_project (data = ‘markov3b_data.txt’, model = ‘markov3b_model.txt’)

In addition to the categorization over time (binning on X), count data are also binned into groups of count values on the VPC (binning on Y). The number of bins and binning method can be set in Settings under “Y Bins”.
As an example, the VPC below corresponds to a project where a Poisson model is used for fitting the data. Observations are binned in 3 groups on the Y axis and 20 bins on the X axis.

  • count1a_project (data = ‘count1_data.txt’, model = ‘count_library/poisson_mlxt.txt’)


  • Time-to-event data

For time-to-event data, two visual predictive checks are available: the survival function, based on the Kaplan-Meier plot for exactly observed events and on the Turnbull estimator for interval-censored data, and the mean number of events per individual, using Turnbull estimation (see here and here for reference papers).

Details on the VPC for TTE generation in Monolix are presented here.

The example below shows these two figures, computed with a model for the survival of patients with advanced lung cancer from the Veterans’ Administration Lung Cancer study. Censored data have been selected and are displayed on the Kaplan-Meier plot. Note that censored data also cause an overprediction bias in the VPC based on the mean number of events per individual, because censored individuals contribute to the prediction interval but not to the empirical curve.
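
For readers unfamiliar with the Kaplan-Meier estimator behind the empirical survival curve, here is a minimal sketch for exactly observed and right-censored events. It is an illustration only, not the Monolix implementation: the function name kaplan_meier and the data are hypothetical, and interval censoring (which requires the Turnbull estimator) is not handled.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is True for an observed event and False for a right-censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # number of observed events at time t
        c = 0  # number of individuals leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            # Individuals censored at t still count as at risk at t.
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= c
    return curve

# Hypothetical data: five individuals, two of them censored (at t=3 and t=8).
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
# Survival steps down to about 0.8, 0.6 and 0.3 at t = 2, 3 and 5.
```

In a VPC, the same estimator would be applied to each simulated dataset, and the spread of the simulated curves at each time gives the prediction interval around the empirical curve.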



  • Binning criteria

Correctly defining the intervals (or bins) into which the data are grouped is crucial to construct a VPC that avoids distortion between the original and approximated distributions. Several strategies exist to segment the data: equal-width binning, equal-size binning, and a least-squares criterion. The number of bins can either be set by the user or selected automatically to obtain a good tradeoff. Indeed, a small number of bins leads to a poor approximation (coarse resolution along the independent variable) but a good estimation of the data’s distribution (many points per bin), while a large number of bins leads to a good approximation but a poor estimation.
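
The difference between the first two strategies can be seen on a small example. The sketch below is hypothetical (function names and sampling times are made up for illustration) and does not cover the least-squares criterion. Note that with heavily tied times, the equal-size edges may coincide.

```python
def equal_width_edges(times, n_bins):
    """Edges splitting [min, max] into n_bins intervals of identical width."""
    lo, hi = min(times), max(times)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins)] + [hi]

def equal_size_edges(times, n_bins):
    """Edges chosen so that each bin holds roughly the same number of points."""
    s = sorted(times)
    n = len(s)
    inner = [s[(i * n) // n_bins] for i in range(1, n_bins)]
    return [s[0]] + inner + [s[-1]]

def count_per_bin(times, edges):
    """Number of points in each bin [edges[i], edges[i+1]); last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for t in times:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= t < edges[i + 1] or (last and t == edges[-1]):
                counts[i] += 1
                break
    return counts

# Hypothetical PK sampling times, dense early and sparse late.
times = [0.25, 0.5, 1, 2, 4, 8, 12, 24]
width_counts = count_per_bin(times, equal_width_edges(times, 4))  # [5, 2, 0, 1]
size_counts = count_per_bin(times, equal_size_edges(times, 4))    # [2, 2, 2, 2]
```

With equal-width bins, the early samples are pooled into a single bin and one bin is empty; equal-size bins keep the same number of points per bin at the cost of uneven widths.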

As an example, the VPCs below are computed on the PK model built for remifentanil pharmacokinetics, a dataset that involves a large variability in doses. The bins are delimited with vertical lines. The first VPC, on the left, is computed with 5 bins, the number automatically selected for this dataset. The second VPC, on the right, is computed with 15 bins. In this case, the heterogeneity of the data results in a poor estimation of the data’s distribution. To keep a good estimation, a small number of bins is required, but the approximation then prevents visualizing the kinetics in detail. For example, the absorption phase is not visible.


  • Corrected predictions

As shown above, VPCs can be misleading if applied to data that include a large variability in dose and/or influential covariates, or that follow adaptive designs such as dose adjustments. The prediction-corrected VPC (pcVPC) was developed to maintain the diagnostic value of a VPC in these cases. In each bin, the observed and simulated data are normalized based on the typical population prediction for the median time in the bin. This removes the variability coming from binning across independent variables.
The example below shows the pcVPC computed on the PK model built for remifentanil pharmacokinetics with 15 bins: the figure now gives a good estimation of the data’s distribution, including the absorption phase.
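
The normalization can be illustrated with the ratio form of prediction correction commonly used for strictly positive data: each value is multiplied by a per-bin reference prediction and divided by its own population prediction. The sketch below is an assumption-laden illustration rather than the exact Monolix computation: the function name prediction_correct is hypothetical, and the per-bin reference is taken here as the median of the population predictions in the bin.

```python
from statistics import median

def prediction_correct(values, preds, bin_index):
    """Scale each value by (reference prediction of its bin) / (its own prediction).

    values    : observed or simulated data, strictly positive
    preds     : typical population prediction for each data point
    bin_index : bin number of each data point
    """
    by_bin = {}
    for p, b in zip(preds, bin_index):
        by_bin.setdefault(b, []).append(p)
    # Assumption: the reference is the median population prediction in the bin.
    ref = {b: median(ps) for b, ps in by_bin.items()}
    return [y * ref[b] / p for y, p, b in zip(values, preds, bin_index)]

# Hypothetical bin mixing a low-dose group (prediction 10) and a high-dose
# group (prediction 100), each with the same relative spread of +/- 20%.
values = [12, 8, 120, 80]
preds = [10, 10, 100, 100]
corrected = prediction_correct(values, preds, bin_index=[0, 0, 0, 0])
# corrected -> [66.0, 44.0, 66.0, 44.0]
```

After correction, both dose groups show the same spread around the reference, so the percentiles in the bin reflect residual and inter-individual variability rather than the difference in dose. The same correction must be applied to the observed data and to every simulated replicate.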


  • Stratification

When possible, another useful approach to deal with heterogeneous data can be to split the VPC into groups of subjects that are more homogeneous. As an example, the VPCs below are computed again on the PK model built for remifentanil pharmacokinetics, with 15 bins, but the data was first split by a categorical covariate that characterizes groups of similar doses.


  • General: Add/remove legend or grid
  • Subplots (for TTE data)
    • Add/remove plot for survival function (Kaplan-Meier plot) or plot for mean number of events per individual
  • Display
    • Observed data
      • Observed data: Add/remove observed data.
      • BLQ: Add/remove BLQ data if present.
      • Use BLQ: Choose to use BLQ data or to ignore it to compute the VPC. BLQ data can be simulated, or can be set equal to the limit of quantification (LOQ). The latter case induces a strong bias.
    • Empirical percentiles: Add/remove empirical percentiles for the 10%, 50% and 90% quantiles.
    • Predicted percentiles: Add/remove theoretical percentiles for the 10% and 90% quantiles.
    • Prediction interval: Add/remove prediction intervals given by the model for the 10% and 90% quantiles (in blue) and the 50% quantile (in pink).
      • Set the interpercentile level and the higher percentile for prediction intervals (for continuous data, the default level is 90% and the default higher percentile is 90%), or the number of bands for TTE data
    • Outliers
      • Dots: Add/remove red dots indicating empirical percentiles that are outside prediction intervals
      • Areas: Add/remove red areas indicating empirical percentiles that are outside prediction intervals
    • Calculations
      • Corrected predictions: compute the pcVPC using Uppsala prediction correction (see details above)
      • Set piecewise display for prediction intervals (by default the display is linear)
  • Bins – for categorical data, X Bins and DV Bins (for Y axis) can be specified
    • Bin limits: Add/remove vertical lines on the scatter plots to indicate the bins.
    • Binning criteria: Choose the binning criterion among equal width (default), equal size or least-squares
    • Number of bins: Choose a fixed number of bins or a range for automatic selection, and a range for the number of data points per bin.

All colors, points and lines can be modified by the user.


Generating predictive checks on an external data set

This video shows how to generate an external VPC in Monolix: a VPC that compares the simulations based on a population model estimated on a first data set to a second data set, for example to check whether a population model estimated on a single dose study is also valid on a new multiple dose study for the same molecule.