 Purpose
 Examples
 Details
 Settings
 Generating predictive checks on an external data set
 Exporting VPC simulations
 Correcting the VPC for missing observations
Purpose
The VPC (Visual Predictive Check) offers an intuitive assessment of misspecification in structural, variability, and covariate models. The principle is to assess graphically whether simulations from a model of interest are able to reproduce both the central trend and variability in the observed data, when plotted versus an independent variable (typically time). It summarizes in the same graphic the structural and statistical models by computing several quantiles of the empirical distribution of the data after having regrouped them into bins over successive intervals.
More precisely, the goal is to compare the two following elements:
 Empirical percentiles: percentiles of the observed data, calculated either for each unique value of time, or pooled by adjacent time intervals (bins). By default, the 10th, 50th and 90th percentiles are displayed as green lines. These quantiles summarize the distribution of the observations.
 Theoretical percentiles: percentiles of simulated data are computed from multiple Monte Carlo simulations with the model of interest and the design structure of the original dataset (i.e., dosing, timing, and number of samples). For each simulation, the same percentiles are computed across the same bins as for empirical percentiles. Prediction intervals for each percentile are then estimated across all simulated data and displayed as colored areas (pink for the 50th percentile, blue for the 10th and 90th percentiles). By default, prediction intervals are computed with a level of 90%.
If the model is correct, the observed percentiles should be close to the predicted percentiles and remain within the corresponding prediction intervals.
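The comparison described above can be sketched for a single bin as follows. This is a minimal illustration, not Monolix's implementation: the helper names `percentile` and `vpc_summary`, the linear-interpolation percentile rule, and the synthetic Gaussian data are all assumptions made for the example.

```python
import random

def percentile(values, p):
    """Percentile p (0-100) of a non-empty list, with linear interpolation."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def vpc_summary(observed, simulations, percentiles=(10, 50, 90), level=90):
    """Summarize one bin.

    observed: observed values falling in the bin.
    simulations: list of simulated replicates, each a list of values in the bin.
    Returns {p: (empirical percentile, PI lower bound, PI upper bound)}.
    """
    tail = (100 - level) / 2.0
    summary = {}
    for p in percentiles:
        empirical = percentile(observed, p)
        # Same percentile computed in every simulated replicate.
        simulated = [percentile(rep, p) for rep in simulations]
        # Prediction interval of that percentile across replicates.
        summary[p] = (empirical,
                      percentile(simulated, tail),
                      percentile(simulated, 100 - tail))
    return summary

# Synthetic data standing in for one bin (hypothetical values).
random.seed(0)
observed = [random.gauss(10, 2) for _ in range(50)]
simulations = [[random.gauss(10, 2) for _ in range(50)] for _ in range(500)]

summary = vpc_summary(observed, simulations)
for p, (emp, low, high) in summary.items():
    inside = low <= emp <= high
    print(f"P{p}: empirical={emp:.2f}, PI=[{low:.2f}, {high:.2f}], inside={inside}")
```

An empirical percentile falling outside its interval is what the VPC flags as an outlier for that bin.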
Examples
VPCs vary slightly for different types of data. For joint models for multivariate outcomes, VPCs are available for each outcome.
 Continuous outcomes
warfarinPK_project (data = ‘warfarin_data.txt’, model = ‘lib:oral1_1cpt_TlagkaVCl.txt’)
In the following example, the parameters of a one-compartment model with delayed first-order absorption and linear elimination are estimated on the warfarin dataset. A constant residual error model was used. The figure presents the VPC with the prediction intervals for the 10th, 50th and 90th percentiles. Outliers are highlighted with red dots and areas. Here the three empirical percentiles lie closer together than the model predicts, which suggests that a proportional component should be added to the error model.
For joint models for continuous PK and time-to-event data, VPCs are available for each type of data. However, it is important to note that dropout events are not taken into account in the VPC corresponding to the continuous data. Therefore, in the case of non-random dropout events in the dataset, this can result in discrepancies between observed and simulated data and thus hamper the diagnostic value of the VPC. Correcting this bias would require including the simulated dropout in the VPC, as well as adapting the design structure to compensate for observed dropouts, an approach that is problematic when the design structure is complex. More details on this approach are given here.
 Noncontinuous outcomes: count data and categorical data
VPCs for count data and categorical data compare the observed and predicted frequencies of the categorized data over time. The predicted frequency is associated with a blue prediction interval.
The following figure shows the VPC for a project with a continuous time Markov chain model and time varying transition rates.
 markov3b_project (data = ‘markov3b_data.txt’, model = ‘markov3b_model.txt’)
In addition to the categorization over time (binning on X), count data are also binned into groups of count values on the VPC (binning on Y). The number of bins and binning method can be set in Settings under “Y Bins”.
As an example, the VPC below corresponds to a project where a Poisson model is used for fitting the data. Observations are binned in 3 groups on the Y axis and 20 bins on the X axis.
 count1a_project (data = ‘count1_data.txt’, model = ‘count_library/poisson_mlxt.txt’)
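As a rough sketch of binning on Y, the observed frequencies for one X bin could be computed as below. The function names, the choice of inclusive upper bounds to define the groups, and the sample counts are assumptions for illustration only; the text does not specify how Monolix defines the group limits.

```python
from collections import Counter

def y_group(count, upper_bounds):
    """Index of the first group whose inclusive upper bound is >= count.

    The last group is open-ended (everything above the final bound).
    """
    for k, bound in enumerate(upper_bounds):
        if count <= bound:
            return k
    return len(upper_bounds)

def group_frequencies(counts, upper_bounds):
    """Fraction of observations in each Y group, for one X bin."""
    n_groups = len(upper_bounds) + 1
    tally = Counter(y_group(c, upper_bounds) for c in counts)
    return [tally.get(k, 0) / len(counts) for k in range(n_groups)]

# Hypothetical count observations falling in a single time bin.
counts_in_bin = [0, 1, 1, 2, 3, 3, 4, 6, 7, 9]

# Three Y groups: counts 0-1, counts 2-4, counts 5 and above.
freqs = group_frequencies(counts_in_bin, upper_bounds=[1, 4])
print(freqs)
```

Applying the same function to each simulated replicate gives the distribution from which the prediction interval of each frequency can be derived.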
 Time-to-event data
For time-to-event data, two visual predictive checks are available: the survival function, based on the Kaplan-Meier plot for exactly observed events and the Turnbull estimator for interval-censored data, and the mean number of events per individual using Turnbull estimation (see here and here for reference papers).
Details on the VPC for TTE generation in Monolix are presented here.
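For exactly observed events with right censoring, the Kaplan-Meier estimate of the survival function can be sketched as follows. This is the textbook product-limit estimator, not Monolix's code; the function name and the toy data are assumptions, and the Turnbull estimator for interval-censored data is not covered here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: event or censoring time of each individual.
    events: 1 if the event was observed at that time, 0 if right-censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all individuals sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored individuals both leave the risk set.
        at_risk -= n_leaving
    return curve

# Toy data: five individuals, two of them right-censored (hypothetical).
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Individuals censored at the same time as an event are counted as still at risk at that time, which is the usual convention.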
The example below shows these two figures, computed with a model for the survival of patients with advanced lung cancer from the Veterans’ Administration Lung Cancer study. Censored data have been selected and are displayed on the Kaplan-Meier plot. Note that censored data also cause an overprediction bias in the VPC based on the mean number of events per individual, because censored individuals contribute to the prediction interval but not to the empirical curve.
Sometimes numerical errors can appear in the simulations used for the TTE VPC. For example, when simulating a joint model with tumor growth and survival over a long time scale, the simulated hazard for death can become too high for some individuals at late times. In Monolix2021R1, the appearance of NaNs in the VPC simulations does not stop the generation of the VPC. The simulated individuals with NaNs in their simulations are not used in the prediction interval, and the percentage of simulated individuals with NaNs is then displayed in the Warnings.
Details
 Binning criteria
Correctly defining the intervals (or bins) into which the data are grouped is crucial to construct a VPC that avoids distortion between the original and approximated distributions. Several strategies exist to segment the data: equal-width binning, equal-size binning, and a least-squares criterion. The number of bins can also be either set by the user, or automatically selected to obtain a good trade-off. Indeed, a small number of bins leads to a poor approximation but a good estimation of the data’s distribution, while a large number of bins leads to a good approximation but poor estimation.
As an example, the VPCs below are computed on the PK model built for remifentanil pharmacokinetics, a dataset that involves a large variability in doses. The bins are delimited with vertical lines. The first VPC, on the left, is computed with 5 bins, the number automatically selected for this dataset. The second VPC, on the right, is computed with 15 bins. In this case the heterogeneity of the data results in a poor estimation of the data’s distribution. Keeping a good estimation requires a small number of bins, but the resulting approximation then prevents the kinetics from being visualized in detail. For example, the absorption phase is not visible.
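The difference between equal-width and equal-size binning can be sketched as below. The function names and sampling times are hypothetical, the cut-point rules are one simple choice among several, and the least-squares criterion is not shown.

```python
def equal_width_edges(times, n_bins):
    """Bin edges splitting [min, max] into n_bins intervals of equal length."""
    lo, hi = min(times), max(times)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]

def equal_size_edges(times, n_bins):
    """Bin edges chosen so that each bin holds about the same number of points."""
    s = sorted(times)
    n = len(s)
    inner = [s[(i * n) // n_bins] for i in range(1, n_bins)]
    return [s[0]] + inner + [s[-1]]

def bin_counts(times, edges):
    """Number of points per bin; bins are [edge_i, edge_i+1), last bin closed."""
    counts = [0] * (len(edges) - 1)
    for t in times:
        k = len(edges) - 2
        for i in range(len(edges) - 2):
            if t < edges[i + 1]:
                k = i
                break
        counts[k] += 1
    return counts

# Hypothetical sampling times: dense early, sparse late.
times = [0.25, 0.5, 1, 2, 4, 8, 12, 24, 48, 72, 96, 120]

width_counts = bin_counts(times, equal_width_edges(times, 3))
size_counts = bin_counts(times, equal_size_edges(times, 3))
print("equal width:", width_counts)
print("equal size: ", size_counts)
```

With dense early sampling, equal-width bins pool most of the early points into a single bin, whereas equal-size bins keep the number of observations per bin balanced at the cost of unequal time spans.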
 Corrected predictions
As shown above, VPCs can be misleading if applied to data that include a large variability in dose and/or influential covariates, or that follow adaptive designs such as dose adjustments. The prediction-corrected VPC (pcVPC) was developed to maintain the diagnostic value of a VPC in these cases. In each bin, the observed and simulated data are normalized based on the typical population prediction for the median time in the bin. This removes the variability coming from binning across independent variables.
The example below shows the pcVPC computed on the PK model built for remifentanil pharmacokinetics with 15 bins: the figure now gives a good estimation of the data’s distribution, including the absorption phase.
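A minimal sketch of the normalization step is given below, assuming the multiplicative form of prediction correction in which each value is scaled by the ratio of a reference prediction for the bin to the population prediction for that observation. The exact formula and the choice of reference used by Monolix are not given in this text, so `pred_ref`, the function name, and the numbers are illustrative assumptions.

```python
def prediction_correct(values, pop_preds, pred_ref):
    """Scale each value by pred_ref / (population prediction for that value).

    values: observed (or simulated) values in one bin.
    pop_preds: typical population prediction matching each value.
    pred_ref: reference prediction for the bin (assumed given here).
    """
    return [y * pred_ref / p for y, p in zip(values, pop_preds)]

# Two observations in the same bin from subjects with a 10-fold dose
# difference (hypothetical numbers). Raw values differ mostly because of dose.
observations = [1.1, 9.0]
pop_preds = [1.0, 10.0]
pred_ref = 5.0  # assumed reference prediction for this bin

corrected = prediction_correct(observations, pop_preds, pred_ref)
print(corrected)
```

After correction the two values are close to each other (about 5.5 and 4.5), so what remains reflects deviation from the typical prediction rather than the dose difference. The same scaling is applied to the simulated data before percentiles are computed.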
 Stratification
When possible, another useful approach to deal with heterogeneous data can be to split the VPC into groups of subjects that are more homogeneous. As an example, the VPCs below are computed again on the PK model built for remifentanil pharmacokinetics, with 15 bins, but the data was first split by a categorical covariate that characterizes groups of similar doses.
 VPC based on time after last dose (continuous data only)
Monolix2021R1 provides an option in the Display panel to display the VPC with “Time after last dose” on the X-axis. The images below show as an example the result on the demo project multidose_project.mlxtran (Monolix demo folder 6.3), with observed data overlaid on the VPC, “Time” on the X-axis in the left image, and “Time after last dose” on the X-axis in the right image. Whichever option is selected in the interface, the exported charts data for the VPC now include two columns, for time and time after last dose. Doses with amount=0 are handled like any other dose. For observations before the first dose (or if there are no doses at all), time remains unchanged (i.e., time after last dose = time). Finally, all administration ids are handled together (it is not possible, for instance, to define time after dose for adm id = 2 only).
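The rules above can be sketched as follows for one individual. The function name is hypothetical, and treating an observation recorded exactly at a dose time as occurring after that dose is an assumption not stated in the text.

```python
from bisect import bisect_right

def time_after_last_dose(obs_times, dose_times):
    """Time since the most recent dose for each observation.

    dose_times pools the doses of all administration ids, including doses
    with amount 0. Observations before the first dose, or with no dose at
    all, keep their original time.
    """
    doses = sorted(dose_times)
    result = []
    for t in obs_times:
        # Number of doses given at or before t (assumption: a dose given
        # exactly at t counts as the last dose).
        n_before = bisect_right(doses, t)
        result.append(t if n_before == 0 else t - doses[n_before - 1])
    return result

# Hypothetical individual dosed every 12 h, with one pre-dose sample.
tad = time_after_last_dose(obs_times=[-1, 2, 12, 13.5, 30],
                           dose_times=[0, 12, 24])
print(tad)
```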
Settings
 General: Add/remove legend or grid
 Subplots (for TTE data)
 Add/remove plot for survival function (Kaplan-Meier plot) or plot for mean number of events per individual
 Display
 Observed data
 Observed data: Add/remove observed data.
 BLQ: Add/remove BLQ data if present.
 Use BLQ: Choose whether to use BLQ data or ignore it when computing the VPC. BLQ data can be simulated, or can be set equal to the limit of quantification (LOQ). The latter case induces a strong bias.
 Empirical percentiles: Add/remove empirical percentiles for the 10%, 50% and 90% quantiles.
 Predicted percentiles: Add/remove theoretical percentiles for the 10% and 90% quantiles.
 Prediction interval: Add/remove prediction intervals given by the model for the 10% and 90% quantiles (in blue) and the 50% quantile (in pink).
 Set interpercentile level and higher percentile for prediction intervals (for continuous data by default the level is 90 and the higher percentile is 90%), or number of bands for TTE data
 Outliers
 Dots: Add/remove red dots indicating empirical percentiles that are outside prediction intervals
 Areas: Add/remove red areas indicating empirical percentiles that are outside prediction intervals
 Calculations
 Observed data


 Corrected predictions: compute the pcVPC using Uppsala prediction correction (see details above)
 Linear interpolation: Set piecewise display for prediction intervals (by default the display is linear)
 Time after last dose: Use time after last dose instead of time on the X-axis.

 Bins – for categorical data, X Bins and DV Bins (for Y axis) can be specified
 Bin limits: Add/remove vertical lines on the scatter plots to indicate the bins.
 Binning criteria: Choose the binning criterion among equal width (default), equal size or least squares
 Number of bins: Choose a fixed number of bins or a range for automatic selection, and a range for the number of data points per bin.
All colors, points and lines can be modified by the user.
Generating predictive checks on an external data set
This video shows how to generate an external VPC in Monolix: a VPC that compares the simulations based on a population model estimated on a first data set to a second data set, for example to check whether a population model estimated on a single dose study is also valid on a new multiple dose study for the same molecule.
Exporting VPC simulations
When exporting the charts data, by default the charts data for the VPC include the observed data and predicted percentiles, but not the detailed simulations, which are usually quite large. Starting from the 2020 version, the user can nonetheless choose to include the VPC simulations by clicking on “Export > Export VPC simulations” in the application menu:
The simulated values are saved in <result folder>/ChartsData/VisualPredictiveCheck/XXX_simulations.txt. They can be used to replot the VPC in R for instance.
In Monolix2021 it is also possible to systematically include the VPC simulations in the exported charts data with an option in the Preferences:
Correcting the VPC for missing observations
Missing observations can cause a bias in the VPC and hamper its diagnostic value. This page discusses why missing censored data or censored data replaced by the LOQ can cause a bias in the VPC, and how Monolix handles censored data to prevent this bias. It also explains the bias resulting from non-random dropout, and how this can be corrected with Simulx.