Select Page

Handling censored (BLQ) data


Objectives: learn how to handle easily and properly censored data, i.e. data below (resp. above) a lower (resp.upper) limit of quantification (LOQ) or below a limit of detection (LOD).


Projects: censored1log_project, censored1_project, censored2_project, censored3_project, censored4_project


Introduction

Censoring occurs when the value of a measurement or observation is only partially known. For continuous data measurements in the longitudinal context, censoring refers to the values of the measurements, not the times at which they were taken. For example, the lower limit of detection (LLOD) is the lowest quantity of a substance that can be distinguished from its absence. Therefore, any time the quantity is below the LLOD, the “observation” is not a measurement but the information that the measured quantity is less than the LLOD. Similarly, in longitudinal studies of viral kinetics, measurements of the viral load below a certain limit, referred to as the lower limit of quantification (LLOQ), are so low that their reliability is considered suspect. A measuring device can also have an upper limit of quantification (ULOQ) such that any value above this limit cannot be measured and reported.
As hinted above, censored values are not typically reported as a number, but their existence is known, as well as the type of censoring. Thus, the observation y^{(r)}_{ij} (i.e., what is reported) is the measurement y_{ij} if not censored, and the type of censoring otherwise.
We usually distinguish between three types of censoring: left, right and interval. In each case, the SAEM algorithm implemented in Monolix properly computes the maximum likelihood estimate of the population parameters, combining all the information provided by censored and non censored data.

Theory

In the presence of censored data, the conditional density function needs to be computed carefully. To cover all three types of censoring (left, right, interval), let I_{ij} be the (finite or infinite) censoring interval existing for individual i at time t_{ij}. Then,

$$\displaystyle p(y^{(r)}|\psi)=\prod_{i=1}^{N}\prod_{j=1}^{n_i}p(y_{ij}|\psi_i)^{1_{y_{ij}\notin I_{ij}}}\mathbb{P}(y_{ij}\in I_{ij}|\psi_i)^{1_{y_{ij}\in I_{ij}}}$$

where

$$\displaystyle \mathbb{P}(y_{ij}\in I_{ij}|\psi_i)=\int_{I_{ij}} p_{y_{ij}|\psi_i} (u|\psi_i)du$$

We see that if y_{ij} is not censored (i.e. 1_{y_{ij}\notin I_{ij}}=1), its contribution to the likelihood is the usual p(y_{ij}|\psi_i), whereas if it is censored, the contribution is \mathbb{P}(y_{ij}\in I_{ij}|\psi_i).
For the calculation of the likelihood, this is equivalent to the M3 method in NONMEM when only the CENSORING column is given, and to the M4 method when both a CENSORING column and a LIMIT column are given.

Censoring definition in a data set

To define that a measurement is censored, you have to

  • Set your censored measurement in the OBSERVATION column.
  • Have a CENSORING column and put 1 or – 1 depending if this is a lower or an upper limit.
  • Optionally, have LIMIT column to set the other limit.

If the measurement is not censored, just put 0 in the CENSORING column and the regular value in the OBSERVATION column. Examples are provided below and here.

PK data below a lower limit of quantification

Left censored data

  • censored1log_project (data = ‘censored1log_data.txt’, model = ‘pklog_model.txt’)

PK data are log-concentration in this example. The limit of quantification of 1.8 mg/l for concentrations becomes log(1.8)=0.588 for log-concentrations. Column of observations (Y) contains either the LLOQ for data below the limit of quantification (BLQ data) or the measured log-concentrations for non BLQ data. Furthermore, Monolix uses an additional column CENSORING to indicate if an observation is left censored (CENS=1) or not (CENS=0). In this example, subject 1 has two BLQ data at time 24h and 30h (the measured log-concentrations were below 0.588 at these times):

The plot of individual fits displays BLQ (red band) and non BLQ data (blue dots) together with the predicted log-concentrations (purpleline) on the whole time interval:

Notice that the band goes from .8 to -Infinity as no bound has been specified (no LIMIT column was proposed).
For diagnostic plots such as VPC, residuals of observations versus predictions, Monolix proposes to sample the BLQ data from the conditional distribution

$$p(y^{BLQ} | y^{non BLQ}, \hat{\psi}, \hat{\theta})$$

where \hat{\theta} and \hat{\psi} are the estimated population and individual parameters. This is the most efficient way to take into account the complete information provided by the data and the model for diagnostic plots such as VPCs:


A strong bias appears if LLOQ is used instead for the BLQ data (if you choose LOQ instead of simulated in the display frame of the settings) :

Notice that ignoring the BLQ data entails a loss of information as can be seen below (if you choose no in the Use BLQ toggle):

As can be seen below, imputed BLQ data is also used for residuals (IWRES on the left) and for observations versus predictions (on the right)

More on these diagnostic plots

Impact of the BLQ in residuals and observations versus predictions graphics

A strong bias appears if LLOQ is used instead for the BLQ data for these two diagnostic plots:

while ignoring the BLQ data entails a loss of information:

BLQ predictive checks

Diagnostic plot BLQ plots the cumulative fraction of BLQ data (green line) with a 90% prediction interval

Interval censored data

  • censored1_project (data = ‘censored1_data.txt’, model = ‘lib:oral1_1cpt_kaVk.txt’)

We use the original concentrations in this project. Then, BLQ data should be treated as interval censored data since a concentration is know to be positive. In other word, a data reported as BLQ data means that the (non reported) measured concentration is between 0 and 1.8mg/l. The value in the observation column 1.8 indicates the value, the value in the CENSORING column indicates that the value in the observation column is the upper bound.  An additional column LIMIT reports the lower limit of the censored interval (0 in this example):


Remarks

  • if this column is missing, then BLQ data is assumed to be left-censored data that can take any positive and negative value below LLOQ.
  • the value of the limit can vary between observations of the same subject.

Monolix will use this additional information for properly estimating the parameters of the model and imputing the BLQ data for the diagnostic plots.

Plot of individual fits now displays LLOD at 1.8 with a red band when a PK data is censored. We see that the band lower limit is at 0 as defined in the limit column.

PK data below a lower limit of quantification or below a limit of detection

  • censored2_project (data = ‘censored2_data.txt’, model = ‘lib:oral1_1cpt_kaVk.txt’)

Plot of individual fits now displays LLOQ or LLOD with a red band when a PK data is censored. We see that the band lower limits are depending on the observation.

PK data below a lower limit of quantification and PD data above an upper limit of quantification

  • censored3_project (data = ‘censored3_data.txt’, model = ‘pkpd_model.txt’)

We work with PK and PD data in this project and assume that the PD data may be right censored and that the upper limit of quantification is ULOQ=90. We use CENS=-1 to indicate that an observation is right censored. In such case, the PD data can take any value above the upper limit reported in column Y (here the YTYPE column of type OBSERVED ID defines the type of observation, YTYPE=1 and YTYPE=2 are used respectively for PK and PD data):

Plot of individual fits for the PD data now displays ULOQ and the predicted PD profile:

We can display the cumulative fraction of censored data both for the PK and the PD data (on the left and right respectively):

Combination of interval censored PK and PD data

  • censored4_project (data = ‘censored4_data.txt’, model = ‘pkpd_model.txt’)

We assume in this example

  • 2 different censoring intervals(0,1) and (1.2, 1.8) for the PK,
  • a censoring interval (80,90) and right censoring (>90) for the PD.

Combining columns CENS, LIMIT and Y allow to combine efficiently these different censoring processes:

This coding of the data means that, for subject 1,

  • PK data is between 0 and 1 at time 30h (second blue frame),
  • PK data is between 1.2 and 1.8 at times 0.5h and 24h (first blue frame for time .5h),
  • PD data is between 80 and 90 at times 12h and 16h (second green frame for time 12h),
  • PD data is above 90 at times 4h and 8h (first green frame for time 4h).

Plot of individual fits for the PK and the PD data displays the different limits of these censoring intervals (PK on the left and PD on the right):

Other diagnostic plots, such as the plot of observations versus predictions, adequately use imputed censored PK and PD data:

Case studies

  • 8.case_studies/hiv_project (data = ‘hiv_data.txt’, model = ‘hivLatent_model.txt’)
  • 8.case_studies/hcv_project (data = ‘hcv_data.txt’, model = ‘hcvNeumann98_model_latent.txt’)