This case study is part of a series on time-to-event data and models:

- Part 1: Introduction to TTE data and library of models
- Part 2: Case study on veteran lung cancer data set
- Part 3: Case study on NCCTG lung cancer data set

In this case study, we develop a model for the survival of patients with advanced lung cancer. The effect of several covariates, including a novel chemotherapy, is investigated.

- Introduction
- Data set visualization
- Modeling with Monolix
- Simulations with Simulx
- Conclusion
- Downloads

## Introduction

In a study conducted by the US Veterans Administration, **male patients with advanced inoperable lung cancer were given either a standard therapy or a test chemotherapy. Time to death was recorded** for 137 patients, while 9 left the study before death. Various covariates were also documented for each patient.

**The primary goal is to assess if the test chemotherapy is beneficial.** Secondary goals include the analysis of covariates as prognostic variables.

This data set has been initially presented and analyzed in:

## Data set visualization

The data set includes data for 137 patients, 9 of them are right censored. According to the data set formatting guidelines, the data set includes for each patient a line indicating the start time (time=0 and observation Y=0), and a line indicating either the time of death (Y=1) or the time at which the patient has left the study (Y=0).

In addition the following covariates were recorded:

**Treatment**(“trt” column): “standard” or “test” (tested new chemotherapy)**Histological type of the tumor**(“celltype” column): “squamous”, “smallcell”, “adeno” or “large”**Karnofsky score**(“karno” column): performance score that describes the overall patients status at the beginning of the study. The scale goes from 0 (dead) to 100 (completely healthy). More details about the scale can be found here.**Time between diagnosis and start of the study**(column “diagtime”): in months**Age**(column “age”): in years**Prior therapy**(column “priortherapy”): indicates if the patient has received another therapy before the current one

A snapshot of the data set in shown below:

**We first start with a visualization of the data set using Datxplore**. We open Datxplore, click on `Project > New project/data`

and `Browse`

to select the data set. The columns are assigned in the following way:

and load the data (the covariates must be assigned to CONTINUOUS COVARIATE for continuous covariates or CATEGORICAL COVARIATE for categorical covariates). Before using the data visualization feature of Monolix, we must indicate that the data is of type TTE (rather than continuous). This is done via the choice of a structural model, that defines an event as output. For the moment, we can just select any model from the TTE model library via the model selection window that opens when clicking on “model file”. We can now click on the** DATA VIEWER **in the DATA frame.The** Kaplan-Meier (KM) estimate of the survival curve** as well as the mean number of events curve can be seen. In the “Settings”, we can choose to focus only on the KM survival curve and **display the censored data**.

In the “Stratify” menu, we can **split the KM curve according to categorical covariates, or categorized continuous covariates** (groups can be changed with the “modify” button). Stratifying by treatment trt using the “color” option leads to the following figure:

At first sight, it seems that there is no clear difference in survival for the standard treatment group and the chemotherapy treatment group. For the Karnofsky score, we create two groups (Group1= karno<50, and Group2= karno>50) and stratify:

Here it seems that patients with higher Karnofsky score (higher general health status) have a tendency to survive longer. We can go similarly through the other covariates.

## Modeling with Monolix

We now would like to **develop a model for this data set, in order to analyze the influence of the covariates**, in particular the treatment. Within the MonolixSuite, the model must be fully specified via its hazard function (no non-parametric or semi-parametric approaches).

### Structural model

We start with the specification of the structural model. In the TTE introduction, we have presented a library of common TTE models (defined via their hazard functions), and have detailed the typical shape of the resulting survival curve. **Using the KM curve visualization and the overview of the library models presented in the TTE introduction, it seems that an exponential model could be appropriate**. We thus choose the file “exponential_model_singleEvent.txt” (because death happens only once per individual) from the library.

**We next need to define the distribution of the individual parameters**. Most of the time in survival analysis, one assume that the individual parameters are influenced by the covariates but that all individual with the same covariates have the same individual parameters and thus the same hazard function (but different times of death). On the opposite, **we can also assume a random effect for the individual parameters (often called frailty models), which permits to capture heterogeneity between individuals**, which cannot be explained by the covariates. Because Monolix estimation algorithms and graphics are most powerful when parameters include random effects, we will start with this approach.

**The model contains a single parameter, the characteristic time Te**. As this parameter should remain positive, we keep the default log-normal distribution. **An initial value for this parameter can be chosen with help of the library overview, for instance 100 for this data set**.

**The model being fully defined, we can launch the estimation tasks**: estimation of the population parameters, estimation of the standard errors via the Fisher Information Matrix, estimation of the log-likelihood and generation of the graphics (individual parameters have been skipped because no individual fits plots are possible with TTE data). The Te_pop parameter very rapidly converges towards a random walk around the maximum likelihood estimate (MLE) in the exploratory phase of SAEM (before the red line, see figure below), and then converges to the MLE in the smoothing phase (after the red line). The estimated value (visible using the “Last results” button) is 109, close to our initial estimate. The standard deviation of the random effects (omega_Te parameter) converges to 0.63, which corresponds to a coefficient of variation of \( \textrm{CV} = \sqrt{e^{{\omega_{T_e}}^2} – 1} = 50 % \). **The large inter-individual variability probably reflects that sub-populations within the data set have different Te values**. Considering covariates on the Te parameter to reduced the random effects is the topic of the next section.

To assess if the exponential model was a good choice, **we can display the Time-to-Event figure** (click on “Select plots” and in the pop-up window, on “Time to event data”). The figure displays the empirical KM curve. By going to Settings, **we can overlay the 90% prediction interval calculated from model simulations**. The figure is then interpreted similarly to a VPC. **Here the empirical KM curve lies within the prediction interval, we conclude that there is no model mis-specification**.

### Covariate search

The next step is to assess if the hazard is significantly different for sub-populations (for instance with the standard treatment or the chemotherapy). **For model selection several statistical tools can be used, such as information criteria (AIC and BIC) or hypothesis tests (Wald test and likelihood ratio test)**. In this case study, we will use the Wald test.

Covariates can be added by clicking on the matrix in the “Covariate model” section:

The details of the resulting equations can be seen by clicking on the white paperclip button. If we add the **categorical covariate trt** on parameter Te, we obtain:

$$T_{e,i} = \begin{cases} T_{e,\textrm{pop}}e^{\eta_i} & \text{if} \quad \textrm{trt}=0\textrm{ (reference)} \\ T_{e,\textrm{pop}}e^{\beta_{\textrm{trt}}}e^{\eta_i} & \text{if} \quad \textrm{trt}=1 \end{cases} $$

If we instead add **the continuous covariate karno** on parameter Te, we obtain:

$$T_{e,i}=T_{e,\textrm{pop}}e^{\beta_{\textrm{karno}}\times\textrm{karno}}e^{\eta_i}$$

Note that this parameterization corresponds to the **typical form of a Cox proportional model**. Via transformation of the covariates, other forms are also possible.

The parameterization using betas permits to directly perform statistical tests on the beta parameters. In particular, **the Wald test tests the following null hypothesis**:

$$H_0:\quad \textrm{”}\beta=0\textrm{”} \quad \textrm{versus} \quad H_1:\quad \textrm{”}\beta\neq 0\textrm{”}$$

The associated p-value is reported in the output file. **If the p-value is smaller than a threshold (usually 0.05), we can consider that ( beta ) is significantly different from zero**.

We create 8 models, each having one covariate on the Te parameter. Here are the p-value results for each covariate (each model):

The covariates karno and celltype appear as significative, while all others not. In particular, **given the data, the new chemotherapy does not significantly lowers the risk, compared to the standard treatment**.

Next, we can try a model with both karno and celltype. Because the difference between the “smallcell” cell type and the “adeno” cell type (used as reference) is not significative, **we can choose to group these two into one category**. Using the “Transform” button, we group “adeno” and “smallcell” into the G1 reference group.

The results are the following:

From the estimated beta values, we can easily calculate that **the risk of death is around exp(1.02)=2.8 times higher for patients with squamous cell types, compared to adeno or small cell types**. **In the graphics, the 95% prediction interval for the survival can be stratified** by cell type and/or karno score categories, to verify the agreement between model and data:

No model mis-specification can be detected. **We thus consider this model as our final model.**

The final model is:

$$h_{i}(t)=\begin{cases} \frac{1}{T_{e,\textrm{pop, adeno/smallcell}}e^{\beta\times \textrm{karno}}e^{\eta_i} }& \text{if} \quad \textrm{celltype}=\textrm{adeno or smallcell (G_adeno_smallcell)} \\ \frac{1}{T_{e,textrm{pop, large}}e^{\beta\times \textrm{karno}}e^{\eta_i} }& \text{if} \quad \textrm{celltype}=\textrm{large (G_large)}\\ \frac{1}{T_{e,\textrm{pop, squamous}}e^{\beta\times \textrm{karno}}e^{\eta_i}} & \text{if} \quad \textrm{celltype}=\textrm{squamous (G_squamous )} \end{cases} $$

For a given celltype and a given karno score, we can easily calculate analytically the typical hazard and the associated exponential model survival (i.e if (h(t)=lambda) then (S(t)=e^{-lambda t})). **From the survival function, we calculate the probability to survive at least 3 months (90 days), 6 months (180 days) or 1 years (365 days), depending on the cell type and karno score**:

The probability density function of the death event can also be calculated for any given karno score and cell type ( ( f(t)=h(t)times S(t) ) ), giving **the probability of death over time**. For each displayed subpopulation, the plot **also represents the distribution of death times in the subpopulation**.

## Simulations using Simulx

Now that we have developed a parametric model for the survival of patients with advanced lung cancer, **we can use the model to explore the expected survival for new cohorts of patients. To perform simulations, we use the simulx function of the R package mlxR.** Using the Monolix project as input, we can either simulate the same population as in the original data set, or define new populations.

**We first simulate the same population as in the original data set**. Because, for a given hazard, time of death is an random variable, the Kaplan-Meier survival curve will be different compared to the original data. The simulation is done via the simulx function which returns a R object containing the result of the simulation (time of death for each individual). We can then pass this object to kmplotmlx to obtain the Kaplan-meier survival curve:

# loading the mlxR library library(mlxR) # defining the path to the mlxtran project file project.file <- "./monolixproject/project_veteran_3_celltype_karno.mlxtran" # calling simulx, giving the path to the project as argument res <- simulx(project = project.file) # plotting the KM survival curve kmplotmlx(res$Event) + xlab("Time (days)")

We obtain the following Kaplan-Meier survival curve, similar (but different) from that of the original data set:

**We next simulate two new patients cohorts (with different covariates compared to the original data set)**. The first cohort are patients with a quite good karno score, around 70, and mostly with squamous cell types (70% squamous, 10% adeno, 10% smallcell, 10% large). The second cohort are patients having mostly adeno or smallcell celltypes (40% adeno, 40% smallcell, 10% large, 10% squamous) and a low karno score, around 40.

**To simulate these two cohorts, we create two groups, each defined via a data frame of the individuals covariates, and pass them as simulx input argument.** For instance for group 1:

# covariate data frame for group 1, with column id and one column per covariate cov1 <- data.frame(id=1:100, karno = rnorm(100,mean=70, sd=5), celltype = c(rep("squamous",70),rep("large",10),rep("smallcell",10),rep("adeno",10)), diagtime = NaN, #unused in model but must be present age = NaN, #unused in model but must be present trt = c(rep("test",50),rep("standard",50)), #unused in model but must be present priortherapy = c(rep("yes",50),rep("no",50))) #unused in model but must be present #============= calling simulx for group 1 res1 <- simulx(project = project.file, parameter = cov1)

After having put together the results for group 1 and group 2, we obtain the following prediction:

Because the time of death is a random variable, several simulations of the same population will lead to different survival curves. **We can assess the uncertainty of the survival curve by doing replicates**. This can be done very easily by adding an argument nrep:

res1 <- simulx(project = project.file, parameter = cov1, nrep = 100)

We then obtain a prediction interval for the two survival curves:

## Conclusion

**Monolix permits an efficient parametric analysis of this TTE data set**. The risk of death for male patients with advanced inoperable lung cancer appears to be constant over time (exponential distribution of death times). Prognostic factors are the histological type of the tumor (cell type) and the overall health Karnofsky score. Unfortunately, no improvement with the new treatment can be shown, given the data. **The parametric nature of the model permits to easily predict the survival probability for a typical subject with a given celltype and a given karno score.**

**Using Simulx, the developed parametric model can be used for more advanced predictions**. Indeed, we can define new populations with a complex mix of covariates (different celltypes and different karno scores) and simulate the expected survival for this population. The uncertainty of the prediction can easily be assessed via replicates.

## Downloads

The final and intermediates Monolix projects can be downloaded here.

The simulx R script can be downloaded here.