# Time-to-event data models

Objectives: learn how to implement a model for (repeated) time-to-event data with different censoring processes.

Projects: tte1_project, tte2_project, tte3_project, tte4_project, rtteWeibull_project, rtteWeibullCount_project

## Introduction

Here, observations are the “times at which events occur”. An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, mechanical incidents, strikes). Several functions play key roles in time-to-event analysis: the survival, hazard and cumulative hazard functions. We are still working under a population approach, so the functions detailed below are individual functions, i.e., each subject has its own. As we are using parametric models, these functions depend on individual parameters $(\psi_i)$.

• The survival function $S(t, \psi_i)$ gives the probability that the event happens to individual i after time $t>t_{\text{start}}$:

$$S(t,\psi_i) = \mathbb{P}(T_i>t; \psi_i)$$

• The hazard function $h(t,\psi_i)$ is defined for individual i as the instantaneous rate of the event at time t, given that the event has not already occurred:

$$h(t, \psi_i) = \lim_{dt \to 0} \frac{S(t, \psi_i) - S(t + dt, \psi_i)}{ S(t, \psi_i) dt}$$

This is equivalent to

$$h(t, \psi_i) = -\frac{d}{dt} \left(\log{S(t, \psi_i)}\right)$$

• Another useful quantity is the cumulative hazard function $H(a,b; \psi_i)$, defined for individual i as

$$H(a,b; \psi_i) = \int_a^b h(t,\psi_i) dt$$

Note that $S(t, \psi_i) = e^{-H(t_{\text{start}},t; \psi_i)}$. Then, the hazard function $h(t,\psi_i)$ characterizes the problem, because knowing it is the same as knowing the survival function $S(t, \psi_i)$. The probability distribution of survival data is therefore completely defined by the hazard function.
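
The relation between the hazard and the survival function can be checked numerically. The sketch below is only an illustration and is not Monolix code: the helper names `cumulative_hazard` and `survival`, the trapezoidal rule, and the value `Te = 50` are all assumptions made for this example.

```python
import math

def cumulative_hazard(h, a, b, n=10_000):
    """Approximate H(a, b), the integral of h over [a, b], with the trapezoidal rule."""
    step = (b - a) / n
    total = 0.5 * (h(a) + h(b))
    for k in range(1, n):
        total += h(a + k * step)
    return total * step

def survival(h, t, t_start=0.0):
    """S(t) = exp(-H(t_start, t))."""
    return math.exp(-cumulative_hazard(h, t_start, t))

# Constant hazard h = 1/Te (Te = 50 is an illustrative value)
Te = 50.0
s_numeric = survival(lambda t: 1.0 / Te, 30.0)
s_exact = math.exp(-30.0 / Te)   # closed form for a constant hazard
print(s_numeric, s_exact)
```

Any other positive hazard function of time can be passed as `h` in the same way.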

Time-to-event (TTE) models are thus defined in Monolix via the hazard function. Monolix also holds a TTE library that contains typical hazard functions for time-to-event data. More details and modeling guidelines can be found on the TTE dedicated webpage, along with case studies.

## Formatting of time-to-event data in the MonolixSuite

In the data set, exactly observed events, interval censored events and right censoring are recorded for each individual. Contrary to other software for survival analysis, the MonolixSuite requires specifying the time at which the observation period starts. This makes it possible to define the data set using absolute times, in addition to durations (if the start time is zero, the records represent durations between the start time and the event).

The column TIME also contains the end of the observation period or the time intervals for interval-censoring. The column OBSERVATION contains an integer that indicates how to interpret the associated time. The different values for each type of event and observation are summarized in the table below:

The figure below summarizes the different situations with examples:

For instance, for single events that are exactly observed (with or without right censoring), one must indicate the start time of the observation period (Y=0), and the time of the event (Y=1) or the time of the end of the observation period if no event has occurred (Y=0). In the following example:

ID TIME Y
1   0   0
1  34   1
2   0   0
2  80   0

the observation period lasts from the starting time t=0 to the final time t=80. For individual 1, the event is observed at t=34. For individual 2, no event is observed during the period, so the record at the final time (t=80) indicates that no event had occurred by then. Using absolute times instead of durations, we could equivalently write:

ID TIME Y
1  20   0
1  54   1
2  33   0
2  113  0

The durations between the start time and the event (or the end of the observation period) are the same as before, but this time we record the day on which the patients enter the study and the days on which they have events or leave the study. Different patients may enter the study at different times.
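
The equivalence of the two encodings can be verified with a few lines of code. This is a minimal sketch for single-event data only; the function name `summarize` and the tuple layout `(ID, TIME, Y)` are assumptions made for the illustration.

```python
def summarize(records):
    """Map each ID to (duration, event_observed) from (id, time, y) rows.

    Assumes single-event data: the first row of each ID is the start of
    the observation period (Y=0) and the last row is either the event
    (Y=1) or the end of the observation period (Y=0).
    """
    first, last = {}, {}
    for subject, time, y in records:
        first.setdefault(subject, time)   # keep the start time only
        last[subject] = (time, y)         # overwritten until the final row
    return {s: (last[s][0] - first[s], last[s][1] == 1) for s in first}

# The two example data sets above, as (ID, TIME, Y) rows
relative = [(1, 0, 0), (1, 34, 1), (2, 0, 0), (2, 80, 0)]
absolute = [(1, 20, 0), (1, 54, 1), (2, 33, 0), (2, 113, 0)]

print(summarize(relative))
print(summarize(absolute))
```

Both calls give a duration of 34 with an observed event for individual 1, and a duration of 80 with no event for individual 2.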

Examples for repeated events, and interval censored events are available on the data set documentation page.

## Single event

To begin with, we will consider a one-off event. Depending on the application, the length of time to this event may be called the survival time (until death, for instance), failure time (until hardware fails), and so on. In general, we simply say “time-to-event”. The random variable representing the time-to-event for subject i is typically written $T_i$.

### Single event exactly observed or right censored

• tte1_project (data = tte1_data.txt , model=lib:exponential_model_singleEvent.txt)

The event time may be exactly observed at time $t_i$, but if we assume that the trial ends at time $t_{\text{stop}}$, the event may happen after the end. This is “right censoring”. Here, Y=0 at time t means that the event happened after t and Y=1 means that the event happened at time t. The rows with t=0 are included to show the trial start time $t_{\text{start}}=0$:

By clicking on the button Observed data, it is possible to display the Kaplan-Meier plot (i.e. the empirical survival function) before fitting any model.
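
For readers who want to see how such an empirical survival curve is built, here is a minimal sketch of the Kaplan-Meier estimator for right-censored single events. It is not the Monolix implementation; the function name `kaplan_meier` and the five durations are hypothetical and are not taken from tte1_data.txt.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct observed event time.

    durations: time from start to event or to censoring
    observed:  True if the event was observed, False if right censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all subjects that share the same time t
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= leaving   # events and censored subjects leave the risk set
    return curve

# Hypothetical durations and event indicators
curve = kaplan_meier([34, 80, 12, 80, 55], [True, False, True, False, True])
for t, s in curve:
    print(t, round(s, 3))
```

The curve only drops at observed event times; censored subjects simply reduce the number at risk.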

A very basic model with constant hazard is used for this data:

[LONGITUDINAL]
input = Te

EQUATION:
h = 1/Te

DEFINITION:
Event = {type=event, maxEventNumber=1, hazard=h}

OUTPUT:
output = {Event}


Here, Te is the expected time to event. Specifying the maximum number of events is required both for the estimation procedure and for the simulation-based diagnostic plots, such as the prediction interval for the Kaplan-Meier plot, which is obtained by Monte Carlo simulation.
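
To see why Te can be read as the expected time to event, consider the constant hazard $h = 1/T_e$ and assume $t_{\text{start}} = 0$ (an assumption made here to keep the notation light). The cumulative hazard and the survival function are then

$$H(0,t) = \int_0^t \frac{1}{T_e} \, du = \frac{t}{T_e}, \qquad S(t) = e^{-t/T_e}$$

and, since the expectation of a non-negative random variable is the integral of its survival function,

$$\mathbb{E}[T] = \int_0^{\infty} S(t) \, dt = \int_0^{\infty} e^{-t/T_e} \, dt = T_e$$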

### Single event interval censored or right censored

• tte2_project (data = tte2_data.txt , model=exponentialIntervalCensored_model.txt)

We may know the event has happened in an interval $I_i$ but not know the exact time $t_i$. This is interval censoring. Here, Y=0 at time t means that the event happened after t and Y=1 means that the event happened before time t (and after the previous recorded time).

The event for individual 1 happened between t=10 and t=15. No event was observed until the end of the experiment (t=100) for individual 5. We use the same basic model, but we now need to specify that the events are interval censored:

[LONGITUDINAL]
input = Te

EQUATION:
h = 1/Te

DEFINITION:
Event = {type=event, maxEventNumber=1, eventType=intervalCensored, hazard = h,
intervalLength=5     ; used for the plots (not mandatory)
}

OUTPUT:
output = Event
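
The probabilities attached to interval-censored and right-censored records can be written out for this constant-hazard model. The sketch below is an illustration rather than the computation Monolix performs internally; the function names and the value `Te = 40` are assumptions, and the times mirror the two cases described above (event between t=10 and t=15, no event before t=100).

```python
import math

def survival(t, Te, t_start=0.0):
    """S(t) for the constant hazard h = 1/Te."""
    return math.exp(-(t - t_start) / Te)

def interval_probability(a, b, Te):
    """P(a < T <= b) = S(a) - S(b): the event is only known to lie in (a, b]."""
    return survival(a, Te) - survival(b, Te)

def right_censored_probability(t_stop, Te):
    """P(T > t_stop) = S(t_stop): no event by the end of the observation period."""
    return survival(t_stop, Te)

Te = 40.0  # illustrative value, not an estimate from tte2_data.txt
p_interval = interval_probability(10.0, 15.0, Te)    # event between t=10 and t=15
p_censored = right_censored_probability(100.0, Te)   # no event before t=100
print(p_interval, p_censored)
```

Summing the interval probabilities over a grid that covers the observation period and adding the right-censoring probability gives 1, which is a convenient sanity check.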


## Repeated events

Sometimes, an event can potentially happen again and again, e.g., epileptic seizures, heart attacks. For any given hazard function h, the survival function S for individual i now represents the survival since the previous event at $t_{i,j-1}$, given here in terms of the cumulative hazard from $t_{i,j-1}$ to $t_{i,j}$:

$$S(t_{i,j} \mid t_{i,j-1}; \psi_i) = \mathbb{P}(T_{i,j} > t_{i,j} \mid T_{i,j-1} = t_{i,j-1}; \psi_i) = \exp\left(-\int_{t_{i,j-1}}^{t_{i,j}}h(t,\psi_i) \, dt\right)$$
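
For a constant hazard $h = 1/T_e$, this conditional survival function depends only on the gap $t_{i,j} - t_{i,j-1}$, so the gaps between successive events are independent exponential variables with mean $T_e$. The sketch below simulates repeated events on that basis. It is a hedged illustration, not Monolix code: the function name, the seed, and the values `Te = 50` and `t_stop = 200` are assumptions.

```python
import random

def simulate_repeated_events(Te, t_stop, rng, t_start=0.0):
    """Event times in (t_start, t_stop] for a constant hazard h = 1/Te."""
    times = []
    t = t_start
    while True:
        t += rng.expovariate(1.0 / Te)   # exponential gap with mean Te
        if t > t_stop:
            return times                  # next event falls after t_stop: right censored
        times.append(t)

rng = random.Random(0)
counts = [len(simulate_repeated_events(50.0, 200.0, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)    # should be close to t_stop / Te = 4
print(mean_count)
```

The mean number of events per individual approaches `t_stop / Te` as the number of simulated individuals grows.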

### Repeated events exactly observed or right censored

• tte3_project (data = tte3_data.txt , model=lib:exponential_model_repeatedEvents.txt)

A sequence of $n_i$ event times is precisely observed before $t_{\text{stop}} = 200$:

We can then display the Kaplan-Meier plot for the first event and the mean number of events per individual:

After fitting the model, prediction intervals for these two curves can also be displayed on the same graphs.

### Repeated events interval censored or right censored

• tte4_project (data = tte4_data.txt , model=exponentialIntervalCensored_repeated_model.txt)

We do not know the exact event times, only the number of events that occurred for each individual in each time interval.
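
The step from exact event times to interval counts can be sketched as follows. The function name, the event times, and the interval grid are hypothetical and are not taken from tte4_data.txt. The printed rows assume that each record gives the end of an interval together with the number of events observed in it, preceded by a start row with Y=0.

```python
def count_events_per_interval(event_times, boundaries):
    """Number of events in each interval (boundaries[k], boundaries[k+1]]."""
    counts = [0] * (len(boundaries) - 1)
    for t in event_times:
        for k in range(len(counts)):
            if boundaries[k] < t <= boundaries[k + 1]:
                counts[k] += 1
                break
    return counts

# Hypothetical event times for one individual and a hypothetical grid
event_times = [12.5, 31.0, 38.2, 97.4]
boundaries = [0, 25, 50, 75, 100]
counts = count_events_per_interval(event_times, boundaries)

# Start row (Y=0), then one row per interval end with the event count
rows = [(boundaries[0], 0)] + list(zip(boundaries[1:], counts))
for time, y in rows:
    print(time, y)
```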

## User-defined likelihood function for time-to-event data

• weibullRTTE (data = weibull_data.txt , model=weibullRTTE_model.txt)

A Weibull model is used in this example:

[LONGITUDINAL]
input = {lambda, beta}

EQUATION:
h = (beta/lambda)*(t/lambda)^(beta-1)

DEFINITION:
Event = {type=event, hazard=h, eventType=intervalCensored,
intervalLength=5}

OUTPUT:
output = Event

• weibullCount (data = weibull_data.txt , model=weibullCount_model.txt)

Instead of defining the data as events, it is possible to consider the data as count data: indeed, we count the number of events per interval. An additional column with the start of the interval is added in the data file and defined as a regression variable. We then use a model for count data (see rtteWeibullCount_model.txt).
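
As an illustration of the count-data view, the sketch below computes the log-probability of observing k events in an interval under the Weibull hazard given above. It relies on two facts: the cumulative hazard has the closed form $H(a,b) = (b/\lambda)^{\beta} - (a/\lambda)^{\beta}$, and when the hazard depends on time only, the number of events in $(a, b]$ follows a Poisson distribution with mean $H(a,b)$. This is an assumption-laden sketch and not a transcription of the Monolix model file; the function names and parameter values are hypothetical.

```python
import math

def weibull_cumulative_hazard(a, b, lam, beta):
    """H(a, b) for h(t) = (beta/lam) * (t/lam)**(beta - 1)."""
    return (b / lam) ** beta - (a / lam) ** beta

def count_log_probability(k, a, b, lam, beta):
    """log P(k events in (a, b]) for a Poisson count with mean H(a, b)."""
    mean = weibull_cumulative_hazard(a, b, lam, beta)
    return -mean + k * math.log(mean) - math.lgamma(k + 1)

lam, beta = 20.0, 1.5   # illustrative values only
log_p = count_log_probability(2, 10.0, 15.0, lam, beta)
print(log_p)
```

Here the interval start `a` plays the role of the regression variable mentioned above, and the interval end `b` is the observation time.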