# Observed data

Three types of data can be visualized in Monolix using the graphical interface:

• Continuous data

The purpose of this plot, also called a spaghetti plot, is to display the original data with respect to time.

In the example below, the warfarin concentration from the warfarin data set is displayed. Hovering over a line highlights the corresponding subject in yellow. The output can be plotted on a log scale, for example to better evaluate the elimination phase, as in the figure below.

It is also possible to display the dosing times, as in the figure below. In the proposed example (PKVK_project of the demos), the dosing times of an individual are displayed when the user hovers over that individual.

Summary information is also provided:

• The total number of subjects
• The average number of doses per subject
• The total, average, minimum and maximum number of observations per individual.

In addition, if the plot is split by a covariate, all this information is recomputed for each group, as in the following plot.

• Discrete data

Among the discrete data available in Monolix, we distinguish count data and categorical data. This example shows the evolution of scores, which are categories describing anxiety disorders, from the zylkene-data-set. Data can be displayed in a stacked or grouped form.

• Time-to-event data

### Definition

The survival function, which describes the probability that an event happens after some time t, is a typical way to display time-to-event data. In general, this function is unknown, and Monolix uses the non-parametric Kaplan-Meier estimator. It estimates the probability that an individual survives until time t, given that it survived all earlier times. For single events, it is given by the following formula
$$\hat{S}(t)=\prod_{i:t_i<t} \left(1-\frac{d_i}{n_i}\right),$$
where

• $$t_i$$ – times before t, when at least one event occurred,
• $$d_i$$ – number of events at the time $$t_i$$,
• $$n_i$$ – number of individuals at risk, that is who did not experience an event until $$t_i$$.

The probability that an event occurs at time $$t_i$$, $$(p_e)$$, is the ratio between the number of events that have occurred $$(d_i)$$ and the total number of individuals at risk $$(n_i)$$. Its complement, $$(1-p_e)$$, estimates the probability of surviving that time. Because the number of individuals at risk changes over time, the survival at time t is obtained by multiplying these probabilities over all previous times $$t_i$$ at which at least one event occurred. This is similar to calculating the probability that a patient survives 2 days: it is the product of the probability that the patient survives the first day and the conditional probability that the patient survives the second day, given survival on the first one.
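The multiplication described above can be sketched in a few lines of Python. This is only an illustration, not the Monolix implementation: the function name `kaplan_meier`, its arguments, and the data set are hypothetical, and individuals who drop out at an event time are assumed to still be at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_i, survival just after t_i)] for each distinct event time.

    times  -- event or drop-out time of each individual
    events -- 1 if an event was observed at that time, 0 for a drop-out
    """
    curve = []
    survival = 1.0
    # Only times with at least one observed event contribute a factor.
    for t_i in sorted({t for t, e in zip(times, events) if e == 1}):
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        # Assumption: anyone whose recorded time is >= t_i is still at risk.
        n_i = sum(1 for t in times if t >= t_i)
        survival *= 1.0 - d_i / n_i
        curve.append((t_i, survival))
    return curve


# Hypothetical data: six individuals, two of them dropping out (event = 0).
times = [2, 4, 4, 6, 7, 9]
events = [1, 1, 0, 1, 0, 1]
for t_i, s in kaplan_meier(times, events):
    print(f"t = {t_i}: S = {s:.3f}")
```

With these hypothetical data the factors are 5/6, 4/5, 2/3 and 0, so the curve steps down to about 0.833, 0.667, 0.444 and finally 0.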

### Example

A typical time-to-event data set contains the exact times at which individuals experienced an event or left the study (drop-out). In the following example, there are five individuals with two observations each: the time when the observation starts, which is 0 for all of them, and the time of an event. If a patient leaves the study, the drop-out time is given, but the observation column contains 0 instead of 1. This indicates that the individual did not experience an event but survived until the drop-out time.

The advantage of the Kaplan-Meier estimate is that it takes into account situations in which not all individuals continue the study. At the next event time, such individuals are no longer counted as individuals at risk (they are not counted in the denominator $$n_i$$).

The study starts at time $$t_1=0$$. There are no events, so $$d_1=0$$ and the value of the survival curve is 1. Until the next event time at $$t_2=1$$, the survival remains constant. At that time, one individual experiences an event, so $$d_2=1$$, and all individuals survived until that time, so $$n_2=5$$. As a result, the probability of survival decreases by 0.2, which corresponds to the height of the jump at $$t=1$$ in the plot. Then again, until the next event, the survival remains constant. At time $$t=3$$, there are two events. The number $$n_3$$ now counts only 4 individuals: it has decreased by 1 due to the previous event. To get the final value, the probability at time 3 is multiplied by all earlier probabilities. At $$t=4$$ there is a drop-out: patient 5 left the study and no event was registered. The survival curve remains constant, and the drop-out is marked in red. The Kaplan-Meier estimator takes this situation into account because, at the next event time $$t=5$$, this individual is not counted as an individual at risk, so the denominator $$n_i$$ is smaller. At time $$t=5$$, there is only one individual left and one event, so the survival equals 0.
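The arithmetic of this example can be written out step by step. The sketch below uses only the counts quoted above (one event among five individuals at $$t=1$$, two events among four at $$t=3$$, one event with one individual at risk at $$t=5$$), and the interval boundaries follow the $$t_i<t$$ convention of the definition:

$$\hat{S}(t)=\begin{cases}1, & 0\le t\le 1,\\ 1-\frac{1}{5}=0.8, & 1<t\le 3,\\ 0.8\times\left(1-\frac{2}{4}\right)=0.4, & 3<t\le 5,\\ 0.4\times\left(1-\frac{1}{1}\right)=0, & t>5.\end{cases}$$

The drop-out at $$t=4$$ adds no factor of its own; it only reduces the number of individuals at risk at $$t=5$$ from 2 to 1.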

### Remarks

• The Kaplan-Meier estimator correctly handles information about individuals who left the study, but it is biased when the exact times of events are unknown.
• In data visualization, Monolix assumes that all events are exactly observed. For example, assume that an observation period started at $$t=0$$ and that at $$t=1$$ an event is marked by 1 in the observation column. Without any other information, it is impossible to tell from the data set whether the event occurred exactly at $$t=1$$ or before. The same problem arises when the start time of the study and the interval limits of an event are given. From the data set alone, exact and interval-censored events are indistinguishable. In other words, when it is not known when an event happened, Monolix assumes that it happened at the end of the censored interval.

### Mean number of events

The Kaplan-Meier estimator can also be used for the analysis of repeated events. The survival curve is estimated separately for each k-th event,

$$\hat{S}^{(k)}(t)=\prod_{i:t_i<t} \left(1-\frac{d^{(k)}_i}{n^{(k)}_i}\right),$$

and is used to calculate the mean number of events per individual as a function of time

$$\hat{m}(t)=\sum_{k} \left(1-\hat{S}^{(k)}(t)\right).$$
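As a rough illustration of this formula, the following Python sketch computes $$\hat{m}(t)$$ for a small made-up data set. It is not the Monolix implementation. The function names and data are hypothetical, and the sketch assumes that an individual with fewer than k events is treated as censored for the k-th event at the end of its observation period, which the text above does not specify.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate at time t, using event times t_i < t."""
    s = 1.0
    event_times = {u for u, e in zip(times, events) if e == 1 and u < t}
    for t_i in sorted(event_times):
        d_i = sum(1 for u, e in zip(times, events) if u == t_i and e == 1)
        n_i = sum(1 for u in times if u >= t_i)  # at risk at t_i
        s *= 1.0 - d_i / n_i
    return s


def mean_number_of_events(event_times, end_times, t):
    """Sum over k of (1 - S^(k)(t)).

    event_times -- one list of event times per individual
    end_times   -- end of the observation period of each individual
    """
    max_k = max((len(ev) for ev in event_times), default=0)
    m = 0.0
    for k in range(max_k):
        times, events = [], []
        for ev, end in zip(event_times, end_times):
            ev = sorted(ev)
            if len(ev) > k:
                times.append(ev[k])   # time of the (k+1)-th event
                events.append(1)
            else:
                times.append(end)     # assumed censored at end of observation
                events.append(0)
        m += 1.0 - km_survival(times, events, t)
    return m


# Hypothetical data: three individuals observed until t = 4.
event_times = [[1, 2], [1], []]
end_times = [4, 4, 4]
print(mean_number_of_events(event_times, end_times, t=3))
```

In this made-up case nobody is censored before $$t=3$$, so the result agrees (up to floating-point rounding) with the plain average of 3 events over 3 individuals, that is 1 event per individual.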

The mean number of events can be visualized in Datxplore and Monolix next to the survival function by choosing this option in the Subplots settings.

# Settings

• General: Add/remove the legend or the grid,
• Axes: Add/remove log-scale, modify labels,
• Stratify: Split, color and filter by covariates,
• Preferences: Add/remove elements or change colors and sizes for axes, observations, censored (BLQ) observations, highlighting.

# Best practices

• It is always good to have a look first at the spaghetti plot before running the parameter estimation. Indeed, it is very convenient to see if all the data is consistent, or if some outliers appear. Moreover, looking at the plot can help to identify hypotheses about the model, such as covariate effects.
• It is possible to generate the Spaghetti plot just after loading the data. For that, click on “Show dataviewer” next to the data file choice.
• For a better understanding and/or exploration of the data set, it is also possible to export the data set to Datxplore.