Categorical data modeling using Monolix

Introduction
Formatting of categorical data in the MonolixSuite
Ordered categorical data
Ordered categorical data with regression variables
Discrete-time Markov chain
Continuous-time Markov chain

Objectives: learn how to implement a model for categorical data, assuming either independence or a Markovian dependence between observations.

Projects: categorical1_project, categorical2_project, markov0_project, markov1a_project, markov1b_project, markov1c_project, markov2_project, markov3a_project, markov3b_project

Introduction

Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$. Considering the observations $(y_{ij},\, 1 \leq j \leq n_i)$ for any individual $i$ as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions $\mathbb{P}(y_{ij}=c_k | \psi_i)$ for $k=1,\ldots, K$ and $1 \leq j \leq n_i$. For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined. In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\mathbb{P}(y_{ij}=c_k | \psi_i) \in [0,1]$, and $\sum_{k=1}^{K} \mathbb{P}(y_{ij}=c_k | \psi_i) =1$. Ordinal data further assume that the categories are ordered, i.e., there exists an order $\prec$ such that

$$c_1 \prec c_2 \prec \ldots \prec c_K $$

We can think, for instance, of levels of pain (low $\prec$ moderate $\prec$ severe) or scores on a discrete scale, e.g., from 1 to 10. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\mathbb{P}(y_{ij} \preceq c_k | \psi_i)$ for $k=1,\ldots ,K-1$, or in the other direction: $\mathbb{P}(y_{ij} \succeq c_k | \psi_i)$ for $k=2,\ldots, K$. Any model is possible as long as it defines a probability distribution, i.e., it satisfies

$$0 \leq \mathbb{P}(y_{ij} \preceq c_1 | \psi_i) \leq \mathbb{P}(y_{ij} \preceq c_2 | \psi_i)\leq \ldots \leq \mathbb{P}(y_{ij} \preceq c_K | \psi_i) =1 .$$

It is possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{ij}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k=1,2,\ldots ,K$,

$$\mathbb{P}(y_{ij} = c_k\,|\,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i) = \mathbb{P}(y_{ij} = c_k | y_{i,j-1},\psi_i)$$

Formatting of categorical data in the MonolixSuite

In the case of categorical data, the observations at each time point can only take values in a fixed and finite set of nominal categories. In the data set, the output categories must be coded as integers, as in the following example:

ID TIME Y
1 0.5 3
1 1 0
1 1.5 2
1 2 2
1 2.5 3

For more practical examples, see the respiratory status data set (categorical data) and the warfarin data set (joint continuous and categorical data).

Ordered categorical data

categorical1_project (data = ‘categorical1_data.txt’, model = ‘categorical1_model.txt’)

In this example, observations are ordinal data that take their values in {0, 1, 2, 3}:

Cumulative logits (log-odds of the cumulative probabilities) are used in this example to define the model:

$$\textrm{logit}(\mathbb{P}(y_{ij} \leq k))= \log \left( \frac{\mathbb{P}(y_{ij} \leq k)}{1 - \mathbb{P}(y_{ij} \leq k )} \right)$$

where

$$\begin{array}{ccl} \text{logit}(\mathbb{P}(y_{ij} \leq 0)) &=& \theta_{i,1}\\ \text{logit}(\mathbb{P}(y_{ij} \leq 1)) &=& \theta_{i,1}+\theta_{i,2}\\ \text{logit}(\mathbb{P}(y_{ij} \leq 2)) &=& \theta_{i,1}+\theta_{i,2}+\theta_{i,3}\end{array}$$

This model is implemented in categorical1_model.txt:

[LONGITUDINAL]
input = {th1, th2, th3}

DEFINITION:
level = { type = categorical,  categories = {0, 1, 2, 3},
  logit(P(level<=0)) = th1
  logit(P(level<=1)) = th1 + th2
  logit(P(level<=2)) = th1 + th2 + th3
}
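The cumulative logits above determine the probability of each category by differencing the cumulative probabilities. The following is a minimal Python sketch of that conversion, not part of the Monolix project; the function names and the numerical values of th1, th2, th3 are illustrative assumptions.

```python
import math

def inv_logit(x):
    """Map a logit value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(th1, th2, th3):
    """Return [P(y=0), P(y=1), P(y=2), P(y=3)] from the cumulative logits."""
    cumulative = [
        inv_logit(th1),              # P(y <= 0)
        inv_logit(th1 + th2),        # P(y <= 1)
        inv_logit(th1 + th2 + th3),  # P(y <= 2)
        1.0,                         # P(y <= 3) = 1 by definition
    ]
    probs = [cumulative[0]]
    for k in range(1, 4):
        # P(y = k) = P(y <= k) - P(y <= k-1)
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# Illustrative values only (not estimates from the demo project)
probs = category_probabilities(th1=-1.0, th2=1.5, th3=2.0)
print(probs)
```

With positive th2 and th3 the cumulative probabilities increase with the category index, so every difference is a valid probability.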

A normal distribution is used for $\theta_{1}$, while log-normal distributions are used for $\theta_{2}$ and $\theta_{3}$ so that these parameters remain positive (even without variability), which keeps the cumulative logits, and hence the cumulative probabilities, increasing in $k$. Residuals for noncontinuous data reduce to NPDEs. We can compare the empirical distribution of the NPDEs with the standard normal distribution:

VPCs for categorical data compare the observed and predicted frequencies of each category over time:

The prediction distribution can also be computed by Monte Carlo simulation:

Ordered categorical data with regression variables

categorical2_project (data = ‘categorical2_data.txt’, model = ‘categorical2_model.txt’)

A proportional odds model is used in this example, where PERIOD and DOSE are used as regression variables (i.e., time-varying covariates).
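As a rough illustration of what "proportional odds" means, the Python sketch below assumes a generic form in which the regression variables shift every cumulative logit by the same amount. The intercepts, the coefficients beta_dose and beta_period, and the function names are hypothetical and are not taken from categorical2_model.txt.

```python
import math

def inv_logit(x):
    """Map a logit value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def cumulative_probs(intercepts, beta_dose, beta_period, dose, period):
    """P(y <= k) at each cut-point under an assumed proportional odds form."""
    # The same covariate shift is added to every cumulative logit.
    shift = beta_dose * dose + beta_period * period
    return [inv_logit(a + shift) for a in intercepts]

def odds(p):
    return p / (1.0 - p)

intercepts = [-1.0, 0.5, 2.5]  # hypothetical, increasing cut-point logits
low = cumulative_probs(intercepts, 0.02, -0.3, dose=0.0, period=1)
high = cumulative_probs(intercepts, 0.02, -0.3, dose=50.0, period=1)

# The cumulative odds ratio between the two doses is the same at every cut-point.
ratios = [odds(h) / odds(l) for h, l in zip(high, low)]
print(ratios)
```

Under this assumed form, changing the dose multiplies the cumulative odds by one common factor, whichever cut-point is considered.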

Discrete-time Markov chain

If observation times are regularly spaced (constant length of time between successive observations), we can consider the observations $(y_{ij},j=1,2,\ldots,n_i)$ to be a discrete-time Markov chain.

markov0_project (data = ‘markov1a_data.txt’, model = ‘markov0_model.txt’)

In this project, states are assumed to be independent and identically distributed:

$ \mathbb{P}(y_{ij} = 1) = 1 - \mathbb{P}(y_{ij} = 2) = p_{i,1} $

Observations in markov1a_data.txt take their values in {1, 2}.

markov1a_project (data = ‘markov1a_data.txt’, model = ‘markov1a_model.txt’)

Here,

$\begin{aligned}\mathbb{P}(y_{i,j} = 1 | y_{i,j-1} = 1) = 1 - \mathbb{P}(y_{i,j} = 2 | y_{i,j-1} = 1) = p_{i,11}\\ \mathbb{P}(y_{i,j} = 1 | y_{i,j-1} = 2) = 1 - \mathbb{P}(y_{i,j} = 2 | y_{i,j-1} = 2) = p_{i,21} \end{aligned}$

[LONGITUDINAL]
input = {p11, p21}
DEFINITION:
State = {type = categorical,  categories = {1,2},  dependence = Markov
  P(State=1|State_p=1) = p11
  P(State=1|State_p=2) = p21
}

The distribution of the initial state is not defined in the model, which means that, by default,

$ \mathbb{P}(y_{i,1} = 1) = \mathbb{P}(y_{i,1} = 2) = 0.5 $
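To make the two-state chain concrete, here is a small Python simulation sketch. It is not Monolix code: the function name, the seed handling and the numerical values are illustrative assumptions, while p11, p21 and the default initial probability of 0.5 follow the definitions above.

```python
import random

def simulate_chain(n, p11, p21, p_init=0.5, seed=None):
    """Simulate n states in {1, 2} of a discrete-time Markov chain.

    p11 = P(state=1 | previous=1), p21 = P(state=1 | previous=2),
    p_init = P(first state = 1).
    """
    rng = random.Random(seed)
    state = 1 if rng.random() < p_init else 2
    states = [state]
    for _ in range(n - 1):
        # The probability of moving to state 1 depends only on the previous state.
        p_to_1 = p11 if state == 1 else p21
        state = 1 if rng.random() < p_to_1 else 2
        states.append(state)
    return states

# Illustrative values only
states = simulate_chain(n=20, p11=0.9, p21=0.3, seed=1)
print(states)
```

With p11 close to 1 and p21 small, the simulated sequence shows long runs in the same state, which is the kind of dependence an independent model cannot reproduce.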

markov1b_project (data = ‘markov1b_data.txt’, model = ‘markov1b_model.txt’)

The distribution of the initial state, $p = \mathbb{P}(y_{i,1} = 1)$, is estimated in this example:

DEFINITION:
State = {type = categorical,  categories = {1,2},  dependence = Markov
  P(State_1=1)= p
  P(State=1|State_p=1) = p11
  P(State=1|State_p=2) = p21
}

markov3a_project (data = ‘markov3a_data.txt’, model = ‘markov3a_model.txt’)

Transition probabilities change with time in this example, so time-varying transition probabilities are defined in the model:

[LONGITUDINAL]
input = {a1, b1, a2, b2}
EQUATION:
lp11 = a1 + b1*t/100
lp21 = a2 + b2*t/100
DEFINITION:
State = {type = categorical, categories = {1,2}, dependence = Markov
  logit(P(State=1|State_p=1)) = lp11
  logit(P(State=1|State_p=2)) = lp21
}
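The logit-scale equations above can be evaluated outside Monolix to see how the transition probabilities evolve. The Python sketch below mirrors the two EQUATION lines; the function names and the parameter values are assumptions chosen for illustration.

```python
import math

def inv_logit(x):
    """Map a logit value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def transition_probs(t, a1, b1, a2, b2):
    """Return (p11(t), p21(t)) for logits that are linear in time."""
    lp11 = a1 + b1 * t / 100
    lp21 = a2 + b2 * t / 100
    return inv_logit(lp11), inv_logit(lp21)

# Illustrative values only
for t in (0.0, 50.0, 100.0):
    print(t, transition_probs(t, a1=0.0, b1=1.0, a2=-1.0, b2=0.5))
```

Working on the logit scale keeps both probabilities inside (0, 1) for any value of t and of the parameters.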

markov2_project (data = ‘markov2_data.txt’, model = ‘markov2_model.txt’)

Observations in markov2_data.txt take their values in {1, 2, 3}. Each row of the $3 \times 3$ transition matrix sums to 1, so 6 transition probabilities need to be defined in the model.
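The count of 6 comes from fixing two entries per row and deducing the third. A minimal Python sketch of this bookkeeping follows; the function name and the numerical values are hypothetical, and it does not reproduce how markov2_model.txt parameterizes the probabilities.

```python
def transition_matrix(p11, p12, p21, p22, p31, p32):
    """Build a 3 x 3 transition matrix from 6 free probabilities.

    Row l holds P(state=k | previous=l) for k = 1, 2, 3; the third entry
    of each row is deduced so that the row sums to 1.
    """
    matrix = []
    for a, b in [(p11, p12), (p21, p22), (p31, p32)]:
        if a < 0 or b < 0 or a + b > 1:
            raise ValueError("each row needs non-negative entries summing to at most 1")
        matrix.append([a, b, 1.0 - a - b])
    return matrix

# Illustrative values only
for row in transition_matrix(0.7, 0.2, 0.1, 0.8, 0.3, 0.3):
    print(row)
```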

Continuous-time Markov chain

The previous situation can be extended to the case where time intervals between observations are irregular by modeling the sequence of states as a continuous-time Markov process. The difference is that rather than transitioning to a new (possibly the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by transition rates instead of transition probabilities:

$ \mathbb{P}(y_{i}(t+h) = k\,|\,y_{i}(t)=\ell , \psi_i) = h \rho_{\ell k}(t,\psi_i) + o(h),\qquad k \neq \ell .$

The probability that no transition happens between $t$ and $t+h$ is

$ \mathbb{P}(y_{i}(s) = \ell, \forall s\in(t, t+h)\,|\,y_{i}(t)=\ell , \psi_i) = e^{h \, \rho_{\ell \ell}(t,\psi_i)} .$

Furthermore, for any individual $i$ and time $t$, the transition rates $(\rho_{\ell k}(t, \psi_i))$ satisfy, for any $1\leq \ell \leq K$,

$ \sum_{k=1}^K \rho_{\ell k}(t, \psi_i) = 0 .$

Constructing a model therefore means defining parametric functions of time $(\rho_{\ell k})$ that satisfy this condition.

markov1c_project (data = ‘markov1c_data.txt’, model = ‘markov1c_model.txt’)

Observation times are irregular in this example, so a continuous-time Markov chain should be used to take into account the Markovian dependence of the data:

DEFINITION:
State = { type = categorical,  categories = {1,2}, dependence = Markov
  transitionRate(1,2) = q12
  transitionRate(2,1) = q21
}
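For a two-state chain with constant rates, the transition probabilities over a gap of length h between two observations have a closed form, which shows how irregular spacing enters the model. The Python sketch below uses the standard two-state result; the function name and the numerical values are illustrative assumptions, and it is not a description of how Monolix computes the likelihood.

```python
import math

def two_state_transition_probs(q12, q21, h):
    """Transition probabilities over a time gap h for constant rates.

    q12 is the rate from state 1 to state 2, q21 the rate from 2 to 1.
    Returns [[P11, P12], [P21, P22]].
    """
    s = q12 + q21
    if s == 0:
        # No transitions are possible: the chain stays where it is.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-s * h)
    return [
        [(q21 + q12 * decay) / s, q12 * (1.0 - decay) / s],
        [q21 * (1.0 - decay) / s, (q12 + q21 * decay) / s],
    ]

# Illustrative values only: the same rates give different probabilities
# for different gaps between observations.
for h in (0.5, 2.0, 10.0):
    print(h, two_state_transition_probs(q12=0.4, q21=0.1, h=h))
```

For small h, the off-diagonal entries are approximately h times the corresponding rate, in agreement with the definition of the transition rates given above.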

markov3b_project (data = ‘markov3b_data.txt’, model = ‘markov3b_model.txt’)

Time-varying transition rates are used in this example.