MIMIC III (3-Stages)#

In this notebook, we analyze the MIMIC III data under a 3-stage setting. We first analyze the mediation effect, then evaluate two policies of interest, and finally estimate the optimal policy. As informed by the causal structure learning, we consider Glucose and PaO2_FiO2 as confounders/states, IV_Input as the action, and SOFA as the mediator.

import pickle
import numpy as np
import pandas as pd

# load the preprocessed MIMIC III data (a dictionary of state, action, mediator, and reward)
file = open('mimic3_MDTR_data_dict_3stage_V2.pickle', 'rb')
mimic3_MDTR = pickle.load(file)
state, action, mediator, reward = mimic3_MDTR.values()
reward.iloc[np.where(reward['Died_within_48H']==-1)[0],-1] = 0 # recode the final outcome from -1 to 0 so that it is binary
MDTR_data = pd.read_csv('mimic3_MDTR_3stage_V2.csv')
MDTR_data.iloc[np.where(MDTR_data['Died_within_48H']==-1)[0],-1] = 0
MDTR_data.head()
Glucose_1 PaO2_FiO2_1 IV_Input_1 SOFA_1 Glucose_2 PaO2_FiO2_2 IV_Input_2 SOFA_2 Glucose_3 PaO2_FiO2_3 IV_Input_3 SOFA_3 Died_within_48H
0 116.833333 172.000000 1 9 125.666667 364.000000 1 5 132.200000 439.310339 1 0 1
1 120.000000 170.000000 1 5 154.000000 163.714286 1 7 164.000000 174.000000 1 6 1
2 123.200000 266.000000 1 8 126.400000 94.000000 1 7 129.600000 91.388889 1 8 0
3 168.500000 260.833333 0 3 227.000000 277.777778 1 4 257.857143 191.935482 1 10 0
4 115.000000 146.000000 1 7 115.000000 208.666667 1 8 115.000000 210.000000 1 10 1
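As a quick sanity check (a small snippet added for illustration, not part of the original analysis), we can confirm that the recoding above turned the final outcome into a binary 0/1 variable and inspect the sample size:

# the outcome should now take only the values 0 and 1
print(MDTR_data['Died_within_48H'].unique())
# number of patients and number of columns (three stages of states, actions, and mediators plus the final outcome)
print(MDTR_data.shape)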

CEL: 3-Stage Mediation Analysis#

Under the 3-stage setting, we are interested in analyzing the treatment effect on the final outcome Died_within_48H observed at the end of the study, comparing the target treatment regime that provides IV input at all three stages with the control treatment regime that provides no treatment at any stage. Using the Q-learning based estimator proposed in [1], we examine the natural direct and indirect effects of the target treatment regime based on the observational data. With the code in the following blocks, the estimated effect components are summarized below, with bootstrapped standard errors in parentheses:

NDE            NIE           TE
-.213 (.695)   .156 (.647)   -.057 (.284)
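These three quantities satisfy the usual decomposition of the total effect into its natural direct and indirect components. Writing \(R(\bar{a}, \bar{m})\) for the potential final outcome under treatment sequence \(\bar{a}\) and mediator sequence \(\bar{m}\), \(\bar{1}\) and \(\bar{0}\) for the target and control regimes, and \(\bar{M}(\cdot)\) for the natural mediator trajectory, one common convention (in the spirit of [1]) is

\[\begin{align}
\textrm{TE} = &E[R(\bar{1}, \bar{M}(\bar{1}))] - E[R(\bar{0}, \bar{M}(\bar{0}))]\\
= &\underbrace{E[R(\bar{1}, \bar{M}(\bar{0}))] - E[R(\bar{0}, \bar{M}(\bar{0}))]}_{\textrm{NDE}} + \underbrace{E[R(\bar{1}, \bar{M}(\bar{1}))] - E[R(\bar{1}, \bar{M}(\bar{0}))]}_{\textrm{NIE}},
\end{align}\]

and the reported estimates are consistent with it: \(-.213 + .156 \approx -.057\).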

from causaldm.learners.CEL.MA import Mediated_QLearning
MediatedQLearn = Mediated_QLearning.Mediated_QLearning()
N = len(state)
# control regime: no IV input at any of the three stages
regime_control = pd.DataFrame({'IV_Input_1':[0]*N,'IV_Input_2':[0]*N, 'IV_Input_3':[0]*N}).set_index(state.index)
# target regime: IV input at all three stages
regime_target = pd.DataFrame({'IV_Input_1':[1]*N,'IV_Input_2':[1]*N, 'IV_Input_3':[1]*N}).set_index(state.index)
MediatedQLearn.train(state, action, mediator, reward, T=3, dim_state = 2, dim_mediator = 1, 
                     regime_target = regime_target, regime_control = regime_control, bootstrap = False)
# point estimates of the natural indirect effect, natural direct effect, and total effect
MediatedQLearn.est_NDE_NIE()
MediatedQLearn.NIE, MediatedQLearn.NDE, MediatedQLearn.TE
(0    0.155758
 dtype: float64,
 0   -0.212838
 dtype: float64,
 0   -0.05708
 dtype: float64)
# refit with bootstrapping to obtain standard errors for the estimated effects
MediatedQLearn.train(state, action, mediator, reward, T=3, dim_state = 2, dim_mediator = 1, 
                     regime_target = regime_target, regime_control = regime_control, bootstrap = True, n_bs = 500)
MediatedQLearn._predict_value_boots()
MediatedQLearn.NIE_se, MediatedQLearn.NDE_se, MediatedQLearn.TE_se
(array([0.6474]), array([0.6946]), array([0.2835]))
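As a follow-up (a small snippet added for illustration), a rough normal-approximation 95% confidence interval can be formed from each point estimate and its bootstrapped standard error; all three intervals cover zero, so none of the effects is statistically significant at the 5% level:

# normal-approximation 95% CIs from the point estimates and bootstrapped SEs printed above
ests = {'NDE': (-0.2128, 0.6946), 'NIE': (0.1558, 0.6474), 'TE': (-0.0571, 0.2835)}
for name, (est, se) in ests.items():
    print(f"{name}: {est:.3f}, 95% CI ({est - 1.96*se:.3f}, {est + 1.96*se:.3f})")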

CPL: 3-Stage Policy Evaluation#

from causaldm.learners.CPL13.disc import QLearning

As an example, we use the Q-learning algorithm to evaluate policies based on the observed data, with the linear regression models defined as follows:

\[\begin{align}
Q_1(s,a_1,\boldsymbol{\beta}) = &\beta_{00}+\beta_{01}*\textrm{Glucose}_1+\beta_{02}*\textrm{PaO2_FiO2}_1+\\
&I(a_1=1)*\{\beta_{10}+\beta_{11}*\textrm{Glucose}_1+\beta_{12}*\textrm{PaO2_FiO2}_1\},\\
Q_2(s,a_2,\boldsymbol{\mu}) = &\mu_{00}+\mu_{01}*\textrm{Glucose}_1+\mu_{02}*\textrm{PaO2_FiO2}_1+\mu_{03}*\textrm{SOFA}_1+\\
&\mu_{04}*\textrm{Glucose}_2+\mu_{05}*\textrm{PaO2_FiO2}_2+\\
&I(a_2=1)*\{\mu_{10}+\mu_{11}*\textrm{Glucose}_2+\mu_{12}*\textrm{PaO2_FiO2}_2+\mu_{13}*\textrm{SOFA}_1\},\\
Q_3(s,a_3,\boldsymbol{\theta}) = &\theta_{00}+\theta_{01}*\textrm{Glucose}_1+\theta_{02}*\textrm{PaO2_FiO2}_1+\theta_{03}*\textrm{SOFA}_1+\\
&\theta_{04}*\textrm{Glucose}_2+\theta_{05}*\textrm{PaO2_FiO2}_2+\theta_{06}*\textrm{SOFA}_2+\\
&\theta_{07}*\textrm{Glucose}_3+\theta_{08}*\textrm{PaO2_FiO2}_3+\\
&I(a_3=1)*\{\theta_{10}+\theta_{11}*\textrm{Glucose}_3+\theta_{12}*\textrm{PaO2_FiO2}_3+\theta_{13}*\textrm{SOFA}_2\}
\end{align}\]

Using the code below, we evaluate two target policies (regimes). The first is a fixed treatment regime that applies no treatment at any stage (Policy1), with an estimated value of .8991. The second is a fixed treatment regime that applies treatment at all stages (Policy2), with an estimated value of .8246. Therefore, the treatment effect of Policy2 compared to Policy1 is -.0745, implying that receiving the IV input increases the mortality rate.

# rename columns to the generic state (S), action (A), and reward (R) labels used by QLearning
MDTR_data.rename(columns = {'Died_within_48H':'R',
                            'Glucose_1':'S1_1', 'Glucose_2':'S1_2','Glucose_3':'S1_3',
                            'PaO2_FiO2_1':'S3_1', 'PaO2_FiO2_2':'S3_2','PaO2_FiO2_3':'S3_3',
                            'SOFA_1':'S4_1', 'SOFA_2':'S4_2', 'SOFA_3':'S4_3',
                            'IV_Input_1':'A1','IV_Input_2':'A2', 'IV_Input_3':'A3'}, inplace = True)
R = MDTR_data['R'] # the final outcome used as the reward
S = MDTR_data[['S1_1','S1_2','S1_3','S3_1','S3_2','S3_3','S4_1','S4_2','S4_3']] # states at all three stages
A = MDTR_data[['A1','A2', 'A3']] # actions at all three stages
# specify the model you would like to use
model_info = [{"model": "R~S1_1+S3_1+A1+S1_1*A1+S3_1*A1",
              'action_space':{'A1':[0,1]}},
             {"model": "R~S1_1+S3_1+S4_1+S1_2+S3_2+A2+S1_2*A2+S3_2*A2+S4_1*A2",
              'action_space':{'A2':[0,1]}},
             {"model": "R~S1_1+S3_1+S4_1+S1_2+S3_2+S4_2+S1_3+S3_3+A3+S1_3*A3+S3_3*A3+S4_2*A3",
              'action_space':{'A3':[0,1]}}]
# Evaluating the policy with no treatment
N=len(S)
regime = pd.DataFrame({'A1':[0]*N,
                      'A2':[0]*N,
                     'A3':[0]*N}).set_index(S.index)
#evaluate the regime
QLearn = QLearning.QLearning()
QLearn.train(S, A, R, model_info, T=3, regime = regime, evaluate = True, mimic3_clip = True)
QLearn.predict_value(S)
0.8990981941216631
# Evaluating the policy that gives IV input at all three stages
N=len(S)
regime = pd.DataFrame({'A1':[1]*N,
                      'A2':[1]*N,
                     'A3':[1]*N}).set_index(S.index)
#evaluate the regime
QLearn = QLearning.QLearning()
QLearn.train(S, A, R, model_info, T=3, regime = regime, evaluate = True, mimic3_clip = True)
QLearn.predict_value(S)
0.8246053645689122
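For intuition, the fitted-Q evaluation used above can be sketched as a plain backward-induction recursion with statsmodels: fit the stage-3 model to the observed outcome, predict under the regime's stage-3 action to form a pseudo-outcome, and repeat for stages 2 and 1. The sketch below is an illustration of this idea using the renamed columns, not the causaldm implementation itself (in particular, it omits the prediction clipping invoked via mimic3_clip, so the numbers may differ slightly):

import statsmodels.formula.api as smf

def eval_fixed_regime_sketch(df, regime):
    # Backward-induction sketch of fitted-Q evaluation for a fixed 3-stage regime.
    # `regime` is e.g. {'A1': 0, 'A2': 0, 'A3': 0}; illustrative only.
    data = df.copy()
    pseudo = data['R'].astype(float)  # stage-3 regression target is the observed outcome
    formulas = ["R~S1_1+S3_1+S4_1+S1_2+S3_2+S4_2+S1_3+S3_3+A3+S1_3*A3+S3_3*A3+S4_2*A3",
                "R~S1_1+S3_1+S4_1+S1_2+S3_2+A2+S1_2*A2+S3_2*A2+S4_1*A2",
                "R~S1_1+S3_1+A1+S1_1*A1+S3_1*A1"]
    for formula, a in zip(formulas, ['A3', 'A2', 'A1']):  # stages 3 -> 1
        data['R'] = pseudo                        # regress the current pseudo-outcome
        q_fit = smf.ols(formula, data=data).fit()
        counterfactual = data.copy()
        counterfactual[a] = regime[a]             # plug in the regime's action
        pseudo = q_fit.predict(counterfactual)    # becomes the next stage's target
    return pseudo.mean()                          # estimated value of the regime

eval_fixed_regime_sketch(MDTR_data, {'A1': 0, 'A2': 0, 'A3': 0})  # roughly Policy1's value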

CPL: 3-Stage Policy Optimization#

Further, to find an optimal policy that maximizes the expected value, we use the Q-learning algorithm again, this time for policy optimization. Using the regression models specified above and the code in the following block, the estimated optimal policy can be summarized as the following regime.

  • At stage 1:

    1. We would recommend \(A=0\) (IV_Input = 0) if \(.0001*\textrm{Glucose}_1+.0012*\textrm{PaO2_FiO2}_1>.0551\)

    2. Else, we would recommend \(A=1\) (IV_Input = 1).

  • At stage 2:

    1. We would recommend \(A=0\) (IV_Input = 0) if \(.0002*\textrm{Glucose}_2-.00001*\textrm{PaO2_FiO2}_2+.0070*\textrm{SOFA}_1<.0721\)

    2. Else, we would recommend \(A=1\) (IV_Input = 1).

  • At stage 3:

    1. We would recommend \(A=0\) (IV_Input = 0) if \(-.0005*\textrm{Glucose}_3+.0008*\textrm{PaO2_FiO2}_3-.0114*\textrm{SOFA}_2<.2068\)

    2. Else, we would recommend \(A=1\) (IV_Input = 1).
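Each stage-specific rule above comes directly from the fitted treatment contrast in the corresponding Q-function: with a binary action, the estimated optimal decision at a stage simply checks whether the treatment-interaction coefficients, evaluated at the patient's current variables, sum to a positive number. For example, at stage 1,

\[\begin{align}
d_1^{opt} = I\{\hat{\beta}_{10}+\hat{\beta}_{11}*\textrm{Glucose}_1+\hat{\beta}_{12}*\textrm{PaO2_FiO2}_1>0\},
\end{align}\]

and the numeric thresholds listed above are obtained by moving the fitted intercept of the contrast to the other side of the inequality.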

Applying the estimated optimal regime to the individuals in the observed data, we summarize the recommended treatment patterns in the following table:

# patients   IV_Input 1   IV_Input 2   IV_Input 3
23           1            1            0
10           1            0            0
5            1            1            1
4            0            0            0
4            0            1            0
4            0            1            1
2            0            0            1
2            1            0            1

The estimated value of the estimated optimal policy is .9637, which is higher than the estimated values of both fixed regimes evaluated above (.8991 and .8246).

# initialize the learner
QLearn = QLearning.QLearning()
# train the optimal policy (no regime supplied, so Q-learning optimizes over the actions)
QLearn.train(S, A, R, model_info, T=3, mimic3_clip = True)
# get the summary of the fitted Q models using the following code
#print("fitted model Q0:",QLearn.fitted_model[0].summary())
#print("fitted model Q1:",QLearn.fitted_model[1].summary())
#print("fitted model Q2:",QLearn.fitted_model[2].summary())
# recommend an action for each patient under the estimated optimal regime
opt_d = QLearn.recommend_action(S).value_counts()
# get the estimated value of the optimal regime
V_hat = QLearn.predict_value(S)
print("opt_d:",opt_d)
print("opt value:",V_hat)
opt_d: A3  A2  A1
0   1   1     23
    0   1     10
1   1   1      5
0   0   0      4
    1   0      4
1   1   0      4
    0   0      2
        1      2
dtype: int64
opt value: 0.9637185597953717
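If desired, the stage-specific decision rules listed earlier can be read off from the treatment-interaction coefficients of the fitted Q models. The snippet below is a hypothetical inspection, assuming QLearn.fitted_model is ordered by stage and holds statsmodels results objects (as the commented-out .summary() calls above suggest):

# print the coefficients involving each stage's action; their signs and magnitudes
# determine the stage-specific decision rules summarized earlier
for t, a in enumerate(['A1', 'A2', 'A3']):
    params = QLearn.fitted_model[t].params
    print(f"Stage {t+1} treatment-related coefficients:")
    print(params[[name for name in params.index if a in name]])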

Reference#

[1] Zheng, W., & van der Laan, M. (2017). Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference, 5(2).