MIMIC III (Infinite Horizon)#
In this notebook, we conduct an analysis of the MIMIC III data in the infinite-horizon setting. We first analyze the mediation effect, and then evaluate the policy of interest and compute the optimal policy. Informed by causal structure learning, we treat Glucose and PaO2_FiO2 as confounders/states, IV_Input as the action, and SOFA as the mediator.
import numpy as np
import pandas as pd
import pickle

# Load the pre-processed MIMIC III trajectories
with open('mimic3_MRL_data_dict_V2.pickle', 'rb') as file:
    mimic3_MRL = pickle.load(file)
# Recode the reward: 0 -> 1 and -1 -> 0
mimic3_MRL['reward'] = [1 if r == 0 else r for r in mimic3_MRL['reward']]
mimic3_MRL['reward'] = [0 if r == -1 else r for r in mimic3_MRL['reward']]
# Load the same data in tabular form and apply the same recoding to the last column (Died_within_48H)
MRL_df = pd.read_csv('mimic3_MRL_df_V2.csv')
MRL_df.iloc[np.where(MRL_df['Died_within_48H']==0)[0],-1]=1
MRL_df.iloc[np.where(MRL_df['Died_within_48H']==-1)[0],-1]=0
# Preview one patient's trajectory (icustayid 1006)
MRL_df[MRL_df.icustayid==1006]
| | icustayid | bloc | Glucose | PaO2_FiO2 | IV_Input | SOFA | next_Glucose | next_PaO2_FiO2 | Died_within_48H |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1006 | 1 | 91.0 | 206.000000 | 0 | 8 | 91.0 | 206.000000 | 1 |
| 1 | 1006 | 3 | 91.0 | 206.000000 | 0 | 8 | 175.0 | 100.173913 | 1 |
| 2 | 1006 | 6 | 175.0 | 100.173913 | 1 | 3 | 175.0 | 96.000000 | 1 |
| 3 | 1006 | 7 | 175.0 | 96.000000 | 1 | 10 | 175.0 | 96.000000 | 1 |
| 4 | 1006 | 8 | 175.0 | 96.000000 | 1 | 9 | 144.0 | 187.234036 | 0 |
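The estimator below consumes these trajectories through the `mimic3_MRL` dictionary rather than the data frame. As a quick, optional check of how that dictionary is organized (only the `'reward'` key is used explicitly above; the remaining keys depend on how the pickle was built), one can print each key together with the type and length of its value:

# Inspect the structure of the trajectory dictionary
for key, value in mimic3_MRL.items():
    length = len(value) if hasattr(value, '__len__') else 'scalar'
    print(key, type(value).__name__, length)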
CEL: Mediation Analysis with Infinite Horizon#
We processed the MIMIC III data similarly to the reinforcement learning literature by setting the reward of each stage prior to the final stage to 0, and the reward of the final stage to the observed value of Died_within_48H. In this section, we analyze the average treatment effect (ATE) of a target policy that provides IV input all of the time, compared to a control policy that provides no IV input at all. Using the multiply-robust estimator proposed in [1], we decompose the ATE into four components: the immediate natural direct effect (INDE), the immediate natural mediator effect (INME), the delayed natural direct effect (DNDE), and the delayed natural mediator effect (DNME), and estimate each component. The estimation results are summarized in the table below (standard errors in parentheses).
| INDE | INME | DNDE | DNME | ATE |
|---|---|---|---|---|
| -.0261 (.0088) | .0042 (.0036) | .0024 (.0023) | .0007 (.0012) | -.0188 (.0069) |
Specifically, the ATE of the target policy is significantly negative, with an effect size of -.0188. Looking more closely, we find that the INME, DNDE, and DNME are insignificant, whereas the INDE is significantly negative. Further, taking the effect sizes into account, we can conclude that the majority of the average treatment effect is directly due to the actions derived from the target treatment policy, while the part of the effect that can be attributed to the mediator is negligible.
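As a quick arithmetic check, the four estimated components in the table sum to the reported ATE:

# The ATE decomposes additively into the four effect components
INDE, INME, DNDE, DNME = -0.0261, 0.0042, 0.0024, 0.0007
print(INDE + INME + DNDE + DNME)  # -0.0188, matching the reported ATE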
from causaldm.learners.CEL.MA import ME_MDP
# Control policy: deterministically takes action 0 (no IV input)
def control_policy(state=None, dim_state=None, action=None, get_a=False):
    if get_a:
        # return the fixed action
        action_value = np.array([0])
    else:
        state = np.copy(state).reshape(-1, dim_state)
        NT = state.shape[0]
        if action is None:
            # return the action (0) taken in each of the NT states
            action_value = np.array([0] * NT)
        else:
            # return the probability of each observed action under the control policy:
            # 1 if the observed action is 0, and 0 otherwise
            action = np.copy(action).flatten()
            if len(action) == 1 and NT > 1:
                action = action * np.ones(NT)
            action_value = 1 - action
    return action_value
# Target policy: deterministically takes action 1 (always provide IV input)
def target_policy(state, dim_state=1, action=None):
    state = np.copy(state).reshape((-1, dim_state))
    NT = state.shape[0]
    # probability of taking action 1 in each state (always 1 for this policy)
    pa = 1 * np.ones(NT)
    if action is None:
        if NT == 1:
            # sample an action for a single state
            pa = pa[0]
            prob_arr = np.array([1 - pa, pa])
            action_value = np.random.choice([0, 1], 1, p=prob_arr)
        else:
            raise ValueError('No random for matrix input')
    else:
        # return the probability of each observed action under the target policy
        action = np.copy(action).flatten()
        action_value = pa * action + (1 - pa) * (1 - action)
    return action_value
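As an optional sanity check (not part of the original analysis), we can verify that the two policies behave as intended on a single state taken from the preview table above:

# The control policy always takes action 0; the target policy always takes action 1
print(control_policy(get_a=True))                             # [0]
print(target_policy(np.array([[91.0, 206.0]]), dim_state=2))  # [1]
# Probability of an observed action under each policy
print(control_policy(state=np.array([[91.0, 206.0]]), dim_state=2, action=np.array([0])))  # [1]
print(target_policy(np.array([[91.0, 206.0]]), dim_state=2, action=np.array([1])))         # [1.]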
#Fixed hyper-parameter--no need to modify
MCMC = 50
truncate = 50
problearner_parameters = {"splitter": ["best", "random"], "max_depth": range(1, 50)}
dim_state=2; dim_mediator = 1
ratio_ndim = 10
d = 2
L = 5
scaler = 'Identity'
method = "Robust"
seed = 0
r_model = "OLS"
Q_settings = {'scaler': 'Identity','product_tensor': False, 'beta': 3/7,
'include_intercept': False,
'penalty': 10**(-4),'d': d, 'min_L': L, 't_dependent_Q': False}
Robust_est = ME_MDP.evaluator(mimic3_MRL, r_model = r_model,
problearner_parameters = problearner_parameters,
ratio_ndim = ratio_ndim, truncate = truncate, l2penalty = 10**(-4),
target_policy=target_policy, control_policy = control_policy,
dim_state = dim_state, dim_mediator = dim_mediator,
Q_settings = Q_settings,
MCMC = MCMC,
seed = seed, nature_decomp = True, method = method)
Robust_est.estimate_DE_ME()
Robust_est.est_IDE, Robust_est.IME, Robust_est.DDE, Robust_est.DME, Robust_est.TE
Building 0-th basis spline (total 3 state-mediator dimemsion) which has 2 basis, in total 2 features
Building 1-th basis spline (total 3 state-mediator dimemsion) which has 2 basis, in total 4 features
Building 2-th basis spline (total 3 state-mediator dimemsion) which has 2 basis, in total 6 features
(-0.026068280875851824,
0.00420277287581835,
0.0024229424340379844,
0.0006599800396108243,
-0.018782585526384673)
Robust_est.IDE_se, Robust_est.IME_se, Robust_est.DDE_se, Robust_est.DME_se, Robust_est.TE_se
(0.008772183809351398,
0.0035581671878296196,
0.002258533318055646,
0.0011830437572723908,
0.006888698088228283)
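As a minimal sketch (assuming approximate normality of the estimators), the point estimates and standard errors printed above can be turned into Wald-type 95% confidence intervals:

# Wald 95% confidence intervals from the printed point estimates and standard errors
est = {'INDE': -0.0261, 'INME': 0.0042, 'DNDE': 0.0024, 'DNME': 0.0007, 'ATE': -0.0188}
se = {'INDE': 0.0088, 'INME': 0.0036, 'DNDE': 0.0023, 'DNME': 0.0012, 'ATE': 0.0069}
for name in est:
    lower, upper = est[name] - 1.96 * se[name], est[name] + 1.96 * se[name]
    print(f"{name}: [{lower:.4f}, {upper:.4f}]")
# Only the intervals for INDE and ATE exclude zero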
Reference#
[1] Ge, L., Wang, J., Shi, C., Wu, Z., & Song, R. (2023). A Reinforcement Learning Framework for Dynamic Mediation Analysis. arXiv preprint arXiv:2301.13348.