6. Lp-R-learner#

As an extension of the R-learner, the Lp-R-learner combines residual regression with local polynomial adaptation, and leverages cross-fitting to further relax the conditions needed to attain the oracle convergence rate. For brevity, we introduce only the main algorithm; for details on its theory and real-data performance, see Kennedy [3].

Let \((I_{1a}^n, I_{1b}^n, I_{2}^n)\) denote three independent samples of \(n\) observations of \(Z_i = (S_i, A_i, R_i)\). Let \(b:\mathbb{R}^d\rightarrow \mathbb{R}^p\) denote the vector of basis functions consisting of all powers of each covariate up to order \(\gamma\), and all interactions up to degree-\(\gamma\) polynomials. Let \(K_{hs}(S)=\frac{1}{h^d}K\left(\frac{S-s}{h}\right)\), where \(K:\mathbb{R}^d\rightarrow \mathbb{R}\) is a bounded kernel function with support \([-1,1]^d\) and \(h\) is a bandwidth parameter.
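To make these definitions concrete, here is a minimal sketch of the two ingredients, assuming numpy arrays and taking all monomials of total degree up to \(\gamma\) as a concrete choice of basis; the helper names basis and boxcar_kernel are illustrative and not part of the causaldm package.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def basis(S, s, gamma):
    # b(S - s): monomials of the centered covariates up to total degree gamma;
    # PolynomialFeatures puts the constant term first, so b(0) = (1, 0, ..., 0)
    return PolynomialFeatures(degree=gamma).fit_transform(S - s)

def boxcar_kernel(S, s, h):
    # K_hs(S) = h^{-d} K((S - s)/h) with K(u) = 2^{-d} 1{u in [-1, 1]^d}
    d = S.shape[1]
    inside = np.all(np.abs((S - s) / h) <= 1, axis=1)
    return inside * 0.5 ** d / h ** d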

Step 1: Nuisance training:

(a) Use \(I_{1a}^n\) to construct an estimate \(\hat{\pi}_a\) of the propensity score \(\pi\);

(b) Use \(I_{1b}^n\) to construct an estimate \(\hat{\eta}\) of the regression function \(\eta=\pi\mu_1+(1-\pi)\mu_0\), and an estimate \(\hat{\pi}_b\) of the propensity score \(\pi\).
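A minimal sketch of this nuisance-training step, assuming sklearn-style models (they mirror the models used in the experiment below; the function name and signature are illustrative):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

def train_nuisances(S1a, A1a, S1b, A1b, R1b):
    # (a) propensity estimate pi_hat_a from fold I_1a
    pi_a_model = LogisticRegression().fit(S1a, A1a)
    # (b) eta = pi*mu_1 + (1 - pi)*mu_0 = E[R | S], so regressing R on S in
    # fold I_1b estimates eta directly; pi_hat_b is a second propensity fit
    eta_model = GradientBoostingRegressor().fit(S1b, R1b)
    pi_b_model = LogisticRegression().fit(S1b, A1b)
    return pi_a_model, eta_model, pi_b_model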

Step 2: Localized double-residual regression:

Define \(\hat{\tau}_r(s)\) as the fitted value from a kernel-weighted least-squares regression (in the test sample \(I_2^n\)) of the outcome residual \(R-\hat{\eta}\) on the basis terms \(b\) scaled by the treatment residual \(A-\hat{\pi}_b\), with weights \(\Big(\frac{A-\hat{\pi}_a}{A-\hat{\pi}_b}\Big)\cdot K_{hs}\). Thus \(\hat{\tau}_r(s)=b(0)^T\hat{\theta}\), where

(33)#\[\begin{equation} \hat{\theta}=\arg\min_{\theta\in\mathbb{R}^p}\mathbb{P}_n\left(K_{hs}(S)\Big\{ \frac{A-\hat{\pi}_a(S)}{A-\hat{\pi}_b(S)}\Big\} \left[ \big\{R-\hat{\eta}(S)\big\}-\theta^Tb(S-s)\big\{A-\hat{\pi}_b(S)\big\} \right]^2 \right). \end{equation}\]
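Since (33) is a weighted least-squares problem, it has the closed form \(\hat{\theta}=(X^TWX)^{-1}X^TWy\) with \(W=\mathrm{diag}(w)\). A sketch using the hypothetical helpers above, where pi_a, pi_b, and eta are the Step 1 fits evaluated on the test sample:

def lp_r_point_estimate(S, A, R, s, h, gamma, pi_a, pi_b, eta):
    # design matrix: basis terms b(S - s) scaled by the treatment residual
    X = basis(S, s, gamma) * (A - pi_b)[:, None]
    y = R - eta                                    # outcome residual
    w = boxcar_kernel(S, s, h) * (A - pi_a) / (A - pi_b)
    XtW = X.T * w                                  # X^T W with W = diag(w)
    theta = np.linalg.pinv(XtW @ X) @ (XtW @ y)    # pinv guards against an empty kernel window
    return theta[0]                                # tau_hat(s) = b(0)^T theta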

Step 3: Cross-fitting (optional):

Repeat Steps 1–2 twice, first using \((I^n_{1b}, I_2^n)\) for nuisance training and \(I_{1a}^n\) as the test sample, and then using \((I^n_{1a}, I_2^n)\) for training and \(I_{1b}^n\) as the test sample. Use the average of the resulting three estimators of \(\tau\) as the final estimator \(\hat{\tau}_r\).
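A sketch of this rotation, reusing the hypothetical helpers above (the fold bookkeeping is an assumption; the causaldm LpRlearner may organize the splits differently):

def cross_fit_tau(S, A, R, s, h, gamma):
    # split into three folds; each pass rotates which fold plays I_1a, I_1b, I_2
    idx = np.array_split(np.random.permutation(len(R)), 3)
    taus = []
    for a, b, t in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        ia, ib, it = idx[a], idx[b], idx[t]
        pi_a_m, eta_m, pi_b_m = train_nuisances(S[ia], A[ia], S[ib], A[ib], R[ib])
        taus.append(lp_r_point_estimate(
            S[it], A[it], R[it], s, h, gamma,
            pi_a_m.predict_proba(S[it])[:, 1],
            pi_b_m.predict_proba(S[it])[:, 1],
            eta_m.predict(S[it])))
    return np.mean(taus)                           # average of the three estimators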

In the theory section of [3], Kennedy proved that the Lp-R-learner can achieve the oracle convergence rate under milder conditions than the traditional DR-learner.

# import related packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression 
from causaldm.learners.CEL.Single_Stage import _env_getdata_CEL
from causaldm.learners.CEL.Single_Stage.LpRlearner import LpRlearner
import warnings
warnings.filterwarnings('ignore')

MovieLens Data#

# Get the MovieLens data

MovieLens_CEL = _env_getdata_CEL.get_movielens_CEL()
# drop the first column (a leftover row index)
MovieLens_CEL.pop(MovieLens_CEL.columns[0])
# keep Drama and Sci-Fi as the two candidate genres; drop the other genre indicators
MovieLens_CEL = MovieLens_CEL[MovieLens_CEL.columns.drop(['Comedy','Action', 'Thriller'])]
MovieLens_CEL
|       | user_id | movie_id | rating | age  | Drama | Sci-Fi | gender_M | occupation_academic/educator | occupation_college/grad student | occupation_executive/managerial | occupation_other | occupation_technician/engineer |
|-------|---------|----------|--------|------|-------|--------|----------|------|------|------|------|------|
| 0     | 48.0    | 1193.0   | 4.0    | 25.0 | 1.0   | 0.0    | 1.0      | 0.0  | 1.0  | 0.0  | 0.0  | 0.0  |
| 1     | 48.0    | 919.0    | 4.0    | 25.0 | 1.0   | 0.0    | 1.0      | 0.0  | 1.0  | 0.0  | 0.0  | 0.0  |
| 2     | 48.0    | 527.0    | 5.0    | 25.0 | 1.0   | 0.0    | 1.0      | 0.0  | 1.0  | 0.0  | 0.0  | 0.0  |
| 3     | 48.0    | 1721.0   | 4.0    | 25.0 | 1.0   | 0.0    | 1.0      | 0.0  | 1.0  | 0.0  | 0.0  | 0.0  |
| 4     | 48.0    | 150.0    | 4.0    | 25.0 | 1.0   | 0.0    | 1.0      | 0.0  | 1.0  | 0.0  | 0.0  | 0.0  |
| ...   | ...     | ...      | ...    | ...  | ...   | ...    | ...      | ...  | ...  | ...  | ...  | ...  |
| 65637 | 5878.0  | 3300.0   | 2.0    | 25.0 | 0.0   | 1.0    | 0.0      | 0.0  | 0.0  | 0.0  | 1.0  | 0.0  |
| 65638 | 5878.0  | 1391.0   | 1.0    | 25.0 | 0.0   | 1.0    | 0.0      | 0.0  | 0.0  | 0.0  | 1.0  | 0.0  |
| 65639 | 5878.0  | 185.0    | 4.0    | 25.0 | 0.0   | 1.0    | 0.0      | 0.0  | 0.0  | 0.0  | 1.0  | 0.0  |
| 65640 | 5878.0  | 2232.0   | 1.0    | 25.0 | 0.0   | 1.0    | 0.0      | 0.0  | 0.0  | 0.0  | 1.0  | 0.0  |
| 65641 | 5878.0  | 426.0    | 3.0    | 25.0 | 0.0   | 1.0    | 0.0      | 0.0  | 0.0  | 0.0  | 1.0  | 0.0  |

65642 rows × 12 columns

n = len(MovieLens_CEL)
import random
random.seed(1)      # random.sample below draws from the random module, so seed it as well
np.random.seed(1)

outcome = 'rating'
treatment = 'Drama'
controls = ['age', 'gender_M', 'occupation_academic/educator',
       'occupation_college/grad student', 'occupation_executive/managerial',
       'occupation_other', 'occupation_technician/engineer']
n_folds = 5
# nuisance models: y_model estimates eta; ps_model_a and ps_model_b estimate the propensity scores
y_model = GradientBoostingRegressor(max_depth=3)
ps_model_a = LogisticRegression()
ps_model_b = LogisticRegression()
s = 1
# final-stage model for the localized double-residual regression
LpRlearner_model = LinearRegression()

# subsample 1000 rows to keep the local polynomial fit fast
sample_index = random.sample(np.arange(len(MovieLens_CEL)).tolist(), 1000)
MovieLens_CEL = MovieLens_CEL.iloc[sample_index, :]

HTE_Lp_R_learner = LpRlearner(MovieLens_CEL, outcome, treatment, controls, y_model, ps_model_a, ps_model_b, s, LpRlearner_model, degree = 1)
estimate with Lp-R-learner

Let’s focus on the estimated HTEs for three randomly chosen users:

print("Lp-R-learner:  ",HTE_Lp_R_learner[np.array([0,300,900])])
Lp-R-learner:   [0.38914445 0.34495557 0.21900331]
ATE_Lp_R_learner = np.mean(HTE_Lp_R_learner)
print("Choosing Drama instead of Sci-Fi is expected to improve the rating of all users by",round(ATE_Lp_R_learner,4), "out of 5 points.")
Choosing Drama instead of Sci-Fi is expected to improve the rating of all users by 0.2875 out of 5 points.

Conclusion: Choosing Drama instead of Sci-Fi is expected to improve each user's rating by 0.2875 out of 5 points on average.

References#

  1. Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.

  2. Peter M Robinson. Root-n-consistent semiparametric regression. Econometrica: Journal of the Econometric Society, pages 931–954, 1988.

  3. Edward H. Kennedy. Optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497, 2020.

  4. M. J. van der Laan. Statistical inference for variable importance. The International Journal of Biostatistics, 2(1), 2006.

  5. S. Lee, R. Okui, and Y.-J. Whang. Doubly robust uniform confidence band for the conditional average treatment effect function. Journal of Applied Econometrics, 32(7):1207–1225, 2017.

  6. D. J. Foster and V. Syrgkanis. Orthogonal statistical learning. arXiv preprint arXiv:1901.09036, 2019.