MTSS_Cascade#
Overview#
Advantage: It is both scalable and robust, and it accounts for inter-item heterogeneity.
Disadvantage:
Application Situation: Useful when presenting a ranked list of items, with only one selected at each interaction. The outcome is binary, and the feature information is static.
Main Idea#
MTSS_Cascade is an instance of MTSS [1], a general Thompson Sampling (TS)-based framework, applied to online learning-to-rank problems.
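Review of MTSS: MTSS [1] is a meta-learning framework designed for large-scale structured bandit problems [2]. It is a TS-based algorithm that learns the information-sharing structure while minimizing the cumulative regret. By adapting the TS framework to a problem-specific Bayesian hierarchical model, MTSS simultaneously enables information sharing among items via their features and models the inter-item heterogeneity. Specifically, it assumes that the item-specific parameter $\theta_i$ is sampled from a distribution $g(\theta_i \mid \boldsymbol{x}_i, \boldsymbol{\gamma})$ rather than being fully determined by the item features $\boldsymbol{x}_i$,
where $g$ is a model parameterized by an unknown parameter $\boldsymbol{\gamma}$, which has prior distribution $Q(\boldsymbol{\gamma})$.
Review of MTSS_Cascade: To characterize the relationship between items using their features, one example choice of $g$ is a Beta model with a logistic mean, $\theta_i \sim \text{Beta}(\text{logistic}(\boldsymbol{x}_i^T \boldsymbol{\gamma}), \phi)$, where $\phi$ is a known precision parameter (phi_beta in the demo code).
The prior $Q(\boldsymbol{\gamma})$ can be any appropriate distribution; the demo code specifies it through a prior mean and a prior covariance matrix of $\boldsymbol{\gamma}$.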
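As a rough illustration of a feature-based hierarchical model of this kind, the sketch below draws $\boldsymbol{\gamma}$ from a prior and then draws one attractiveness parameter per item from a Beta distribution whose mean is a logistic function of the item features. It is not the library's implementation: the Gaussian prior, the mean-precision parameterization (alpha = mu * phi, beta = (1 - mu) * phi), and all names such as `sample_item_parameters` are assumptions made for the example, and the library's internal parameterization of phi_beta may differ.

```python
import numpy as np

def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

def sample_item_parameters(X, gamma_prior_mean, gamma_prior_cov, phi, rng):
    """Draw gamma from an (assumed) Gaussian prior, then one theta_i per item."""
    gamma = rng.multivariate_normal(gamma_prior_mean, gamma_prior_cov)
    mu = logistic(X @ gamma)  # feature-based mean of each theta_i
    # Assumed mean-precision parameterization of the Beta distribution.
    theta = rng.beta(mu * phi, (1.0 - mu) * phi)
    return gamma, mu, theta

rng = np.random.default_rng(0)
L, p = 5, 3                    # hypothetical sizes: 5 items, 3 features
X = rng.normal(size=(L, p))    # hypothetical feature matrix
gamma, mu, theta = sample_item_parameters(
    X, np.zeros(p), np.identity(p), phi=10.0, rng=rng
)
```

Items with similar features share similar means `mu`, which is how information is shared, while the Beta draw lets each `theta[i]` deviate from its mean, which is how inter-item heterogeneity enters.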
Algorithm Details#
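At each round $t$, the agent samples from the current posterior and then selects the $K$ items to display; the key steps are as follows.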
Key Steps#
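For each round $t$:
1. Approximate the posterior distribution of $\boldsymbol{\gamma}$ given the history $\mathcal{H}_t$, $P(\boldsymbol{\gamma} \mid \mathcal{H}_t)$, by PyMC3 [3];
2. Sample $\tilde{\boldsymbol{\gamma}} \sim P(\boldsymbol{\gamma} \mid \mathcal{H}_t)$;
3. Update $P(\boldsymbol{\theta} \mid \tilde{\boldsymbol{\gamma}}, \mathcal{H}_t)$;
4. Sample $\tilde{\boldsymbol{\theta}} \sim P(\boldsymbol{\theta} \mid \tilde{\boldsymbol{\gamma}}, \mathcal{H}_t)$;
5. Take the action $A_t$ w.r.t. $\tilde{\boldsymbol{\theta}}$ such that $A_t = \arg\max_{a \in \mathcal{A}} E(R_t(a) \mid \tilde{\boldsymbol{\theta}})$;
6. Receive reward $R_t$.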
*Notations can be found in either the introduction of the chapter “Structured Bandits” or the introduction of the cascading bandit problems.
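The update, sampling, and action-selection steps can be sketched in a few lines of NumPy, taking a sampled $\tilde{\boldsymbol{\gamma}}$ as given (the PyMC3 approximation is not reproduced here). The sketch assumes that, conditional on $\tilde{\boldsymbol{\gamma}}$, each $\theta_i$ has a Beta prior and that every examined item yields one Bernoulli click observation, so the conditional posterior is again a Beta distribution. The function `ts_step` and the arrays passed to it are hypothetical and are not part of causaldm.

```python
import numpy as np

def ts_step(alpha_prior, beta_prior, clicks, exams, K, rng):
    """One Thompson sampling step, conditional on a sampled gamma."""
    # Update: conjugate Beta update; each examination is a Bernoulli(theta_i) trial.
    alpha_post = alpha_prior + clicks
    beta_post = beta_prior + (exams - clicks)
    # Sample: draw theta from its conditional posterior.
    theta_tilde = rng.beta(alpha_post, beta_post)
    # Act: display the K items with the largest sampled attractiveness, best first.
    action = np.argsort(-theta_tilde)[:K]
    return theta_tilde, action

rng = np.random.default_rng(0)
# Hypothetical Beta parameters implied by a sampled gamma (one pair per item).
alpha_prior = np.array([2.0, 1.0, 1.0, 3.0])
beta_prior = np.array([2.0, 3.0, 1.0, 1.0])
clicks = np.array([1, 0, 0, 4])   # cumulative clicks per item
exams = np.array([5, 2, 0, 6])    # cumulative examinations per item
theta_tilde, action = ts_step(alpha_prior, beta_prior, clicks, exams, K=2, rng=rng)
```

Under the cascade model the probability of at least one click is $1 - \prod_{i \in A_t} (1 - \theta_i)$, so choosing the $K$ largest sampled values maximizes the expected reward for the sampled parameters.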
Demo Code#
Import the learner.#
import numpy as np
from causaldm.learners.CPL4.Structured_Bandits.Cascade import MTSS_Cascade
Generate the Environment#
Here, we imitate an environment based on the Yelp dataset. The number of items recommended at each round, $K$, is set to 3.
from causaldm.learners.CPL4.Structured_Bandits.Cascade import _env_realCascade as _env
env = _env.Cascading_env(K = 3, seed = 0)
Specify Hyperparameters#
K: number of items to be recommended at each round
L: total number of candidate items
phi_beta: precision of the Beta distribution (i.e., $\phi$)
Xs: feature information (Note: if an intercept is considered, Xs should include a column of ones)
gamma_prior_mean: the mean of the prior distribution of $\boldsymbol{\gamma}$
gamma_prior_cov: the covariance matrix of the prior distribution of $\boldsymbol{\gamma}$
n_init: determines the number of samples that PyMC3 will draw when updating the posterior
update_freq: frequency to update the posterior distribution of $\boldsymbol{\gamma}$ (i.e., update every update_freq steps)
seed: random seed
phi_beta = 1/4
K = env.K
S = env.Phi
gamma_prior_mean = np.zeros(env.p)
gamma_prior_cov = np.identity(env.p)
update_freq = 10
n_init = 1000
seed = 0
MTSS_agent = MTSS_Cascade.MTSS_Cascade(phi_beta = phi_beta, K = K, Xs = S,
                                       gamma_prior_mean = gamma_prior_mean, gamma_prior_cov = gamma_prior_cov,
                                       update_freq = update_freq, n_init = n_init, seed = seed)
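If an intercept is wanted, the feature matrix needs a leading column of ones, and the prior mean and covariance of $\boldsymbol{\gamma}$ must grow by one dimension to match. The sketch below shows one way to do this with NumPy; the feature matrix `X` and its values are hypothetical and stand in for whatever features are passed as Xs.

```python
import numpy as np

# Hypothetical feature matrix: 3 items, 2 features each.
X = np.array([[0.5, 1.2],
              [0.1, -0.3],
              [2.0, 0.7]])

# Prepend a column of ones so the first coefficient of gamma acts as an intercept.
X_with_intercept = np.column_stack([np.ones(X.shape[0]), X])

# The prior on gamma must match the new number of columns.
p = X_with_intercept.shape[1]
gamma_prior_mean = np.zeros(p)
gamma_prior_cov = np.identity(p)
```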
Recommendation and Interaction#
Starting from t = 0, for each step t, there are three steps:
1. Recommend an action (an ordered set of restaurants): A = MTSS_agent.take_action(S)
2. Get the reward from the environment (i.e., $W$, $E$, and $R$): W, E, R = env.get_reward(A)
3. Update the posterior distribution: MTSS_agent.receive_reward(A, W, E, t, S)
t = 0
A = MTSS_agent.take_action(S)
W, E, R = env.get_reward(A)
MTSS_agent.receive_reward(A, W, E, t, S)
A, W, E, R
(array([2189, 1610, 1206], dtype=int64),
array([0., 0., 0.]),
array([1., 1., 1.]),
0.0)
Interpretation: At step 0, the agent decides to display three top restaurants: restaurant 2189 first, restaurant 1610 second, and restaurant 1206 third. Unfortunately, the customer does not show interest in any of the recommended restaurants. As a result, the agent receives a zero reward at round $t = 0$.
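To make the shape of this feedback concrete, the sketch below simulates one round under the standard cascade model: the user scans the displayed list from the top, clicks the first attractive item, and stops. It assumes that W marks the clicked position, E marks the examined positions, and R is 1 if any item was clicked; this matches the output above but is only an illustration, and `cascade_feedback` is a hypothetical helper, not the behaviour of `env.get_reward`.

```python
import numpy as np

def cascade_feedback(action, theta, rng):
    """Simulate one round of cascade feedback for an ordered list of items."""
    K = len(action)
    W = np.zeros(K)  # click indicator per displayed position
    E = np.zeros(K)  # examination indicator per displayed position
    for k, item in enumerate(action):
        E[k] = 1.0                        # the user looks at position k
        if rng.random() < theta[item]:    # item attracts the user
            W[k] = 1.0
            break                         # the user stops after the first click
    R = float(W.sum())                    # reward: 1 if any item was clicked
    return W, E, R

rng = np.random.default_rng(0)
action = np.array([2, 0, 1])              # hypothetical ordered list of item indices

# No item is attractive: every position is examined and nothing is clicked.
W0, E0, R0 = cascade_feedback(action, np.zeros(3), rng)

# The first displayed item is always attractive: the user clicks it and stops.
W1, E1, R1 = cascade_feedback(action, np.array([0.0, 0.0, 1.0]), rng)
```

In the first call the feedback has the same pattern as the demo output (all positions examined, no clicks, zero reward); in the second, only the first position is examined and clicked.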
References#
[1] Wan, R., Ge, L., & Song, R. (2022). Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework. arXiv preprint arXiv:2202.13227.
[2] Forcina, A., & Franconi, L. (1988). Regression analysis with the beta-binomial distribution. Rivista di Statistica Applicata, 21(1).
[3] Salvatier, J., Wiecki, T. V., & Fonnesbeck, C. (2016). Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2, e55. DOI: 10.7717/peerj-cs.55.