Discrete Action Space#

In this section, our focus is on single-stage policy learning that involves actions which are either binary (0 or 1) or multinomial (choices A, B, C, or D).