Which of the following is NOT a type of policy in reinforcement learning?
Stochastic policy
Value-based policy
Overlook minor misbehaviors
Impose harsh punishments for any infraction
