Which of the following techniques can be used to enhance the stability of Policy Gradient Methods?
Trust region policy optimization (TRPO).
Deep neural networks
Q-learning

Machine Learning Applications Exercises are loading ...