https://openreview.net/pdf?id=r1etN1rtPB
Done:
Value Clipping ( adv surrogate method)
State normalization + clipping
Reward scaling + clipping
Adv normalization ( Existed originally)
Other:
Remove discount reward.
Waited to be done:
Adam learning rate annealing
Orthogonal initialization and layer scaling
gradient clipping
Tanh Activation
留言
張貼留言