PPO2 Development and Application Notes 2 - Ready-Made Practical Tips Work Best



https://openreview.net/pdf?id=r1etN1rtPB

Done:

Value function clipping (clipped the same way as the advantage surrogate objective)

State normalization + clipping

Reward scaling + clipping

Advantage normalization (already in the original code)
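A minimal Python sketch of the four items above, written against plain lists so it runs without extra dependencies. It illustrates the techniques under assumptions and is not the code actually used here: the names (RunningMeanStd, normalize_and_clip, RewardScaler, normalize_advantages, clipped_value_loss), the clip ranges (10.0 for states and rewards, 0.2 for the value clip) and gamma=0.99 are hypothetical choices modeled on common PPO codebases. Reward scaling follows the description in the linked paper: each reward is divided by the running standard deviation of a discounted sum of rewards, not by the standard deviation of the rewards themselves.

```python
import math


class RunningMeanStd:
    """Running mean/variance of a scalar stream (use one instance per state dimension)."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior count avoids division by zero on the first update

    def update(self, xs):
        # Merge batch statistics into the running statistics (parallel variance formula).
        n = len(xs)
        batch_mean = sum(xs) / n
        batch_var = sum((x - batch_mean) ** 2 for x in xs) / n
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / total
        self.var = m2 / total
        self.count = total


def normalize_and_clip(x, rms, clip=10.0):
    """State normalization + clipping: z-score with running stats, then clip to [-clip, clip]."""
    z = (x - rms.mean) / math.sqrt(rms.var + 1e-8)
    return max(-clip, min(clip, z))


class RewardScaler:
    """Reward scaling + clipping: divide by the running std of a discounted reward sum."""

    def __init__(self, gamma=0.99, clip=10.0):
        self.gamma = gamma
        self.clip = clip
        self.ret = 0.0
        self.rms = RunningMeanStd()

    def __call__(self, reward, done=False):
        self.ret = self.ret * self.gamma + reward
        self.rms.update([self.ret])
        scaled = reward / math.sqrt(self.rms.var + 1e-8)
        if done:
            self.ret = 0.0  # reset the discounted sum at episode boundaries
        return max(-self.clip, min(self.clip, scaled))


def normalize_advantages(advs):
    """Advantage normalization over a batch: zero mean, unit std."""
    mean = sum(advs) / len(advs)
    std = math.sqrt(sum((a - mean) ** 2 for a in advs) / len(advs))
    return [(a - mean) / (std + 1e-8) for a in advs]


def clipped_value_loss(v_new, v_old, v_target, eps=0.2):
    """Value clipping: max of unclipped and clipped squared errors, averaged over the batch.

    The clipped prediction may move at most eps away from the old prediction,
    mirroring the ratio clipping in PPO's surrogate objective.
    """
    losses = []
    for vn, vo, vt in zip(v_new, v_old, v_target):
        v_clipped = vo + max(-eps, min(eps, vn - vo))
        losses.append(max((vn - vt) ** 2, (v_clipped - vt) ** 2))
    return sum(losses) / len(losses)
```

Taking the max of the two squared errors makes the value loss pessimistic, the same way the clipped surrogate is for the policy. Saving and reusing the running statistics at evaluation time is left out of this sketch.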


Other:

Remove reward discounting


To do:

Adam learning rate annealing

Orthogonal initialization and layer scaling

Gradient clipping

Tanh activations
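A dependency-free Python sketch of the four to-do items, under assumptions: the learning rate decays linearly to zero, orthogonal initialization is done by Gram-Schmidt on a Gaussian matrix, and gradients are clipped by the global L2 norm of all gradients together. The function names, max_norm=0.5, and the gains (sqrt(2) for hidden layers, 0.01 for the policy output layer) are hypothetical defaults modeled on common PPO codebases, not values verified for this project. In a PyTorch setup the same ideas map to torch.nn.init.orthogonal_, torch.nn.utils.clip_grad_norm_, and a learning-rate scheduler.

```python
import math
import random


def annealed_lr(initial_lr, update, total_updates):
    """Adam learning rate annealing: decay linearly from initial_lr toward 0."""
    frac = 1.0 - update / total_updates
    return initial_lr * frac


def orthogonal_init(rows, cols, gain=1.0, rng=random):
    """Return a rows x cols weight matrix with orthonormal rows (or columns), scaled by gain."""
    transpose = rows > cols
    n, m = (cols, rows) if transpose else (rows, cols)  # build n orthonormal vectors of length m, n <= m
    vecs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Gram-Schmidt: remove components along the vectors already accepted.
        for u in vecs:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        vecs.append([a / norm for a in v])
    if transpose:
        vecs = [list(col) for col in zip(*vecs)]
    return [[gain * a for a in row] for row in vecs]


def clip_by_global_norm(grads, max_norm=0.5):
    """Gradient clipping: rescale all gradients together if their joint L2 norm exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)
        grads = [g * scale for g in grads]
    return grads, total


def mlp_forward(x, layers):
    """Tanh activations on hidden layers, linear output layer. Each layer is (W, b), W is out x in."""
    h = x
    for i, (w, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(w, b)]
        h = [math.tanh(v) for v in z] if i < len(layers) - 1 else z
    return h


# Hypothetical layer scaling: sqrt(2) for hidden layers, 0.01 for the policy head.
rng = random.Random(0)
sizes = [4, 64, 64, 2]
gains = [math.sqrt(2), math.sqrt(2), 0.01]
policy = [
    (orthogonal_init(out_dim, in_dim, gain=g, rng=rng), [0.0] * out_dim)
    for in_dim, out_dim, g in zip(sizes[:-1], sizes[1:], gains)
]
logits = mlp_forward([0.1, -0.2, 0.3, 0.0], policy)
```

In a training loop, the annealed rate would be written into the Adam optimizer before each update, and the gradient clip is applied after backpropagation and before the optimizer step. The small output gain keeps the initial policy logits close to zero, so the starting action distribution is nearly uniform.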
