PPO2 Development and Application Notes 2 - Ready-Made Practical Tips Are the Best

https://openreview.net/pdf?id=r1etN1rtPB

Done:

Value clipping (same clipping idea as the advantage surrogate objective)

State normalization + clipping

Reward scaling + clipping

Advantage normalization (already present in the original code)
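
A rough sketch of the four items above, in plain Python so it runs without any framework. Everything here is an illustration under assumptions, not the code from this project: the names (RunningStat, normalize_state, RewardScaler, clipped_value_loss, normalize_advantages) are hypothetical, the clip ranges (10.0 for states and rewards, 0.2 for values) and gamma=0.99 are just common defaults, and states are treated as scalars (a real state vector would get one running statistic per dimension).

```python
import math


class RunningStat:
    """Welford running mean / variance for a scalar stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population std; 0.0 until at least two samples have arrived.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 0.0


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def normalize_state(x, stat, clip_range=10.0, eps=1e-8):
    """State normalization + clipping: update stats, standardize, clip."""
    stat.update(x)
    return clip((x - stat.mean) / (stat.std + eps), -clip_range, clip_range)


class RewardScaler:
    """Reward scaling + clipping: divide by the std of a running
    discounted return (no mean subtraction), then clip."""

    def __init__(self, gamma=0.99, clip_range=10.0, eps=1e-8):
        self.gamma, self.clip_range, self.eps = gamma, clip_range, eps
        self.ret = 0.0
        self.stat = RunningStat()

    def __call__(self, reward, done=False):
        self.ret = self.gamma * self.ret + reward
        self.stat.update(self.ret)
        scaled = reward / (self.stat.std + self.eps)
        if done:
            self.ret = 0.0  # start a fresh running return for the next episode
        return clip(scaled, -self.clip_range, self.clip_range)


def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """Value clipping: keep the new value within clip_eps of the old one
    and take the worse (larger) of the clipped / unclipped squared errors."""
    total = 0.0
    for v, v_old, ret in zip(values, old_values, returns):
        v_clipped = v_old + clip(v - v_old, -clip_eps, clip_eps)
        total += max((v - ret) ** 2, (v_clipped - ret) ** 2)
    return 0.5 * total / len(values)


def normalize_advantages(advs, eps=1e-8):
    """Advantage normalization: zero mean, unit std within one batch."""
    mean = sum(advs) / len(advs)
    std = math.sqrt(sum((a - mean) ** 2 for a in advs) / len(advs))
    return [(a - mean) / (std + eps) for a in advs]
```

In a training loop this would roughly mean calling normalize_state and a RewardScaler instance on every environment step, and clipped_value_loss and normalize_advantages once per minibatch.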

Other:

Removed reward discounting.

To do:

Adam learning rate annealing

Orthogonal initialization and layer scaling

Gradient clipping

Tanh Activation
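
Since the four to-do items are not implemented yet, here is only a minimal sketch of what each might look like, again in plain Python. All names (linear_lr, clip_grad_norm, orthogonal_init, tanh_layer) are hypothetical, the linear decay schedule is one assumed form of annealing, and the gain argument stands in for per-layer scaling; in a real project the deep learning framework's own scheduler, clipping, and initializer utilities would be the natural choice.

```python
import math
import random


def linear_lr(initial_lr, update, total_updates):
    """Adam learning rate annealing: linear decay from initial_lr to 0.
    The result would be assigned to the optimizer before each update."""
    return initial_lr * (1.0 - update / total_updates)


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Gradient clipping by global L2 norm over a flat list of gradients."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / (norm + eps)
    return [g * scale for g in grads]


def orthogonal_init(rows, cols, gain=1.0, seed=0):
    """Orthogonal initialization with layer scaling.

    Builds orthonormal vectors by Gram-Schmidt on Gaussian samples, then
    multiplies by gain. The result has orthogonal rows if rows <= cols,
    otherwise orthogonal columns.
    """
    rng = random.Random(seed)
    transpose = rows > cols
    n, m = (cols, rows) if transpose else (rows, cols)  # n vectors of length m
    basis = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for b in basis:
            # Remove the component of v along each earlier basis vector.
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    scaled = [[gain * x for x in row] for row in basis]
    if transpose:
        return [list(col) for col in zip(*scaled)]
    return scaled


def tanh_layer(x, weights, bias):
    """Tanh activation: one dense layer, weights[j] is the j-th output row."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
```

For example, orthogonal_init(64, 64, gain=math.sqrt(2)) would give a hidden-layer weight matrix, with a much smaller gain as a possible choice for the policy output layer; these particular sizes and gains are illustrative values, not settings taken from this project.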
