TensorFlow 2: A Homebrew Recipe for Parallel RL Training

Workflow:

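Start the parallel kernels for the Player Agents and initialize the GPU in them
Feed the previous generation's model weights to the Player Agents, which play the game in parallel to collect training data
Gather the data back to the local Trainer Agent and train
Feed the model weights to the Player Agents again and play in parallel to measure the TrainDev score
Shut down the parallel kernels (starting a parallel run with a different number of cores is enough)
Loop...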
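The workflow above can be sketched as a plain Python loop. This is only an outline under assumptions: `training_loop` and every callable passed into it (`init_workers`, `collect`, `train`, `evaluate`, `shutdown_workers`) are hypothetical names, not part of the original code, and the real versions would wrap the parallel calls described below.

```python
from typing import Any, Callable, List, Tuple


def training_loop(
    weights: Any,
    n_generations: int,
    init_workers: Callable[[], None],
    collect: Callable[[Any], Any],
    train: Callable[[Any, Any], Any],
    evaluate: Callable[[Any], float],
    shutdown_workers: Callable[[], None],
) -> Tuple[Any, List[float]]:
    """Run the workflow once per generation (all callables are placeholders)."""
    scores: List[float] = []
    for _ in range(n_generations):
        init_workers()                    # start parallel kernels, initialize GPU
        data = collect(weights)           # play in parallel with the previous weights
        weights = train(weights, data)    # train on the local Trainer Agent
        scores.append(evaluate(weights))  # play in parallel to get the TrainDev score
        shutdown_workers()                # close the parallel kernels before looping
    return weights, scores
```

Passing the steps in as callables keeps the loop itself free of TensorFlow, so the ordering can be checked with simple stand-in functions before wiring in the real agents.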

Parallelization method:

    def work(act_weights, crit_weights):
        '''Create Actor/Critic, copy the input weights to A/C, play the env and gather data.'''
        return RewardData

    # joblib-style call; the result is a list of `episodes` RewardData items
    reward_data = Parallel(n_jobs=n_core)(delayed(work)(**kwargs) for i in range(episodes))
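A minimal runnable version of this pattern might look as follows. It assumes the parallel call is joblib's `Parallel` / `delayed` (the syntax in the post suggests it, but the post does not name the library). The body of `work` is a toy stand-in: the `seed` argument and the reward formula are made up for illustration, and `run_episodes` is a hypothetical helper name.

```python
import random


def work(act_weights, crit_weights, seed=0):
    """Stand-in for the post's work().

    A real version would build the Actor/Critic, copy the weights in,
    play the env, and return the gathered data.
    """
    rng = random.Random(seed)
    # Toy "episode": the reward is just the weight sums plus noise in [0, 1).
    reward = sum(act_weights) + sum(crit_weights) + rng.random()
    return {"seed": seed, "reward": reward}


def run_episodes(n_core, episodes, act_weights, crit_weights):
    """Run `episodes` calls of work() across `n_core` joblib workers."""
    from joblib import Parallel, delayed  # only the launcher calls this function

    # Parallel returns the results as a list, in the order the tasks were submitted.
    return Parallel(n_jobs=n_core)(
        delayed(work)(act_weights, crit_weights, seed=i) for i in range(episodes)
    )
```

Calling `run_episodes(4, 32, act_weights, crit_weights)` would then give a list of 32 result dicts, one per episode.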

Key points:

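The Playing and Run steps must use exactly the same number of kernels, and the episodes per generation and the test rounds must also divide evenly
After each loop, the parallel kernels have to be killed manually and restarted
First launch a parallel script that initializes the GPU, then launch another parallel script that feeds in the weights and collects the rewards
Inside work(), every function decorated with @tf.function must be called from a separate .py file (including Actor/Critic)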
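The divisibility requirement can be checked up front. The sketch below assumes "divide evenly" means divisible by the kernel count, which is one reading of the post; `split_evenly` is a hypothetical helper.

```python
def split_evenly(total_rounds: int, n_kernels: int) -> int:
    """Return the rounds each kernel gets, or raise if the split is uneven."""
    if n_kernels <= 0:
        raise ValueError("n_kernels must be positive")
    if total_rounds % n_kernels != 0:
        raise ValueError(
            f"{total_rounds} rounds cannot be split evenly across {n_kernels} kernels"
        )
    return total_rounds // n_kernels
```

For example, `split_evenly(32, 8)` returns 4, while `split_evenly(30, 8)` raises a `ValueError` before any kernel is started.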

The Parallel GPU Initialization Problem

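The weights being fed in are themselves TensorFlow objects, so my guess is that they carry the local machine's GPU information straight into the child kernel, which triggers an initialization error (Physical devices cannot be modified after being initialized).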

The fix is to first run an init function that starts the parallel kernels and then initializes the GPU:

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

Then run the work function in parallel, feeding in the weights to collect the rewards.
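The two passes can be sketched like this, again assuming joblib is the parallel library. `init_gpu` and `init_then_collect` are hypothetical names, and the sketch relies on the workers from the first pass still being alive for the second one, which is how the post describes it but which this code does not itself guarantee.

```python
def init_gpu():
    """Run inside a worker: enable memory growth on every visible GPU.

    Returns the number of GPUs configured so the launcher can log it.
    """
    import tensorflow as tf  # imported in the worker so the GPU is set up there

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must happen before TensorFlow initializes the device in this process.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


def init_then_collect(work, n_core, episodes, act_weights, crit_weights):
    """Hypothetical launcher: one parallel pass to initialize, one to collect."""
    from joblib import Parallel, delayed  # only the launcher calls this function

    # Pass 1: send one init task per worker slot.
    gpu_counts = Parallel(n_jobs=n_core)(delayed(init_gpu)() for _ in range(n_core))
    # Pass 2: same n_core as pass 1, now feeding the weights to work().
    rewards = Parallel(n_jobs=n_core)(
        delayed(work)(act_weights, crit_weights) for _ in range(episodes)
    )
    return gpu_counts, rewards
```

Both passes use the same `n_core`, in line with the key point that the kernel counts must match.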

The "AutoGraph: could not get source code" Warning

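Because work() executes inside the parallel kernels, AutoGraph cannot reach into the running Jupyter Notebook to get the source code of functions wrapped with @tf.function. The fix is to save every such function, including Actor/Critic, to .py files so the parallel kernels can read them directly.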

Functions that do not run in parallel do not need to be saved separately.
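The difference between the two cases can be reproduced with the standard library alone. The sketch below is only an analogy, not AutoGraph itself: it uses `inspect.getsource` to show that a function with no backing file has no retrievable source, while the same function imported from a .py file does. The file name `rl_funcs.py`, the `<notebook-cell>` label, and the `step` function are made up for illustration; in the post's setup the saved functions would carry `@tf.function`.

```python
import importlib.util
import inspect
import os
import tempfile

SOURCE = "def step(x):\n    return x + 1\n"

# Case 1: a function compiled from a string, with no file behind it. This is
# roughly how a notebook-defined function looks from another process.
namespace = {}
exec(compile(SOURCE, "<notebook-cell>", "exec"), namespace)
try:
    inspect.getsource(namespace["step"])
    fileless_source_found = True
except OSError:
    fileless_source_found = False

# Case 2: the same function saved to a .py file and imported from there.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rl_funcs.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    spec = importlib.util.spec_from_file_location("rl_funcs", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    file_backed_source = inspect.getsource(module.step)

print("source found without a file:", fileless_source_found)
print(file_backed_source)
```

In the first case the lookup fails because there is no file to read from; in the second the source comes straight from the .py file, which is what the parallel kernels need.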
