weixin_39745933
2020-12-01 12:25

A2C vs PPO Advantage normalisation

Hello,

Looking at your implementation, I was wondering whether there is any reason why the advantage is normalised in PPO, whereas it is not in A2C.
PPO: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/blob/master/algo/ppo.py#L34
A2C: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/blob/master/algo/a2c_acktr.py#L47
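For reference, a minimal sketch of what I mean by per-batch advantage normalisation (not the repo's exact code; function name and epsilon value are my own):

```python
import torch

def normalize_advantages(advantages: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Standardise advantages to zero mean and unit variance within the batch."""
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```

This is the step applied in the PPO update but, as far as I can tell, absent from the A2C update.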

Surprisingly, the same choice was made in OpenAI Baselines:
PPO: https://github.com/hill-a/stable-baselines/blob/master/baselines/ppo2/ppo2.py#L98
A2C: https://github.com/hill-a/stable-baselines/blob/master/baselines/a2c/a2c.py#L65

(Also, in OpenAI Baselines, for ppo2, they additionally clip the value function, even though this is not mentioned in the paper.)
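To illustrate the value clipping I am referring to, here is a hedged sketch of a PPO2-style clipped value loss (my own names and default `clip_range`, not copied from Baselines): the new value prediction is kept within `clip_range` of the value predicted at rollout time, and the element-wise maximum of the clipped and unclipped squared errors is used.

```python
import torch

def clipped_value_loss(values: torch.Tensor,
                       old_values: torch.Tensor,
                       returns: torch.Tensor,
                       clip_range: float = 0.2) -> torch.Tensor:
    # Limit how far the new value estimate can move from the rollout-time estimate.
    values_clipped = old_values + torch.clamp(values - old_values, -clip_range, clip_range)
    loss_unclipped = (values - returns).pow(2)
    loss_clipped = (values_clipped - returns).pow(2)
    # Pessimistic (larger) of the two errors, averaged over the batch.
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```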

This question comes from the open-source project: ikostrikov/pytorch-a2c-ppo-acktr-gail


5 replies