weixin_39550172
2020-12-09 13:26

Training the PPO algorithm

I ran the provided train.py script in convlab2/policy/ppo with the prespecified configuration. During training, the success rate starts fairly high at around 25% and then hovers around 30-35% for a while. After training finished, I evaluated the model with the evaluation.py script in convlab2/policy, which gave a success rate of 26%, far from the 74% reported in the table.

My question: what is the exact configuration that was used to train the 74% model?

This question originates from the open-source project thu-coai/ConvLab-2.
