weixin_39550172
2020-12-09 13:26
Training the PPO algorithm
I executed the provided train.py script in convlab2/policy/ppo with the prespecified configuration. During training, the success rate starts fairly high at around 25% and then hovers around 30-35% for a while. After training finished, I used the evaluation.py script in convlab2/policy to evaluate the model, which gives me a success rate of 26%, far from the 74% reported in the table.
My question: what is the exact configuration that was used to train the 74% model?
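For context, here is a minimal sketch of what such a policy-level evaluation looks like, assuming the rule-based DST and agenda user simulator that ConvLab-2 ships; the checkpoint path "save/99" and the dialogue/turn counts are placeholders, not the reported configuration:

```python
# Sketch of a dialogue-act-level evaluation loop for a trained PPO policy.
# The checkpoint path and episode counts below are placeholder assumptions.
from convlab2.dialog_agent import PipelineAgent, BiSession
from convlab2.dst.rule.multiwoz import RuleDST
from convlab2.policy.ppo import PPO
from convlab2.policy.rule.multiwoz import RulePolicy
from convlab2.evaluator.multiwoz_eval import MultiWozEvaluator

policy_sys = PPO(False)      # inference mode, no training
policy_sys.load("save/99")   # placeholder checkpoint path

# Dialogue-act-level pipeline: no NLU/NLG, rule DST on the system side,
# agenda-based rule policy as the user simulator.
sys_agent = PipelineAgent(None, RuleDST(), policy_sys, None, name='sys')
user_agent = PipelineAgent(None, None, RulePolicy(character='usr'), None, name='user')

evaluator = MultiWozEvaluator()
sess = BiSession(sys_agent=sys_agent, user_agent=user_agent,
                 kb_query=None, evaluator=evaluator)

success = 0
n_dialogues = 100                # placeholder number of test dialogues
for _ in range(n_dialogues):
    sess.init_session()
    sys_response = []
    for _ in range(20):          # cap each dialogue at 20 turns
        sys_response, user_response, session_over, reward = sess.next_turn(sys_response)
        if session_over:
            break
    success += evaluator.task_success()

print('success rate:', success / n_dialogues)
```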
This question comes from the open-source project: thu-coai/ConvLab-2
11 answers