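Is there a problem with the eval callback in SB3?
I set evaluation to run once every 50 steps, but the eval callback function gets called on every single step. More importantly, the logic looks completely normal during training, yet during eval my custom Policy network is never called.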
CustomPolicy forward method is called.
Env step called. Action: 8, is_eval: train
Current step: 1, learning rate: 0.0003, seed: 50, reward: [-10.], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 5, is_eval: train
Current step: 2, learning rate: 0.0003, seed: 50, reward: [1.8818476], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 36, is_eval: train
Current step: 3, learning rate: 0.0003, seed: 50, reward: [-2.3012323], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 33, is_eval: train
Current step: 4, learning rate: 0.0003, seed: 50, reward: [-0.16285548], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 2, is_eval: train
Current step: 5, learning rate: 0.0003, seed: 50, reward: [-0.5629139], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 4, is_eval: train
Current step: 6, learning rate: 0.0003, seed: 50, reward: [-0.5300567], flag: False
EvalCallback _on_step is called.
CustomPolicy forward method is called.
Env step called. Action: 9, is_eval: train
Current step: 7, learning rate: 0.0003, seed: 50, reward: [-1.0008886], flag: False
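
For what it's worth, a log like this may not indicate a bug. As far as I understand SB3's callback design, every callback's `_on_step` hook is invoked after each environment step, and `EvalCallback` only decides inside that hook whether an evaluation is due (based on `eval_freq`). The toy class below is a minimal sketch of that pattern in plain Python; `GatedCallback` and `eval_steps` are hypothetical names made up for illustration, not SB3's actual code.

```python
class GatedCallback:
    """Toy stand-in for a periodic callback (hypothetical, not an SB3 class)."""

    def __init__(self, eval_freq):
        self.eval_freq = eval_freq
        self.n_calls = 0       # how many times the hook has fired
        self.eval_steps = []   # call counts at which an "evaluation" ran

    def on_step(self):
        # The training loop calls this after every environment step.
        self.n_calls += 1
        return self._on_step()

    def _on_step(self):
        # The hook itself runs every step; only this check is periodic.
        if self.eval_freq > 0 and self.n_calls % self.eval_freq == 0:
            self.eval_steps.append(self.n_calls)
        return True  # True means "keep training"


callback = GatedCallback(eval_freq=50)
for _ in range(120):
    callback.on_step()

print(callback.n_calls)     # 120
print(callback.eval_steps)  # [50, 100]
```

Under this reading, a print placed at the top of `_on_step` would show up on every step, while the evaluation itself would first run at call 50, which is later than the 7 steps shown in the log. On the second symptom: evaluation appears to go through `model.predict()`, and for actor-critic policies that path may not call the policy's `forward()` at all, so a print inside `forward()` could stay silent during eval even though the custom network is being used. That part is an assumption worth checking against the SB3 source for the installed version.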