2021-01-07 12:00

Pretraining networks for PPO

Hello, first of all I have to say that you have created an amazing framework, thank you for that! I am trying to pretrain the policy network for the PPO algorithm, because it converges very slowly on my data. I would like to use a pretrained policy network, on which PPO then only performs some small optimizations. What I tried was basically something like this:

fetches = [self.agent.model.optimization, self.agent.model.loss_per_instance]
feed_dict = {self.agent.model.states_input['state']: batch_x,
             self.agent.model.actions_input['action']: batch_y,
             self.agent.model.reward_input: np.ones((train_batch_size,), dtype=np.float32),
             self.agent.model.terminal_input: np.zeros((train_batch_size,), dtype=np.bool_),
             self.agent.model.deterministic_input: True,
             self.agent.model.update_input: True}
self.agent.model.session.run(fetches=fetches, feed_dict=feed_dict)

Unfortunately, this only seems to update the baseline network, which is not what I want. I have to admit that it is hard for me to get a good overview of the whole TensorFlow graph. So maybe someone can give me some pointers on how I can train the policy network specifically?
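For context, what I am ultimately after is plain behavioural cloning: train the policy to maximize the log-probability of the recorded actions given the states, and only then let PPO fine-tune it. Here is a minimal, framework-free sketch of that objective in NumPy (this is NOT the Tensorforce API; all names, the toy data, and the two-layer softmax policy are illustrative assumptions, just to show the cross-entropy pretraining step I mean):

```python
import numpy as np

# Behavioural-cloning sketch: pretrain a small softmax policy pi(a|s)
# on (state, action) pairs by minimizing cross-entropy, i.e. maximizing
# log pi(action | state). Names and data are illustrative only.
rng = np.random.default_rng(0)
n_states, n_actions, hidden = 4, 3, 16

W1 = rng.normal(0.0, 0.1, (n_states, hidden))
W2 = rng.normal(0.0, 0.1, (hidden, n_actions))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy "expert" data: one-hot states; the expert deterministically
# picks action = state index mod n_actions.
X = np.eye(n_states)[rng.integers(0, n_states, 256)]
y = X.argmax(axis=1) % n_actions

lr = 0.5
for _ in range(200):
    h = np.maximum(X @ W1, 0.0)          # ReLU hidden layer
    p = softmax(h @ W2)                  # policy probabilities
    # Gradient of mean cross-entropy w.r.t. logits: (p - one_hot(y)) / N
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0
    g /= len(y)
    gW2 = h.T @ g                        # backprop through output layer
    gh = g @ W2.T
    gh[h <= 0.0] = 0.0                   # ReLU gradient mask
    gW1 = X.T @ gh
    W2 -= lr * gW2
    W1 -= lr * gW1

# After pretraining, the policy should imitate the expert actions.
h = np.maximum(X @ W1, 0.0)
acc = (softmax(h @ W2).argmax(axis=1) == y).mean()
```

In Tensorforce terms, the question is how to run exactly this kind of supervised update against the policy head of the graph, instead of the baseline.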

Regards, Mark

