2021-01-07 11:54

Implementing DeepMind Pycolab environment

Hi! I'm trying to implement a DeepMind Pycolab environment for TensorForce, but I have some design questions. Since Pycolab has no fixed list of environments, we need to decide what to pass to the environment's constructor. I tried passing the engine object (which yields observations and reward when an action is taken) together with the human_ui object (which defines the key mapping of actions and the croppers that introduce partial observability), because the engine object alone is not sufficient: it is independent of the action space. But now I'm stuck implementing the environment's reset method, because the engine is initially created from the ASCII art of a level, and the initial level design cannot be recovered from the engine object in any state. Can you please suggest the best way to implement this?
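One option worth considering is a minimal sketch like the following: instead of storing a single engine instance, the wrapper stores a zero-argument factory and builds a fresh engine on every reset(). `DummyEngine`, `make_game`, and `PycolabEnvironment` here are illustrative stand-ins, not Pycolab or TensorForce APIs (a real factory would call Pycolab's ASCII-art game builder).

```python
class DummyEngine:
    """Stand-in for a Pycolab engine (illustrative only).

    Real Pycolab engines are built from ASCII art and cannot be rewound,
    which is why a fresh engine is built per episode.
    """
    def __init__(self, level):
        self.level = level

    def its_showtime(self):
        # Stand-in for Pycolab's "start the game" call; returns the
        # first observation plus dummy reward/discount values.
        return self.level, 0.0, 1.0


def make_game():
    # Stand-in factory. In real code this would rebuild the engine from
    # the level's ASCII art each time it is called.
    return DummyEngine(level=[list("#.#"), list("#P#")])


class PycolabEnvironment:
    def __init__(self, game_factory):
        # Store the factory, not an engine: reset() can then always
        # recover the initial level by building a new engine.
        self._game_factory = game_factory
        self._engine = None

    def reset(self):
        self._engine = self._game_factory()
        observation, _, _ = self._engine.its_showtime()
        return observation


env = PycolabEnvironment(make_game)
first_obs = env.reset()
```

With this shape, the ASCII art (or whatever builds the engine) lives in the factory closure, so the environment never needs to extract level information back out of an engine object.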



5 replies

  • weixin_39743695 2021-01-07 11:54

    Phew, since I don't know the details of how Pycolab works, it's hard to suggest specifics. Ideally, most of the Pycolab internals would be handled inside the Environment class and controlled via higher-level constructor arguments (like gym_id for OpenAIGym, for instance). engine and human_ui already sound more like internal objects to me -- but you are in the best position to assess what can stay internal and what needs to be passed in. ;-)

    Regarding reset(): there must be a way to get the initial screen (which is what reset() needs to return), right? If the engine object alone is not enough to retrieve it, maybe something in the way you currently set up the internals needs to change. Maybe I also misunderstand what "screens" look like in Pycolab, in which case please correct me.

    Cool stuff, looking forward to a PR! :+1:

  • weixin_39945816 2021-01-07 11:54

    Hi! The implementation is almost done. The Pycolab documentation suggests creating a new engine every time you want to run a new episode, rather than mutating the existing engine (Pycolab is very strict about preventing such modifications), so I'm going with that approach. However, I ran into some bugs when running policy training for games with a discrete state space (i.e. all games in Pycolab). For example, running the existing code for the OpenAI environment FrozenLake breaks with the error

      'Invalid input rank for linear layer: {}, must be 2.'.format(util.rank(x))
    tensorforce.exception.TensorForceError: Invalid input rank for linear layer: 1, must be 2.

    To replicate the error, try

    python examples/openai_gym.py FrozenLake8x8-v0 -a examples/configs/vpg.json -n examples/configs/mlp2_network.json -e 2000 -m 200

    It's probably because of how the state is defined for discrete-space games here. Can you please take a look at that?

    Apart from that, there is a minor fix required here, because the input (discrete space) is now an int and needs to be cast to float32. Is it fine to include that in this PR, or should I open a separate one for it?

  • weixin_39743695 2021-01-07 11:54

    Discrete inputs have to be handled differently; just converting them to floats is not a good approach, for various reasons (e.g. one may consider each discrete value equally different from every other, but integers converted to floats introduce non-uniform distances). A possible way to handle discrete inputs is an embedding layer, which essentially associates each value with a learnable vector representation; see here. Can you try it that way and see whether it works?
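    The idea can be sketched with a plain NumPy lookup table (not TensorForce's actual embedding layer; the sizes below are illustrative, picked to match FrozenLake8x8's 64 discrete states):

    ```python
    import numpy as np

    # Each of the num_states discrete values gets its own dense vector, so no
    # artificial ordering or distance is imposed the way a raw float cast would
    # (where state 1 ends up "closer" to state 0 than state 63 is).
    rng = np.random.default_rng(0)
    num_states, embed_dim = 64, 8
    embedding = rng.normal(size=(num_states, embed_dim))  # learnable in practice

    state = 17                 # a discrete observation
    vector = embedding[state]  # dense representation of shape (embed_dim,)
    ```

    In a real network the embedding matrix is a trainable parameter updated by backprop, rather than a fixed random table.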

    Also, regarding engine etc., I don't mean to prescribe how to handle the specific objects; doing whatever Pycolab requires is of course the way to go. I would expect, though, that it can somehow be fitted into the Environment interface, with these things handled internally (which may of course not be true).

  • weixin_39945816 2021-01-07 11:54

    I have added a flatten layer after the embedding layer, since the state space is a vector of discrete integer values: for example, a batch of states has shape [?, 625], which becomes [?, 625, 32] after embedding, so we flatten it to [?, 625*32] before passing it to the linear layer. Is that the right way to go about it?

    Regarding the Pycolab internals, I have handled most of them inside the Environment interface, and the game definition lives in a separate file (I used an example game from Pycolab here), which can be given as input in the example usage to set up the environment.
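    The shape walkthrough above can be checked with a small NumPy sketch (the vocabulary size of 10 is an arbitrary illustrative value; 625 and 32 match the numbers quoted above):

    ```python
    import numpy as np

    batch, positions, embed_dim, vocab = 4, 625, 32, 10
    rng = np.random.default_rng(1)
    embedding = rng.normal(size=(vocab, embed_dim))

    states = rng.integers(0, vocab, size=(batch, positions))    # [4, 625] ints
    embedded = embedding[states]                                # [4, 625, 32]
    flattened = embedded.reshape(batch, positions * embed_dim)  # [4, 20000]
    ```

    Flattening after the per-position embedding is what lets the downstream linear layer see its expected rank-2 input.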

  • weixin_39743695 2021-01-07 11:54

    Yes, flattening is a reasonable thing to do with multiple embeddings.

    Pycolab sounds good. I see you created a PR, that's great, thanks. I will review it in the next few days and comment there.

    Since the PR resolves the issue, I assume this can be closed, and we can discuss things there.
