He initialization

The default initialization for linear and convolutional modules seems to be Glorot initialization, but for the commonly used ReLU activation function He initialization is superior, and it only requires a quick change to the stddev definition. Should we implement better defaults? I know that there are many initialization schemes; I only suggest this one because it wouldn't be computationally expensive and would be only a minor code change. A sketch of what specifying He initialization looks like today is shown below.
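For reference, here is a minimal sketch of how He initialization can already be requested explicitly in Haiku via `hk.initializers.VarianceScaling` (scale 2.0, fan-in mode); the layer sizes, input shape, and function names here are illustrative, not a proposal for the library's API.

```python
import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # He initialization for ReLU: variance scaling with scale=2.0 over fan_in,
    # i.e. stddev ~ sqrt(2 / fan_in) instead of the Glorot-style sqrt(1 / fan_in).
    he_init = hk.initializers.VarianceScaling(
        scale=2.0, mode="fan_in", distribution="truncated_normal")
    x = hk.Linear(128, w_init=he_init)(x)
    x = jax.nn.relu(x)
    x = hk.Linear(10, w_init=he_init)(x)
    return x

# Illustrative usage: transform, then initialize with a dummy batch.
model = hk.transform(forward)
rng = jax.random.PRNGKey(42)
params = model.init(rng, jnp.ones([1, 784]))
```

The suggestion in the issue is essentially to make something like `he_init` the default `w_init` for ReLU-heavy networks, rather than requiring users to pass it to every module.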

This question comes from the open-source project: deepmind/dm-haiku

weixin_39854288 · 2020/11/21 20:00

5 replies