weixin_39810901
2020-11-21 21:47

Upsample in MobileNet YOLOv3 COCO

Hi

For MobileNet YOLOv3 on COCO:

The output feature map of conv15 is 1x1024x13x13, and the next layer is the upsample layer. The upsample layer uses a depth-wise deconvolution, correct? If so, why isn't the convolution_param set with num_output: 1024 and group: 1024, since the input feature map is 1x1024x13x13?
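In other words, I was expecting the full layer definition to look roughly like the sketch below. Only num_output and group are the point here; the layer/blob names, kernel_size, and stride are placeholders I made up, not values from the actual prototxt:

    layer {
      name: "upsample"           # placeholder name
      type: "Deconvolution"
      bottom: "conv15"           # the 1x1024x13x13 feature map
      top: "upsample"
      convolution_param {
        num_output: 1024         # one output channel per input channel
        group: 1024              # depth-wise: every channel gets its own kernel
        kernel_size: 2           # placeholder 2x upsample kernel/stride
        stride: 2
        bias_term: false
      }
    }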

Is there something wrong?

Thanks,

This question comes from the open-source project: eric612/MobileNet-YOLO


6 replies

  • weixin_39828102 5 months ago

    Of course you can set it to 1024. However, if you look at this performance log, the deconvolution layer of mobilenet-yolov3-lite costs 1.38 ms even though I only set num_output=512 and kernel_size=1; it is a bottleneck for inference speed in Caffe.

  • weixin_39810901 5 months ago

    For this case, the input is 1x1024x13x13. If you use a depthwise deconv with 512 output channels, how does Caffe handle the input feature map?

    Does Caffe apply the same kernels twice, first to channels 0-511 of the 13x13 map and then to channels 512-1023?

    Thanks

  • weixin_39828102 5 months ago

    You can refer to this example; the limitation is that the input channel count must be divisible by the group number. Actually, it is not a depthwise convolution, and I also did not use a pointwise convolution.

  • weixin_39810901 5 months ago

    Sorry, what I meant was the separable convolution, i.e. the first (depthwise) convolution of a depthwise-separable convolution. According to your answer, this layer is a group convolution, as introduced by AlexNet.

    Originally, I thought the only constraint on group convolution was that the output channel count must be divisible by the group number. So do you mean there is another limitation, namely that the input channel count must also be divisible by the group number?

  • weixin_39810901 5 months ago

    After checking more details, my earlier understanding of group convolution was not correct. The limitation is that both the input channel count and the output channel count must be divisible by the group number (see the sketch after this thread). Many thanks for your help.

  • weixin_39810901 5 months ago

    > Of course you can set it to 1024. However, if you look at this performance log, the deconvolution layer of mobilenet-yolov3-lite costs 1.38 ms even though I only set num_output=512 and kernel_size=1; it is a bottleneck for inference speed in Caffe.

    According to this, if we use 1024 as the output channel count, the inference speed will be much slower, correct?

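For reference, the grouped (non-depthwise) deconvolution described in the answers above might look roughly like the sketch below. Only num_output: 512 and kernel_size: 1 are confirmed by the thread; the group value, stride, and blob names are assumptions for illustration. The key constraint Caffe enforces is that both the input channel count (1024 here) and num_output must be divisible by group:

    layer {
      name: "upsample"             # placeholder name
      type: "Deconvolution"
      bottom: "conv15"             # 1x1024x13x13 input
      top: "upsample"
      convolution_param {
        num_output: 512            # from the thread
        group: 512                 # illustrative: 1024 % 512 == 0 and 512 % 512 == 0
        kernel_size: 1             # from the thread
        stride: 2                  # assumed, for a 2x spatial upsample
        bias_term: false
      }
    }

Roughly, the multiply-accumulate count of such a layer scales with (C_in / group) x (C_out / group) x group x k^2 x the spatial size of the feature map, so doubling num_output from 512 to 1024 with everything else fixed roughly doubles the work of a layer that is already a bottleneck, which is consistent with the last comment above.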
