2020-11-21 21:47

# upsample in Mobilenet YOLOv3 coco

Hi

The output feature map size of conv15 is 1x1024x13x13, and the next layer is an upsample layer. The upsample layer uses depth-wise deconvolution, correct? If so, why is the convolution_param not set as below, given that the input feature map is 1x1024x13x13? num_output: 1024 group: 1024
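The setting the question expects might look like the following sketch of a Caffe Deconvolution layer. Only `num_output: 1024` and `group: 1024` come from the question; the layer/blob names, kernel size, stride, pad, and filler are illustrative assumptions for a typical 2x upsample:

```
layer {
  name: "upsample"            # hypothetical name
  type: "Deconvolution"
  bottom: "conv15"            # 1x1024x13x13 input from the question
  top: "upsample"
  convolution_param {
    num_output: 1024          # = input channel count, as the question proposes
    group: 1024               # one channel per group, i.e. true depthwise
    kernel_size: 4            # kernel/stride/pad are illustrative 2x-upsample values
    stride: 2
    pad: 1
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}
```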

Is there something wrong?

Thanks,


#### 6 Answers

• Of course you can set it to 1024. However, you can see from this performance log that the deconvolution layer of mobilenet-yolov3-lite costs 1.38 ms even though I only set num_output=512 and kernel_size=1. It is a bottleneck of inference speed in Caffe.

• For this case, the input is 1x1024x13x13. If you use depthwise deconvolution with 512 output channels, how does Caffe handle the input feature map?

Does Caffe apply the same kernel twice, first to channels 0-511 (13x13) and then to channels 512-1023 (13x13)?

Thanks

• You can refer to this example. The limitation is that the input channel count must be divisible by the group number. Actually, it is not a depthwise convolution, and I also did not use a pointwise convolution.

• Sorry, what I meant was separable convolution, i.e. the first (depthwise) convolution in a depthwise separable convolution. According to your answer, this layer is a group convolution, as introduced by AlexNet.

Originally, I thought the only constraint on group convolution was that the output channel count must be divisible by the group number. So do you mean there is another limitation, namely that the input channel count must also be divisible by the group number?

• After checking more details, my understanding of group convolution was not correct. The limitation for group convolution is that both the input channel count and the output channel count must be divisible by the group number. Many thanks for your help.

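The divisibility rule above can be sketched in a few lines. The tensor sizes mirror this thread (1x1024x13x13 input); the `group: 512` setting in the second call is only a guess for illustration, since the thread does not state the actual group value:

```python
def group_conv_channels(in_channels, out_channels, groups):
    """Return per-group (input, output) channel counts for a group convolution.

    Both in_channels and out_channels must be divisible by groups; each group
    convolves in_channels/groups input channels into out_channels/groups outputs.
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    return in_channels // groups, out_channels // groups

# Depthwise-style setting the question proposes: 1 input and 1 output channel per group.
print(group_conv_channels(1024, 1024, 1024))  # -> (1, 1)

# Illustrative group-conv setting with num_output=512: 2 input channels per group,
# so it is a group convolution but not a depthwise convolution.
print(group_conv_channels(1024, 512, 512))  # -> (2, 1)
```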
• "Of course you can set it to 1024. However, you can see from this performance log that the deconvolution layer of mobilenet-yolov3-lite costs 1.38 ms even though I only set num_output=512 and kernel_size=1. It is a bottleneck of inference speed in Caffe."

According to this, if we use 1024 as the output channel count, the inference speed will be even slower, correct?

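A rough multiply-accumulate (MAC) count suggests why: with the group number held fixed, the cost of a grouped (de)convolution scales linearly with num_output, so doubling it to 1024 roughly doubles the work. This is only a back-of-the-envelope sketch; the spatial size and the `groups=512` value are assumptions, and it does not model memory traffic or Caffe's actual deconvolution implementation, which drive the 1.38 ms figure above:

```python
def grouped_conv_macs(h, w, k, in_c, out_c, groups):
    """Approximate MACs for a grouped convolution: H * W * k^2 * (in_c/groups) * out_c."""
    assert in_c % groups == 0 and out_c % groups == 0
    return h * w * k * k * (in_c // groups) * out_c

# kernel_size=1 as stated in the answer; 13x13 spatial size and groups=512 are
# illustrative assumptions.
macs_512 = grouped_conv_macs(13, 13, 1, 1024, 512, 512)
macs_1024 = grouped_conv_macs(13, 13, 1, 1024, 1024, 512)
print(macs_1024 / macs_512)  # -> 2.0
```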