weixin_39635657
2021-01-11 02:41

What is the configuration of your computer, such as GPU model and GPU memory size?

Thank you for your amazing work! What is the configuration of your computer, such as the GPU model and GPU memory size? I'm looking forward to your reply.

This question comes from the open-source project: zylo117/Yet-Another-EfficientDet-Pytorch


5 replies

  • weixin_39970994 · 4 months ago

    2080 Ti, 11 GB

  • weixin_39635657 · 4 months ago

    If I want to train D7, how many GPUs (2080 Ti, 11 GB) should I prepare?

  • weixin_39610488 · 4 months ago

    I believe you would not be able to fit the D7 model on a 2080 Ti. It does not matter how many you have, since D7 would not fit into the memory of a single 2080 Ti (and you need to be able to load the whole model into the memory of each GPU) even with a batch size of 1, unless you 1) train the head only, 2) decrease the default resize resolution of input images in train.py, or 3) switch the relevant calculations to FP16 (and thus make use of your tensor cores). I would actually appreciate it if you could give some tips on how to do that in the code.

    I have a 2080 Ti and I stumbled at D5.
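
    For example, here is a rough, self-contained sketch of those three ideas using torch.cuda.amp (PyTorch >= 1.6). The tiny model, the 768 px input size, and the loss are placeholders of my own, not this repo's train.py; with the real model you would freeze whatever you treat as the backbone in the same way:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler, autocast

class TinyDet(nn.Module):
    """Stand-in detector with a 'backbone' and a 'head' so freezing can be shown."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                      nn.ReLU())
        self.head = nn.Conv2d(16, 9, 3, padding=1)

    def forward(self, x):
        return self.head(self.backbone(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyDet().to(device)

# 1) Head-only training: freeze the backbone so no gradients or optimizer state
#    are kept for its parameters.
for p in model.backbone.parameters():
    p.requires_grad = False

# 2) Smaller input resolution: activation memory drops roughly quadratically
#    with the resize value (768 here is just an example; D7 defaults to much more).
input_size = 768

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

# 3) FP16 via automatic mixed precision: autocast runs eligible ops in half
#    precision (using the tensor cores); GradScaler guards against FP16
#    gradient underflow.
scaler = GradScaler(enabled=(device == "cuda"))

images = torch.randn(2, 3, input_size, input_size, device=device)
targets = torch.randn(2, 9, input_size // 2, input_size // 2, device=device)

optimizer.zero_grad()
with autocast(enabled=(device == "cuda")):
    loss = nn.functional.mse_loss(model(images), targets)  # placeholder loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```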

  • weixin_39970994 · 4 months ago

    It seems training in FP16 is much harder for EfficientDet, based on my previous experiments. Most models like YOLO and R-CNN perform well in FP16 even if they were trained in FP32, but EfficientDet doesn't. You can try running coco_eval in FP16 with FP32 weights, and the mAP will be about half of what it used to be. This is probably because there are too many shared parameters, so they have to be more precise.

    You can modify train.py following coco_eval.py to train in FP16. BTW, FP16 is not supported when using DataParallel.
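
    For what it's worth, the FP16 part boils down to casting the model and its inputs to half precision. A minimal stand-alone sketch (the toy model is a placeholder, and the exact flags in this repo's coco_eval.py may differ):

```python
import torch
import torch.nn as nn

# Toy stand-in for the detector; the point is only the dtype handling.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 4, 3, padding=1)).eval()

use_fp16 = torch.cuda.is_available()  # half-precision compute only makes sense on GPU
device = "cuda" if use_fp16 else "cpu"
model = model.to(device)
if use_fp16:
    model = model.half()              # FP32 weights cast down to FP16

x = torch.randn(1, 3, 64, 64, device=device)
if use_fp16:
    x = x.half()                      # inputs must match the model's dtype

with torch.no_grad():
    out = model(x)
print(out.dtype)                      # torch.float16 on GPU, torch.float32 on CPU
```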

  • weixin_39610488 · 4 months ago

    Just investigating the memory issue further: here is a thread from the official repository of EfficientDet. It looks like memory for GPU training is a big problem:

    - D0 with batch size 8: not trainable on an 11 GB GPU
    - D5 with batch size 1: not trainable on an 11 GB GPU
    - D6 with batch size 1: not trainable on a 24 GB(!) GPU

    However, it can be trained on TPU without the same memory issues: D7 with batch size 4 is trainable on a TPUv3, where each core has 16 GB.

    Obviously Google cares about TPU, and the nuances of the code could be different.

    Someone suggested that the trouble is caused by lines like this:

        # Sum per level losses to total loss.
        cls_loss = tf.add_n(cls_losses)
        box_loss = tf.add_n(box_losses)

    as TensorFlow has to keep the activations from each layer in order to aggregate a single gradient.

    For TensorFlow, there is a probable solution here (aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N), but it does not seem to be working for the EfficientDet implementation, as people are still complaining.

    However, a solution must exist, as training (of the official TensorFlow model) works on TPU.
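
    For reference, here is a minimal graph-mode sketch of my own (not the official repo's code) showing where that aggregation_method plugs in next to the quoted tf.add_n lines. The variable and per-level losses are dummies, and whether this actually lowers peak memory there is exactly what people dispute:

```python
import tensorflow.compat.v1 as tf  # TF1-style graph mode, as in the official repo
tf.disable_eager_execution()

# Dummy stand-ins for the per-FPN-level losses the real model computes.
w = tf.get_variable("w", shape=[8], initializer=tf.ones_initializer())
cls_losses = [tf.reduce_mean(w) * (i + 1.0) for i in range(5)]
box_losses = [tf.reduce_mean(tf.square(w)) * (i + 1.0) for i in range(5)]

# The quoted lines: sum per-level losses to a total loss.
cls_loss = tf.add_n(cls_losses)
box_loss = tf.add_n(box_losses)
total_loss = cls_loss + box_loss

# The suggested workaround: accumulate gradients from the different loss terms
# instead of materializing all of them before a final add_n of gradients.
optimizer = tf.train.MomentumOptimizer(learning_rate=0.08, momentum=0.9)
grads_and_vars = optimizer.compute_gradients(
    total_loss,
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
train_op = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([total_loss, train_op])[0])
```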

