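We tend to assume that the larger the model size, the longer the inference time.
But in recent experiments I found that U2-Net† is only 4 MB, yet its inference time is 371,
while U-Net is 7 MB with an inference time of 58.
Both were run with a batch size of 12 on the same GPU, and nothing else was running on the server.
So are the two really related?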
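Measuring inference time is actually fairly tricky. Two things to watch out for here: asynchronous execution (CUDA kernels are launched asynchronously, so a CPU-side timer can stop before the GPU has finished) and GPU warm-up (the first few runs are slower than steady state).
For the details, see this article: https://towardsdatascience.com/the-correct-way-to-measure-inference-time-of-deep-neural-networks-304a54e5187f
Here is some PyTorch code for measuring inference time: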
import numpy as np
import torch
import torchvision.models as models

model = models.vgg16()
device = torch.device("cuda")
model.to(device)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224, dtype=torch.float).to(device)

# CUDA events record timestamps on the GPU stream rather than on the CPU
starter = torch.cuda.Event(enable_timing=True)
ender = torch.cuda.Event(enable_timing=True)
repetitions = 300
timings = np.zeros((repetitions, 1))

with torch.no_grad():
    # GPU warm-up: run a few untimed passes first
    for _ in range(10):
        _ = model(dummy_input)

    # Measure performance
    for rep in range(repetitions):
        starter.record()
        _ = model(dummy_input)
        ender.record()
        # Wait for the GPU to finish before reading the elapsed time
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender)  # milliseconds
        timings[rep] = curr_time

mean_syn = np.sum(timings) / repetitions
std_syn = np.std(timings)
print(mean_syn)  # mean latency per forward pass, in ms
print(std_syn)   # standard deviation, in ms
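
As for the original question: file size follows the number of parameters, while inference time follows the amount of computation (and memory traffic), and the two do not have to move together. I have not profiled either of the networks above, so treat this as one plausible explanation rather than a diagnosis. Here is a minimal back-of-the-envelope sketch for a single convolution layer. The helper `conv2d_cost` and both layer configurations are made up for illustration and are not taken from U2-Net† or U-Net.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w):
    """Return (parameter count, multiply-accumulate count) for one conv layer.

    Assumes a square kernel, a bias term, and no grouping or dilation.
    """
    weights = kernel * kernel * c_in * c_out
    params = weights + c_out          # weights plus one bias per output channel
    macs = weights * out_h * out_w    # every weight is applied at every output pixel
    return params, macs


# Hypothetical layers: a thin layer on a large feature map
# versus a wide layer on a small feature map.
thin_high_res = conv2d_cost(c_in=16, c_out=16, kernel=3, out_h=320, out_w=320)
wide_low_res = conv2d_cost(c_in=64, c_out=64, kernel=3, out_h=20, out_w=20)

for name, (params, macs) in [("thin, high-res", thin_high_res),
                             ("wide, low-res", wide_low_res)]:
    size_mb = params * 4 / 1e6        # float32 = 4 bytes per parameter
    print(f"{name}: {params} params ({size_mb:.3f} MB), {macs / 1e6:.1f} MMACs")
```

The thin layer has 2,320 parameters against 36,928 for the wide one, yet it needs about 236 million multiply-accumulates against about 14.7 million, because the parameter count ignores feature-map resolution. A small model that does a lot of work at high resolution can therefore be slower than a larger model that does not.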