My language model inference code is:
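from transformers import pipeline

# Build the text2text-generation pipeline (this loads the model checkpoint).
hf_generator = pipeline("text2text-generation", model="aaa")
output = hf_generator(prompt, max_length=len(prompt) + 128, do_sample=True)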
Every inference call prints:
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.62s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.62s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.64s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.74s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00, 4.17s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.67s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.69s/it]
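Reloading the checkpoint shards takes about 9 seconds on every call, which wastes a lot of time. How can I avoid this?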
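
One possible cause, which the snippet above does not confirm, is that the pipeline(...) call itself runs on every inference (for example inside a request handler or a loop), so the checkpoint is reloaded each time. A minimal sketch under that assumption: build the pipeline once, cache it, and reuse it. The helper names get_generator and generate are hypothetical, and "aaa" is the placeholder model name from the code above.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_generator(model_name="aaa"):
    """Build the pipeline once; later calls return the cached object."""
    # Imported lazily so the (slow) model load only happens on first use.
    from transformers import pipeline

    return pipeline("text2text-generation", model=model_name)


def generate(prompt, model_name="aaa"):
    """Hypothetical per-request entry point that reuses the cached pipeline."""
    generator = get_generator(model_name)
    # Same generation arguments as in the original snippet.
    return generator(prompt, max_length=len(prompt) + 128, do_sample=True)
```

With this layout the "Loading checkpoint shards" progress bar should appear only on the first call to generate() within a process. If each inference is launched as a separate Python process, the cache does not survive between runs, and a long-running worker or server process would be needed instead.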