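I am training Mask2Former with distributed training on 8 GPUs. Every time a round of training finishes and validation has run, the error below appears. It happens whether evaluation is triggered after 50 iterations or after 5000 iterations, and changing the batch size makes no difference. Environment: torch 2.4, Python 3.9.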
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGKILL
(mask2former) liuhj@liuhj-NF5468M5:~/workspace/lhj/Mask2Former-main$ /home/liuhj/anaconda3/envs/mask2former/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 320 leaked semaphore objects to clean up at shutdown
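One thing that may be worth checking (this is an assumption, not something the log above confirms): a worker killed with SIGKILL right after evaluation is often the Linux out-of-memory killer reacting to host RAM, not GPU memory, running out. The sketch below logs the peak resident memory of the current process using only the Python standard library. The helper names `peak_rss_mib` and `log_peak_rss` are hypothetical and not part of Mask2Former or detectron2; the idea would be to call `log_peak_rss` in each worker before and after evaluation and compare the numbers.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_rss(tag: str) -> str:
    """Print and return one line with the peak host memory of this process."""
    line = f"[{tag}] peak RSS: {peak_rss_mib():.1f} MiB"
    print(line, flush=True)
    return line


if __name__ == "__main__":
    # Hypothetical usage: call once before and once after evaluation
    # in every worker process, then compare the two values.
    log_peak_rss("before_eval")
    log_peak_rss("after_eval")
```

If the value jumps sharply during evaluation on every rank, host memory is a plausible suspect; if the kernel did kill the process for that reason, the kernel log (for example via `dmesg`) normally records it.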