(/home/cx-a100/zb/fjh/ARR) root@cx-a100:/home/cx-a100/zb/fjh/arrow# yolo segment train data=coco8-seg.yaml model=yolov8m-seg.pt epochs=300 imgsz=640 device=0,1,2,3,4,5,6,7
Ultralytics YOLOv8.1.34 🚀 Python-3.10.14 torch-1.13.0+cu117 CUDA:0 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:1 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:2 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:3 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:4 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:5 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:6 (NVIDIA A100-PCIE-40GB, 40396MiB)
CUDA:7 (NVIDIA A100-PCIE-40GB, 40396MiB)
WARNING ⚠️ Upgrade to torch>=2.0.0 for deterministic training.
engine/trainer: task=segment, mode=train, model=yolov8m-seg.pt, data=coco8-seg.yaml, epochs=300, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=(0, 1, 2, 3, 4, 5, 6, 7), workers=8, project=None, name=train24, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/segment/train24
Overriding model.yaml nc=80 with nc=2
             from  n   params  module                                arguments
  0            -1  1     1392  ultralytics.nn.modules.conv.Conv      [3, 48, 3, 2]
  1            -1  1    41664  ultralytics.nn.modules.conv.Conv      [48, 96, 3, 2]
  2            -1  2   111360  ultralytics.nn.modules.block.C2f      [96, 96, 2, True]
  3            -1  1   166272  ultralytics.nn.modules.conv.Conv      [96, 192, 3, 2]
  4            -1  4   813312  ultralytics.nn.modules.block.C2f      [192, 192, 4, True]
  5            -1  1   664320  ultralytics.nn.modules.conv.Conv      [192, 384, 3, 2]
  6            -1  4  3248640  ultralytics.nn.modules.block.C2f      [384, 384, 4, True]
  7            -1  1  1991808  ultralytics.nn.modules.conv.Conv      [384, 576, 3, 2]
  8            -1  2  3985920  ultralytics.nn.modules.block.C2f      [576, 576, 2, True]
  9            -1  1   831168  ultralytics.nn.modules.block.SPPF     [576, 576, 5]
 10            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 11       [-1, 6]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 12            -1  2  1993728  ultralytics.nn.modules.block.C2f      [960, 384, 2]
 13            -1  1        0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 14       [-1, 4]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 15            -1  2   517632  ultralytics.nn.modules.block.C2f      [576, 192, 2]
 16            -1  1   332160  ultralytics.nn.modules.conv.Conv      [192, 192, 3, 2]
 17      [-1, 12]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 18            -1  2  1846272  ultralytics.nn.modules.block.C2f      [576, 384, 2]
 19            -1  1  1327872  ultralytics.nn.modules.conv.Conv      [384, 384, 3, 2]
 20       [-1, 9]  1        0  ultralytics.nn.modules.conv.Concat    [1]
 21            -1  2  4207104  ultralytics.nn.modules.block.C2f      [960, 576, 2]
 22  [15, 18, 21]  1  5160182  ultralytics.nn.modules.head.Segment   [2, 32, 192, [192, 384, 576]]
YOLOv8m-seg summary: 331 layers, 27240806 parameters, 27240790 gradients, 110.4 GFLOPs
Transferred 531/537 items from pretrained weights
DDP: debug command /home/cx-a100/zb/fjh/ARR/bin/python -m torch.distributed.run --nproc_per_node 8 --master_port 40671 /root/.config/Ultralytics/DDP/_temp_hk7ijrg_140128794374688.py
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
Traceback (most recent call last):
  File "/home/cx-a100/zb/fjh/ARR/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/cx-a100/zb/fjh/ARR/lib/python3.10/site-packages/ultralytics/cfg/__init__.py", line 582, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "/home/cx-a100/zb/fjh/ARR/lib/python3.10/site-packages/ultralytics/engine/model.py", line 657, in train
    self.trainer.train()
  File "/home/cx-a100/zb/fjh/ARR/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 208, in train
    raise e
  File "/home/cx-a100/zb/fjh/ARR/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 206, in train
    subprocess.run(cmd, check=True)
  File "/home/cx-a100/zb/fjh/ARR/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/cx-a100/zb/fjh/ARR/bin/python', '-m', 'torch.distributed.run', '--nproc_per_node', '8', '--master_port', '40671', '/root/.config/Ultralytics/DDP/_temp_hk7ijrg_140128794374688.py']' returned non-zero exit status 1.
Why does training work on a single GPU but fail on multiple GPUs?
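
One thing that may be worth checking, going only by the log above: the `Error: mkl-service + Intel(R) MKL` line appears right after the `DDP: debug command` line, which suggests it comes from the separate Python process that multi-GPU training launches through `torch.distributed.run`. A single-GPU run does not go through that launch, which may be why it is unaffected. The message itself suggests setting the threading layer. Below is a minimal Python sketch of that idea. It assumes `MKL_THREADING_LAYER=GNU` is a suitable value (picked here because libgomp is the GNU OpenMP runtime), and the helper names `mkl_env` and `child_threading_layer` are made up for illustration; they are not part of Ultralytics. The child process only echoes the variable back and does not start training.

```python
import os
import subprocess
import sys


def mkl_env(base=None, threading_layer="GNU"):
    """Return a copy of an environment mapping with MKL_THREADING_LAYER set.

    Assumption: "GNU" is the layer that matches libgomp; this is not
    confirmed by the log, only hinted at by its error message.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_THREADING_LAYER"] = threading_layer
    return env


def child_threading_layer(env):
    """Start a fresh Python process and report the value it sees."""
    code = "import os; print(os.environ.get('MKL_THREADING_LAYER', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A child process inherits whatever environment the parent hands it,
    # which is the same way the DDP subprocess would pick up the setting.
    print(child_threading_layer(mkl_env()))
```

To apply the same setting to the real run, the variable would have to be set in the shell (or in `os.environ`) before `yolo` starts, so that the DDP subprocess inherits it. Whether this resolves the failure on this machine is untested.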