yzew 2022-10-24 10:11

HRFormer throws an error during training

Training HRFormer fails with an error.
The code is built on the mmpose framework.

After running `bash run_dist.sh top_down/hrt/coco/hrt_base_coco_384x288`, I get the following error:

  FutureWarning,
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
  File "tools/train.py", line 168, in <module>
    main()
  File "tools/train.py", line 122, in main
    env_info_dict = collect_env()
  File "/dataset/wh/wh_code/HRFormer-main/pose/mmpose/utils/collect_env.py", line 8, in collect_env
    env_info = collect_basic_env()
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/env.py", line 85, in collect_env
    from mmcv.ops import get_compiler_version, get_compiling_cuda_version
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/ext_loader.py", line 12, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z13__THCudaCheck9cudaErrorPKci
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 42674) of binary: /home/celia/anaconda3/envs/open-mmlab/bin/python
Traceback (most recent call last):
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
    )(*cmd_args)
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/celia/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
tools/train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2022-10-24_10:03:43
  host      : omnisky
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 42675)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2022-10-24_10:03:43
  host      : omnisky
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 42676)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2022-10-24_10:03:43
  host      : omnisky
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 42677)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-10-24_10:03:43
  host      : omnisky
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 42674)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
The environment was set up following the official instructions. No matter how much I search, I cannot find what causes this error.
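One lead worth checking, offered as an assumption rather than a confirmed diagnosis: the missing symbol `_Z13__THCudaCheck9cudaErrorPKci` demangles to `__THCudaCheck(cudaError, char const*, int)`, a helper from PyTorch's legacy THC library, which as far as I know was dropped from PyTorch around release 1.11. An "undefined symbol" error for it usually means the compiled ops in mmcv-full (`mmcv/_ext.cpython-37m-x86_64-linux-gnu.so`) were built against a different PyTorch release than the one installed in the `open-mmlab` environment. The sketch below prints the installed torch, CUDA, and mmcv versions and tries to load `mmcv._ext` directly, outside the distributed launcher. The helper names `module_version`, `is_thc_symbol_error`, and `check_mmcv_ext` are hypothetical, written only for this check.

```python
import importlib
import importlib.util


def module_version(name):
    """Hypothetical helper: installed version of a package, or None if absent."""
    if importlib.util.find_spec(name) is None:
        return None
    return getattr(importlib.import_module(name), "__version__", "unknown")


def is_thc_symbol_error(message):
    """Hypothetical helper: True if an ImportError mentions the legacy THC
    symbol __THCudaCheck (assumption: this points to an mmcv-full build
    compiled against a different PyTorch than the one installed)."""
    return "THCudaCheck" in message


def check_mmcv_ext():
    """Collect version info and try to load mmcv's compiled ops on their own."""
    info = {
        "torch": module_version("torch"),  # imports torch first if present
        "torch_cuda": None,
        "mmcv": module_version("mmcv"),
        "ext_ok": False,
        "error": None,
    }
    if info["torch"] is not None:
        import torch
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
    if info["mmcv"] is None:
        info["error"] = "mmcv is not installed"
        return info
    try:
        # Same module that mmcv.utils.ext_loader.load_ext('_ext', ...) imports.
        importlib.import_module("mmcv._ext")
        info["ext_ok"] = True
    except ImportError as exc:
        info["error"] = str(exc)
    return info


if __name__ == "__main__":
    result = check_mmcv_ext()
    for key, value in result.items():
        print(f"{key}: {value}")
    if result["error"] and is_thc_symbol_error(result["error"]):
        print("mmcv._ext references __THCudaCheck, which the installed torch "
              "does not provide; mmcv-full was likely built for another torch version.")
```

Run it with the same interpreter the launcher uses (the traceback shows `/home/celia/anaconda3/envs/open-mmlab/bin/python`). If `ext_ok` is `False` and the THC message appears, a commonly reported fix is to reinstall an mmcv-full build that matches the torch and CUDA versions it prints, following the mmcv installation guide, rather than changing the HRFormer code.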

1 answer

  • [Deleted account] 2022-10-24 10:20

    Send me the error message and I'll take a look.


Question events

  • Question created on October 24
