ls_6468 2020-07-27 11:35 Acceptance rate: 0%
Views: 277

Using apex for mixed-precision training (PyTorch): subprocess error on import

The error message is as follows:

Traceback (most recent call last):
  File "apex_sst.py", line 16, in <module>
    from apex import amp
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/zope/interface/declarations.py", line 706, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.
(The same traceback is printed by each of the 4 worker processes.)
Traceback (most recent call last):
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/GPUFS/app_GPU/application/anaconda3/5.3.1/envs/pytorch14/bin/python', '-u', 'apex_sst.py', '--local_rank=3']' returned non-zero exit status 1.
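
The final CalledProcessError comes from the launcher, not from the training script: it only reports that a worker exited with a non-zero status, so the TypeError above is the error to investigate. As a rough illustration, here is a simplified sketch of how a launcher of this kind can surface a child failure. The names launch_workers and failing_script are hypothetical, and this is not the actual torch.distributed.launch source.

```python
import subprocess
import sys


def launch_workers(script_args, nproc):
    """Start nproc child interpreters and fail if any of them exits non-zero."""
    procs = []
    for local_rank in range(nproc):
        # Each worker gets its own --local_rank argument, as in the post.
        cmd = [sys.executable, "-u", *script_args, f"--local_rank={local_rank}"]
        procs.append((cmd, subprocess.Popen(cmd)))

    # Wait for every worker first so that none is left running.
    for _, proc in procs:
        proc.wait()

    # Report the first failing worker; its own traceback is already on stderr.
    for cmd, proc in procs:
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(returncode=proc.returncode, cmd=cmd)


# Stand-in for a training script whose first statement fails in every worker.
failing_script = ["-c", "raise TypeError('import failed')"]

if __name__ == "__main__":
    try:
        launch_workers(failing_script, nproc=2)
    except subprocess.CalledProcessError as err:
        print("worker failed with exit status", err.returncode)
```

Each child prints its own traceback before the launcher raises, which matches the order of the output in the post.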

The command I ran was:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 apex_sst.py

The environment is a single node with 4 GPUs.
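
The failing frames are in apex/interfaces.py and call implements() from zope.interface. One plausible reading, which the post does not confirm, is that the apex package on the import path is a different distribution that shares the name, rather than NVIDIA's Apex. A minimal sketch for checking this without triggering the failing import is shown below. The helper names describe_package and has_amp are hypothetical, and the check assumes that a package providing `from apex import amp` contains an amp entry at its top level.

```python
import importlib.util
from pathlib import Path


def describe_package(name):
    """Locate a top-level package without importing it and list its entries."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing with this name is importable
    locations = list(spec.submodule_search_locations or [])
    entries = set()
    for location in locations:
        path = Path(location)
        if path.is_dir():
            entries.update(child.name for child in path.iterdir())
    return {"origin": spec.origin, "locations": locations, "entries": sorted(entries)}


def has_amp(info):
    """True if the package directory contains an 'amp' module or subpackage."""
    if info is None:
        return False
    return any(entry == "amp" or entry.startswith("amp.") for entry in info["entries"])


if __name__ == "__main__":
    info = describe_package("apex")
    if info is None:
        print("no package named 'apex' is installed")
    else:
        print("apex found at:", info["origin"])
        print("contains amp:", has_amp(info))
```

If the reported location has no amp entry, the installed package is probably not the one the script expects. Removing it and installing NVIDIA Apex from its own source repository would be the usual next step.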

1 answer

  • zqbnqsdsmd 2020-07-31 08:35
