tender羊 2021-09-05 22:51 Acceptance rate: 60%

Training with TensorFlow: os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0' does not get the GPU to run the training

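I'm a beginner. Could anyone please help me solve this? Many thanks!
I made sure all the installed versions match one another, but even with tensorflow-gpu 1.15.0 installed I can't get training to run on the GPU.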

os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0' 

With this line, the run fails with the following error:

Traceback (most recent call last):
  File "H:/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 180, in <module>
    sess.run(train,feed_dict={x:train_x,y:train_y})
  File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node net1/Conv2D (defined at \DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]

Original stack trace for 'net1/Conv2D':
  File "/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 150, in <module>
    act = vgg_network(x,y)
  File "/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 49, in vgg_network
    net = tf.nn.conv2d(x,filter = get_variable('w',[3,3,3,net1_ketnel_size]),strides=[1,1,1,1],padding='SAME')
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

When I use

os.environ['CUDA_VISIBLE_DEVICES'] = '1' or os.environ['CUDA_VISIBLE_DEVICES'] = '2'

training runs directly on the CPU.
The following code behaves the same way as the above:

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"  
os.environ["CUDA_VISIBLE_DEVICES"]="0"
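As a minimal sketch, assuming a single-GPU machine like the GTX 1050 listed below, the environment-variable attempts above can be wrapped in a small guard that rejects TensorFlow-style names such as '/gpu:0' and is called before TensorFlow is imported. The helper select_gpu is hypothetical (it is not part of TensorFlow or CUDA), and it only handles integer indices; CUDA also accepts GPU UUIDs, which this sketch does not cover.

```python
import os
import sys
import warnings


def select_gpu(index: int) -> None:
    """Hypothetical helper: expose only physical GPU `index` to CUDA.

    Call it before `import tensorflow`. CUDA reads these variables when it
    initializes, so changing them after that has no effect in this process.
    """
    # CUDA_VISIBLE_DEVICES takes device indices such as "0" (or GPU UUIDs,
    # not handled here), not TensorFlow device names such as "/gpu:0".
    if isinstance(index, bool) or not isinstance(index, int) or index < 0:
        raise ValueError(f"expected a non-negative GPU index, got {index!r}")
    if "tensorflow" in sys.modules:
        warnings.warn("tensorflow is already imported; the GPU selection "
                      "may not take effect")
    # Number devices by PCI bus ID so that index 0 matches nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(0)  # the only GPU on a single-card machine
# import tensorflow as tf  # import TensorFlow only after this point
```

Calling select_gpu('/gpu:0') raises a ValueError instead of silently producing a configuration that CUDA may not interpret as intended.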

I also checked the following:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

The output looks normal:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11367739030032531062
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3149044123
locality {
  bus_id: 1
  links {
  }
}
incarnation: 8400844841523226913
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

I tried the following, with no effect:

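  Method 1: cap GPU memory usage at a fixed fraction:
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.4 # use 40% of the GPU memory
    session = tf.Session(config=config)
  Method 2: start with a minimal GPU memory allocation and grow it on demand (recommended):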
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)

Could someone please help this beginner with the problem? Thanks!


1 answer

  • 爱晚乏客游 2021-09-06 09:34
    
    os.environ['CUDA_VISIBLE_DEVICES'] = '0' 
    

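    You only have one graphics card, so just set it to 0, the default index of that card. If you specify 1, 2, or 3 without having multiple GPUs, no GPU is visible and TensorFlow can only fall back to the CPU. Note also that CUDA_VISIBLE_DEVICES takes GPU indices such as '0'; '/gpu:0' is TensorFlow's device-name syntax (as used with tf.device), not a valid value for this variable. If it still doesn't work with 0, check that CUDA and cuDNN are installed correctly and that their versions match your TensorFlow version.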
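    To make the index rule concrete, here is a small pure-Python model of what the CUDA documentation describes: CUDA_VISIBLE_DEVICES lists device indices, and an invalid index hides that entry and every entry after it. The function visible_gpus is hypothetical and only models integer indices (real drivers also accept GPU UUIDs); treating a token like '/gpu:0' as invalid is an assumption of this sketch.

```python
from typing import List


def visible_gpus(cuda_visible_devices: str, physical_count: int) -> List[int]:
    """Sketch: which physical GPUs CUDA exposes for a CUDA_VISIBLE_DEVICES value.

    Returns physical indices in the order listed; the application renumbers
    them 0, 1, ... Only integer indices are modelled; any other token
    (assumption: including "/gpu:0") is treated as invalid.
    """
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        # An invalid entry hides itself and everything listed after it.
        if not token.isdecimal() or int(token) >= physical_count:
            break
        visible.append(int(token))
    return visible


# A single-GPU machine, as in the question:
print(visible_gpus("0", 1))  # [0]
print(visible_gpus("1", 1))  # []  -> no GPU visible, so TensorFlow uses the CPU
```

    With one physical card, only "0" leaves a GPU visible, which matches the CPU fallback the asker saw with '1' and '2'.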

    This answer was selected by the asker as the best answer.


Question events

  • Closed by the system on September 14
  • Answer accepted on September 6
  • Question created on September 5
