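Newbie here, hoping someone can help me sort this out. Many thanks!
I have made sure that all the installed versions match one another. However, even though I installed tensorflow-gpu 1.15.0, I cannot get training to run on the GPU.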
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'
With the line above, the script fails with the following error:
Traceback (most recent call last):
File "H:/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 180, in <module>
sess.run(train,feed_dict={x:train_x,y:train_y})
File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
run_metadata_ptr)
File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
run_metadata)
File "H:\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node net1/Conv2D (defined at \DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
Original stack trace for 'net1/Conv2D':
File "/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 150, in <module>
act = vgg_network(x,y)
File "/DHU/tutor/Deep_Learning/03/test/project/vgg_flower.py", line 49, in vgg_network
net = tf.nn.conv2d(x,filter = get_variable('w',[3,3,3,net1_ketnel_size]),strides=[1,1,1,1],padding='SAME')
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
name=name)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "\DHU\tutor\Deep_Learning\03\test\venv\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
When I use
os.environ['CUDA_VISIBLE_DEVICES'] = '1' or os.environ['CUDA_VISIBLE_DEVICES'] = '2'
training runs on the CPU instead.
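For context on the '1' and '2' cases: CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices, and the device listing later in this post shows a single GPU at index 0. The sketch below is a simplified model of that selection rule, not a CUDA or TensorFlow API. The helper name `visible_devices` is hypothetical, and the model ignores UUID-style identifiers.

```python
def visible_devices(value, gpu_count):
    """Simplified model of which GPU indices CUDA_VISIBLE_DEVICES exposes.

    Assumption: entries are plain integer indices. Processing stops at the
    first entry that is not a valid index, so only the devices listed before
    it remain visible.
    """
    visible = []
    for token in value.split(","):
        try:
            index = int(token)
        except ValueError:
            break  # non-numeric entry: nothing after it is considered
        if not 0 <= index < gpu_count:
            break  # index does not exist on this machine
        visible.append(index)
    return visible


# On a machine with one GPU (index 0):
print(visible_devices("0", 1))  # [0]
print(visible_devices("1", 1))  # []
print(visible_devices("2", 1))  # []
```

Under this model, '1' and '2' leave no GPU visible on a single-GPU machine, which would be consistent with the fallback to the CPU described above.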
The following code has the same effect as above:
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
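One thing that may be worth checking, as an assumption since the full script is not shown here: these variables are read when CUDA is initialized, so they should be set before TensorFlow first touches the GPU. Setting them before the import is the safest ordering. A minimal sketch follows; the helper name `gpu_available` is hypothetical.

```python
import os

# Set both variables before TensorFlow is imported anywhere in the process.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # a bare index, not '/gpu:0'


def gpu_available():
    """Return True if TensorFlow can see a GPU (TF 1.x API)."""
    # Imported here, after the environment variables are already in place.
    import tensorflow as tf
    return tf.test.is_gpu_available()
```

Calling `gpu_available()` at the top of the training script gives a quick yes/no answer before the graph is built.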
I also checked the following:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
The output looks normal:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11367739030032531062
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3149044123
locality {
bus_id: 1
links {
}
}
incarnation: 8400844841523226913
physical_device_desc: "device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
I tried the following, and neither had any effect.
Method 1: cap GPU memory usage at a fixed fraction:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # use 40% of the GPU memory
session = tf.Session(config=config)
Method 2: start with minimal GPU memory and allocate more on demand (recommended):
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
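A detail that may matter here: the traceback shows training running through a variable named `sess`, while both snippets above create `session`. If those are different objects in the real script (an assumption, since the full code is not shown), the memory settings never reach the session that runs the convolution. The sketch below builds one configured session to use for every `run` call. The helper name `make_session` and its parameters are hypothetical.

```python
def make_session(allow_growth=True, memory_fraction=None):
    """Create a TF 1.x session with the GPU memory options applied."""
    import tensorflow as tf  # TF 1.x API, as in the post

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of grabbing it all up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Optionally cap this process at a fraction of total GPU memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf.Session(config=config)


# Usage (names taken from the traceback above):
# sess = make_session()
# sess.run(train, feed_dict={x: train_x, y: train_y})
```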
I would really appreciate any help in solving this problem. Thanks!