The code runs without errors with tensorflow-CPU, but when run on the GPU it fails every time at 51%:

51%|████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 199/391 [00:38<00:21, 8.81it/s]
2019-08-12 20:20:04.963304: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000016EAC1D0A40
2019-08-12 20:20:05.763636: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
** On entry to SGEMM parameter number 10 had an illegal value
2019-08-12 20:20:06.320473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 5236925 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.328931: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1871 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.838588: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 687520 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.850771: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.999345: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 42770 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.499292: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1497278 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.510245: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.020011: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 256112 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.529828: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 341471 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.540870: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 16833 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.697339: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1190 for batch index 0, expected info = 0. Debug_info = heevd
Traceback (most recent call last):
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[{{node KFAC/SelfAdjointEigV2_10}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 67, in <module>
main()
File "main.py", line 63, in main
trainer.train()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 16, in train
self.train_epoch()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 42, in train_epoch
self.sess.run([self.model.inv_update_op, self.model.var_update_op], feed_dict=feed_dict)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]

Caused by op 'KFAC/SelfAdjointEigV2_10', defined at:
File "main.py", line 67, in <module>
main()
File "main.py", line 60, in main
model_ = Model(config, INPUT_DIM[config.dataset], len(train_loader.dataset))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 21, in __init__
self.init_optim()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 70, in init_optim
momentum=self.config.momentum)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\optimizer.py", line 66, in __init__
inv_devices=inv_devices)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 58, in __init__
setup = self._setup(cov_ema_decay)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in _setup
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in <dictcomp>
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 116, in _get_all_inverse_update_ops
for op in factor.make_inverse_update_ops():
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\fisher_factors.py", line 360, in make_inverse_update_ops
ops.append(inv.assign(utils.posdef_inv(self._cov, damping)))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 144, in posdef_inv
return posdef_inv_functionsPOSDEF_INV_METHOD
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 161, in posdef_inv_eig
tensor + damping * identity)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\linalg_ops.py", line 328, in self_adjoint_eig
e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\gen_linalg_ops.py", line 2016, in self_adjoint_eig_v2
"SelfAdjointEigV2", input=input, compute_v=compute_v, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]
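
The failing node, KFAC/SelfAdjointEigV2_10, is created in posdef_inv_eig (ops/utils.py:161), which eigendecomposes tensor + damping * identity in order to invert a damped K-FAC factor. cuSOLVER documents a positive info from its syevd/heevd eigensolver as a convergence failure, and one common cause worth ruling out is a factor that already contains NaN or Inf. Below is a minimal NumPy sketch of that computation, not the repository's code: damped_inverse_eig is a hypothetical helper, the finite-value check is an added assumption, and numpy.linalg.eigh stands in for TensorFlow's self_adjoint_eig.

```python
import numpy as np


def damped_inverse_eig(cov, damping):
    """Hypothetical helper: inverse of (cov + damping * I) via eigendecomposition.

    Mirrors the idea behind posdef_inv_eig, but runs in NumPy on the CPU.
    """
    cov = np.asarray(cov, dtype=np.float64)
    # Fail early with a readable message instead of handing NaN/Inf to the
    # eigensolver (assumed guard, not present in the original code).
    if not np.all(np.isfinite(cov)):
        raise ValueError("covariance factor contains NaN or Inf")
    damped = cov + damping * np.eye(cov.shape[0])
    # eigh treats the input as symmetric: damped = V diag(e) V^T.
    eigenvalues, eigenvectors = np.linalg.eigh(damped)
    # The inverse is V diag(1/e) V^T; dividing V by e (broadcast over columns)
    # scales eigenvector j by 1/e[j].
    return (eigenvectors / eigenvalues) @ eigenvectors.T


if __name__ == "__main__":
    cov = np.array([[2.0, 0.5], [0.5, 1.0]])
    inv = damped_inverse_eig(cov, damping=0.1)
    print(np.allclose(inv @ (cov + 0.1 * np.eye(2)), np.eye(2)))  # True

    bad = cov.copy()
    bad[0, 0] = np.nan
    try:
        damped_inverse_eig(bad, damping=0.1)
    except ValueError as err:
        print(err)  # covariance factor contains NaN or Inf
```

In the TensorFlow 1.x graph itself, a comparable guard would be wrapping the factor in tf.check_numerics before the eigendecomposition, so that a non-finite input is reported by name rather than as a heevd info code; whether that is the actual cause here is not established in this thread.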


1 answer

Check whether you are running out of memory, and try a smaller batch size.

xixihaha233: It runs without any errors on the CPU.
6 months ago
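xixihaha233: No, I already checked the memory and changed the setting too (set it to 0.5), but it still fails, and always at the same place, 51%. I don't know what this error means.
6 months ago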
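
The thread ends without a diagnosis. One hypothesis that would fit a failure at the same step on every run (an assumption, not confirmed above) is that the factors passed to the eigensolver are exponential moving averages, since the estimator is built with cov_ema_decay: once a single batch statistic is NaN or Inf, every later average stays non-finite. The NumPy sketch below locates the first update at which such a running factor stops being finite; ema_update, first_nonfinite_step, the zero initialization and decay=0.95 are all hypothetical choices, not the repository's.

```python
import numpy as np


def ema_update(cov, batch_cov, decay):
    """Hypothetical EMA step: decay * cov + (1 - decay) * batch_cov."""
    return decay * cov + (1.0 - decay) * batch_cov


def first_nonfinite_step(batch_covs, decay=0.95):
    """Return the index of the first update that leaves the running factor
    with a NaN/Inf entry, or None if it stays finite throughout."""
    cov = np.zeros_like(batch_covs[0], dtype=np.float64)  # assumed initial value
    for step, batch_cov in enumerate(batch_covs):
        cov = ema_update(cov, batch_cov, decay)
        if not np.all(np.isfinite(cov)):
            return step
    return None


if __name__ == "__main__":
    batches = [np.eye(2) for _ in range(5)]
    batches[2] = np.full((2, 2), np.nan)  # one bad batch statistic
    print(first_nonfinite_step(batches))  # 2

    # decay * NaN is still NaN, so later finite batches cannot repair the average.
    cov = np.zeros((2, 2))
    for batch_cov in batches:
        cov = ema_update(cov, batch_cov, 0.95)
    print(np.isnan(cov).all())  # True
```

Applied to the real run, the analogous check would be fetching the factor tensors with sess.run in the steps leading up to 199/391 and testing them with np.isfinite; which tensors those are in the repository is not shown in the thread.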