xixihaha233 2019-08-12 20:25 Acceptance rate: 0%
Views: 925

TensorFlow code runs without errors on the CPU, but on the GPU it fails at 51% every time; I could not find the same problem online

51%|████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 199/391 [00:38<00:21, 8.81it/s]
2019-08-12 20:20:04.963304: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000016EAC1D0A40
2019-08-12 20:20:05.763636: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
** On entry to SGEMM parameter number 10 had an illegal value
2019-08-12 20:20:06.320473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 5236925 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.328931: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1871 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.838588: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 687520 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.850771: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.999345: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 42770 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.499292: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1497278 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.510245: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.020011: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 256112 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.529828: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 341471 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.540870: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 16833 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.697339: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1190 for batch index 0, expected info = 0. Debug_info = heevd
Traceback (most recent call last):
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[{{node KFAC/SelfAdjointEigV2_10}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 67, in <module>
main()
File "main.py", line 63, in main
trainer.train()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 16, in train
self.train_epoch()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 42, in train_epoch
self.sess.run([self.model.inv_update_op, self.model.var_update_op], feed_dict=feed_dict)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]

Caused by op 'KFAC/SelfAdjointEigV2_10', defined at:
File "main.py", line 67, in <module>
main()
File "main.py", line 60, in main
model_ = Model(config, INPUT_DIM[config.dataset], len(train_loader.dataset))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 21, in __init__
self.init_optim()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 70, in init_optim
momentum=self.config.momentum)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\optimizer.py", line 66, in __init__
inv_devices=inv_devices)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 58, in __init__
setup = self._setup(cov_ema_decay)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in _setup
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in <dictcomp>
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 116, in _get_all_inverse_update_ops
for op in factor.make_inverse_update_ops():
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\fisher_factors.py", line 360, in make_inverse_update_ops
ops.append(inv.assign(utils.posdef_inv(self._cov, damping)))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 144, in posdef_inv
return posdef_inv_functions[POSDEF_INV_METHOD](tensor, identity, damping)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 161, in posdef_inv_eig
tensor + damping * identity)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\linalg_ops.py", line 328, in self_adjoint_eig
e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\gen_linalg_ops.py", line 2016, in self_adjoint_eig_v2
"SelfAdjointEigV2", input=input, compute_v=compute_v, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]
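
For readers following the traceback: the failing node is created in `posdef_inv_eig` (utils.py, line 161), which passes `tensor + damping * identity` to `self_adjoint_eig`. The block below is a minimal NumPy sketch of that kind of damped inverse, not the repository's code. The function name `posdef_inv_eig_np` and the finiteness check are assumptions added for illustration. Running a check like this on the covariance factor may help rule out NaN or Inf inputs before suspecting the GPU solver itself.

```python
import numpy as np


def posdef_inv_eig_np(cov, damping):
    """Invert (cov + damping * I) via a symmetric eigendecomposition.

    Hypothetical NumPy stand-in for the damped inverse seen in the traceback.
    Raises ValueError if the damped matrix contains NaN or Inf.
    """
    cov = np.asarray(cov, dtype=np.float64)
    damped = cov + damping * np.eye(cov.shape[0])

    # An eigensolver cannot return a meaningful result for non-finite input,
    # so fail early with a readable message.
    if not np.all(np.isfinite(damped)):
        raise ValueError("damped covariance contains NaN or Inf")

    # eigh assumes a symmetric (self-adjoint) matrix.
    eigvals, eigvecs = np.linalg.eigh(damped)

    # V * diag(1 / w) * V^T: divide each eigenvector column by its eigenvalue.
    return (eigvecs / eigvals) @ eigvecs.T
```

If the inputs turn out to be finite and the error still appears only on the GPU, one experiment is to build the eigendecomposition under `tf.device('/cpu:0')`. Whether that helps in this case is untested.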


1 answer

  • threenewbee 2019-08-12 22:11

    Check whether you have run out of memory, and try a smaller batch size.
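
One way to try the smaller-batch suggestion systematically is to halve the batch size after each failed run. This is a minimal sketch under stated assumptions: `run_epoch` is a hypothetical callable standing in for one training epoch (it is not part of noisy-K-FAC) and is assumed to raise an exception when the run fails.

```python
def run_with_batch_backoff(run_epoch, batch_size, min_batch_size=1):
    """Call run_epoch(batch_size), halving batch_size after each failure.

    run_epoch is a hypothetical callable that raises on failure.
    Returns the first batch size that completes without an exception.
    """
    while True:
        try:
            run_epoch(batch_size)
            return batch_size
        except Exception as err:
            # With TensorFlow 1.x this could be narrowed to tf.errors.OpError.
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            next_size = max(min_batch_size, batch_size // 2)
            print("batch_size=%d failed (%s); retrying with %d"
                  % (batch_size, err, next_size))
            batch_size = next_size
```

Note that TensorFlow usually reports out-of-memory conditions as `ResourceExhaustedError`, while the traceback in the question is an `InvalidArgumentError`, so a smaller batch is a cheap experiment here rather than a confirmed fix.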