xixihaha233 2019-08-12 20:31

The code runs without errors on TensorFlow (CPU), but on the GPU it fails every time at 51% progress:

51%|████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 199/391 [00:38<00:21, 8.81it/s]
2019-08-12 20:20:04.963304: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000016EAC1D0A40
2019-08-12 20:20:05.763636: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
** On entry to SGEMM parameter number 10 had an illegal value
2019-08-12 20:20:06.320473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 5236925 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.328931: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1871 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.838588: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 687520 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.850771: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.999345: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 42770 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.499292: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1497278 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.510245: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.020011: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 256112 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.529828: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 341471 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.540870: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 16833 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.697339: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1190 for batch index 0, expected info = 0. Debug_info = heevd
Traceback (most recent call last):
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[{{node KFAC/SelfAdjointEigV2_10}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 67, in <module>
main()
File "main.py", line 63, in main
trainer.train()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 16, in train
self.train_epoch()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 42, in train_epoch
self.sess.run([self.model.inv_update_op, self.model.var_update_op], feed_dict=feed_dict)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]

Caused by op 'KFAC/SelfAdjointEigV2_10', defined at:
File "main.py", line 67, in <module>
main()
File "main.py", line 60, in main
model_ = Model(config, INPUT_DIM[config.dataset], len(train_loader.dataset))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 21, in __init__
self.init_optim()
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 70, in init_optim
momentum=self.config.momentum)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\optimizer.py", line 66, in __init__
inv_devices=inv_devices)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 58, in __init__
setup = self._setup(cov_ema_decay)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in _setup
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in <dictcomp>
inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 116, in _get_all_inverse_update_ops
for op in factor.make_inverse_update_ops():
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\fisher_factors.py", line 360, in make_inverse_update_ops
ops.append(inv.assign(utils.posdef_inv(self._cov, damping)))
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 144, in posdef_inv
return posdef_inv_functions[POSDEF_INV_METHOD](tensor, identity, damping)
File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 161, in posdef_inv_eig
tensor + damping * identity)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\linalg_ops.py", line 328, in self_adjoint_eig
e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\gen_linalg_ops.py", line 2016, in self_adjoint_eig_v2
"SelfAdjointEigV2", input=input, compute_v=compute_v, name=name)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
op_def=op_def)
File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
[[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]
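For reference, the traceback shows that the failing op, `KFAC/SelfAdjointEigV2_10`, is built in `posdef_inv_eig` (ops/utils.py:161), which eigendecomposes `tensor + damping * identity` to invert a K-FAC covariance factor. Below is a minimal NumPy sketch of that damped inverse. It is inferred from the traceback, not taken from the repository, and the function name `posdef_inv_eig_sketch` is hypothetical. The NaN/Inf check is an added assumption: a non-finite factor is one possible reason an eigensolver such as cuSOLVER's `heevd` returns a nonzero `info`, and it is cheap to rule out.

```python
import numpy as np


def posdef_inv_eig_sketch(cov, damping):
    """Hypothetical sketch: compute (cov + damping * I)^-1 via eigendecomposition.

    Assumption: this mirrors the idea behind posdef_inv_eig in ops/utils.py
    (eigendecompose tensor + damping * identity, then invert the eigenvalues);
    it is not the repository's actual implementation.
    """
    cov = np.asarray(cov, dtype=np.float64)
    # Added check (assumption): NaN/Inf entries are one possible cause of an
    # eigensolver failure, so reject them before decomposing.
    if not np.all(np.isfinite(cov)):
        raise ValueError("covariance factor contains NaN or Inf")
    n = cov.shape[0]
    damped = cov + damping * np.eye(n)
    # eigh assumes a symmetric (self-adjoint) matrix, like tf.self_adjoint_eig.
    eigvals, eigvecs = np.linalg.eigh(damped)
    # A = V diag(w) V^T  =>  A^-1 = V diag(1/w) V^T.
    # Dividing V by w scales column j by 1/w[j], i.e. computes V diag(1/w).
    return (eigvecs / eigvals) @ eigvecs.T
```

Calling it on a small symmetric matrix, e.g. `posdef_inv_eig_sketch(np.array([[2.0, 0.5], [0.5, 1.0]]), 1e-3)`, returns a matrix whose product with the damped input is the identity up to rounding. Inspecting the covariance factors of the failing layer on the CPU in the same way may help narrow down whether the inputs or the GPU solver are at fault.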


1 answer

  • threenewbee 2019-08-12 22:09

    Check whether you are running out of memory, and try a smaller batch size.
