weixin_43589475 2020-04-28 10:03 Acceptance rate: 50%
Views: 435
Answer accepted

Jupyter service restarts while training 3D U-Net with tensorflow-gpu?

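I am training a deep learning model with tensorflow 1.4.0 + CUDA 8.0 + cuDNN 6.0. As soon as the first epoch finishes, the Jupyter service restarts. Following an earlier blog post, I limited the GPU usage, but that made no difference. I checked nvidia-smi, and it shows the GPU is being used normally. I'm confused: CUDA is installed and the versions should be correct. Any help would be appreciated.
Code used to limit GPU usage: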

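import os

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

# Expose only GPU 0 to this process (set before the session is created).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Cap this process at half of the GPU memory and allocate it on demand.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True

# Make Keras use the session configured above.
sess = tf.Session(config=config)
ktf.set_session(sess)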

Output of nvidia-smi:
[image]

Output after running one epoch:
[image]
[image]

Here is the error message:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
    generator_output = next(self._generator)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 155, in data_generator
    skip_blank=skip_blank, permute=permute)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 210, in add_data
    data, truth = get_data_from_file(data_file, index, patch_shape=patch_shape)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 234, in get_data_from_file
    data, truth = get_data_from_file(data_file, index, patch_shape=None)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 238, in get_data_from_file
    x, y = data_file.root.data[index], data_file.root.truth[index, 0]
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 658, in __getitem__
    arr = self._read_slice(startl, stopl, stepl, shape)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 762, in _read_slice
    self._g_read_slice(startl, stopl, stepl, nparr)
  File "tables\hdf5extension.pyx", line 1585, in tables.hdf5extension.Array._g_read_slice
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 199, in H5Dread
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 601, in H5D__read
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dchunk.c", line 2282, in H5D__chunk_read
    chunked read failed
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 283, in H5D__select_read
    read error
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 118, in H5D__select_io
    can't retrieve I/O vector size
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5CX.c", line 1341, in H5CX_get_vec_size
    can't get default dataset transfer property list

End of HDF5 error back trace

Problems reading the array data.
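
The traceback above shows the failing HDF5 read happening inside a Keras worker thread (threading.py, then data_generator_task in data_utils.py). The post does not establish that concurrent access is the cause, but one thing that could be ruled out is several threads advancing the same generator at once. A minimal sketch, assuming only the Python standard library; ThreadSafeIterator and demo_batches are hypothetical names and are not part of 3DUnetCNN, Keras, or PyTables:

```python
import threading


class ThreadSafeIterator:
    """Serialize next() calls on a wrapped iterator with a lock."""

    def __init__(self, iterator):
        self._iterator = iter(iterator)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying generator,
        # so any file reads it performs never overlap.
        with self._lock:
            return next(self._iterator)


def demo_batches(n):
    # Hypothetical stand-in for the project's data generator.
    for i in range(n):
        yield i


if __name__ == "__main__":
    safe = ThreadSafeIterator(demo_batches(4))
    print(list(safe))
```

The wrapped object can be passed wherever the original generator was used. If the error persists with a single consumer, concurrent reads are unlikely to be the explanation and the data file or the hardware would be the next things to check.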

1 answer

  • threenewbee 2020-04-28 15:54

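    Laptop graphics cards have poor cooling and little video memory, so they are unstable. I suggest testing with a desktop card, GTX 1060/1660 or better.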

    This answer was selected as the best answer by the asker.
