weixin_43589475
2020-04-28 10:03 · 352 views

Training a 3D U-Net with tensorflow-gpu: why does the Jupyter service restart?


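I am training a deep learning model with tensorflow 1.4.0 + CUDA 8.0 + cuDNN 6.0. When training reaches the end of the first epoch, the Jupyter service restarts. Following an earlier blog post, I limited the GPU memory usage, but that made no difference. nvidia-smi shows that the GPU is being used normally. I am confused: CUDA is installed and the versions should be correct. Any help would be appreciated.
Code used to limit GPU memory usage: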

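import os

# Expose only GPU 0; set this before TensorFlow initializes CUDA.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

config = tf.ConfigProto()
# Cap this process at 50% of GPU memory and allocate it on demand.
config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
ktf.set_session(sess)  # make Keras use this session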

nvidia-smi output:
[image]

After running one epoch:
[image]
[image]

Error message:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
    generator_output = next(self._generator)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 155, in data_generator
    skip_blank=skip_blank, permute=permute)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 210, in add_data
    data, truth = get_data_from_file(data_file, index, patch_shape=patch_shape)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 234, in get_data_from_file
    data, truth = get_data_from_file(data_file, index, patch_shape=None)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 238, in get_data_from_file
    x, y = data_file.root.data[index], data_file.root.truth[index, 0]
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 658, in __getitem__
    arr = self._read_slice(startl, stopl, stepl, shape)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 762, in _read_slice
    self._g_read_slice(startl, stopl, stepl, nparr)
  File "tables\hdf5extension.pyx", line 1585, in tables.hdf5extension.Array._g_read_slice
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 199, in H5Dread
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 601, in H5D__read
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dchunk.c", line 2282, in H5D__chunk_read
    chunked read failed
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 283, in H5D__select_read
    read error
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 118, in H5D__select_io
    can't retrieve I/O vector size
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5CX.c", line 1341, in H5CX_get_vec_size
    can't get default dataset transfer property list

End of HDF5 error back trace

Problems reading the array data.
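
Note that the traceback above ends in a PyTables read issued from a Keras generator worker thread (Thread-6), not in a CUDA call. One thing that may be worth ruling out, as an assumption rather than a confirmed diagnosis, is two threads reading the same open HDF5 file at once (for example, if a validation generator starts at the end of the first epoch while the training generator is still prefetching), since PyTables documents itself as not thread-safe. A minimal sketch of serializing reads with a lock follows. `LockedArray` and `read_pair` are hypothetical helpers, not part of 3DUnetCNN, Keras, or PyTables, and plain lists stand in for `data_file.root.data` and `data_file.root.truth`:

```python
import threading


class LockedArray:
    """Hypothetical wrapper: serialize __getitem__ calls on a shared array-like."""

    def __init__(self, array, lock=None):
        self._array = array
        # Pass the same lock to every wrapper that reads from the same file.
        self._lock = lock if lock is not None else threading.Lock()

    def __getitem__(self, index):
        # Only one thread at a time may read from the underlying object.
        with self._lock:
            return self._array[index]

    def __len__(self):
        return len(self._array)


def read_pair(data, truth, index):
    """Read one (data, truth) sample through the locked wrappers."""
    return data[index], truth[index]


# Stand-in data; with PyTables these would be the arrays of the open file.
shared_lock = threading.Lock()
data = LockedArray([[i, i + 1] for i in range(100)], shared_lock)
truth = LockedArray([i % 2 for i in range(100)], shared_lock)

results = {}


def worker(start):
    # Each thread reads a disjoint subset of the indices.
    for i in range(start, 100, 4):
        results[i] = read_pair(data, truth, i)


threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the error disappears once all reads go through a shared lock (or when only one generator reads the file at a time), concurrent file access is the more likely culprit; if it persists, the cause lies elsewhere.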

1 answer

  • Accepted answer
    caozhy 2020-04-28 15:54

    A laptop graphics card has poor cooling and little video memory, so it is unstable. I suggest testing on a desktop card, a GTX 1060/1660 or better.
