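I am using tensorflow 1.4.0 + CUDA 8.0 + cudnn 6.0 to train a deep learning model. As soon as training reaches the end of the first epoch, the Jupyter service restarts. Following an earlier blog post, I limited the GPU memory usage, but that had no effect. I checked nvidia-smi and it shows that the GPU is being used normally. I am confused: CUDA is installed and the versions should be correct. Any help would be appreciated.
Code used to limit GPU usage: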
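import os
import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

# Expose only GPU 0 to this process.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Cap this process at 50% of GPU memory and allocate it on demand.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True

# Make Keras use a session created with this configuration.
sess = tf.Session(config=config)
ktf.set_session(sess)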
nvidia-smi output:
nvidia-smi output after running one epoch:
The error message is below:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "e:\anaconda3\envs\tensorflow\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\keras\utils\data_utils.py", line 568, in data_generator_task
    generator_output = next(self._generator)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 155, in data_generator
    skip_blank=skip_blank, permute=permute)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 210, in add_data
    data, truth = get_data_from_file(data_file, index, patch_shape=patch_shape)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 234, in get_data_from_file
    data, truth = get_data_from_file(data_file, index, patch_shape=None)
  File "E:\Jupyter\3DUnetCNN\unet3d\generator.py", line 238, in get_data_from_file
    x, y = data_file.root.data[index], data_file.root.truth[index, 0]
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 658, in __getitem__
    arr = self._read_slice(startl, stopl, stepl, shape)
  File "e:\anaconda3\envs\tensorflow\lib\site-packages\tables\array.py", line 762, in _read_slice
    self._g_read_slice(startl, stopl, stepl, nparr)
  File "tables\hdf5extension.pyx", line 1585, in tables.hdf5extension.Array._g_read_slice
tables.exceptions.HDF5ExtError: HDF5 error back trace
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 199, in H5Dread
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dio.c", line 601, in H5D__read
    can't read data
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dchunk.c", line 2282, in H5D__chunk_read
    chunked read failed
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 283, in H5D__select_read
    read error
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5Dselect.c", line 118, in H5D__select_io
    can't retrieve I/O vector size
  File "D:\pytables_hdf5\CMake-hdf5-1.10.5\hdf5-1.10.5\src\H5CX.c", line 1341, in H5CX_get_vec_size
    can't get default dataset transfer property list
End of HDF5 error back trace
Problems reading the array data.
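
Note on the traceback: the failing read happens inside a Keras worker thread (data_generator_task in data_utils.py) while it reads from the HDF5 file through PyTables, not inside any CUDA call. One thing that may be worth ruling out, and this is an assumption rather than a confirmed diagnosis, is that the same HDF5 file is being read from more than one thread at once, for example by the training and validation generators when the epoch ends. Below is a minimal sketch of a lock-guarded iterator wrapper using only the standard library. ThreadSafeIterator, toy_generator and run_workers are hypothetical names for illustration and are not part of Keras, PyTables or 3DUnetCNN.

```python
import threading


class ThreadSafeIterator:
    """Wrap an iterator so that only one thread at a time can advance it.

    Hypothetical helper. Pass the same lock to several wrappers if the
    underlying generators read from one shared file handle.
    """

    def __init__(self, iterator, lock=None):
        self._iterator = iterator
        self._lock = lock if lock is not None else threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock is held while the wrapped generator runs, so any file
        # read it performs is never executed by two threads at once.
        with self._lock:
            return next(self._iterator)


def toy_generator(n):
    """Stand-in for the real data generator (assumption, no HDF5 here)."""
    for i in range(n):
        yield i


def run_workers(n_items, n_workers):
    """Consume one wrapped generator from several threads; return the items."""
    shared = ThreadSafeIterator(toy_generator(n_items))
    collected = []

    def consume():
        for item in shared:
            collected.append(item)

    threads = [threading.Thread(target=consume) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return collected


if __name__ == "__main__":
    items = run_workers(1000, 4)
    # Every item should be delivered exactly once across all threads.
    print(len(items), sorted(items) == list(range(1000)))
```

If the training and validation generators both read from the same open file, each could be wrapped with the same threading.Lock instance passed as lock, so that their reads are serialized against each other. Whether this removes the HDF5ExtError in this setup is untested.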