苍白不过月芒 2019-08-15 09:47 Acceptance rate: 0%
Views: 7072
Closed

When training a TensorFlow model (object_detection), training exits after the first evaluation. How can I make training continue?

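When I train an SSD model, training runs for about 10 minutes and then enters the evaluation phase. After the evaluation, the program exits on its own, and I see no errors or warnings. Why does this happen, and how can I keep the program training?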

Training command:

python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr

Config file:

Training exits after the first evaluation (only one evaluation runs) in the TensorFlow object_detection model, without any error or warning

System information

What is the top-level directory of the model you are using: models/research/object_detection/
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 (64-bit)
TensorFlow installed from (source or binary): conda install tensorflow-gpu
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: cudnn-7.6.0
GPU model and memory: GeForce GTX 1060 6GB
Exact command to reproduce: See below
My training command:

python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr
This is my config:

train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}

train_input_reader {
  label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "D:/gitcode/models/research/object_detection/idol/train/Iframe_??????.tfrecord"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "D:/gitcode/models/research/object_detection/idol/eval/Iframe_??????.tfrecord"
  }
}
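
Both input readers above rely on the wildcard pattern Iframe_??????.tfrecord. One cheap sanity check before digging deeper is to confirm that each pattern matches the files you expect. The sketch below uses Python's standard glob module, on the assumption that its `?` wildcard behaves the same way as TensorFlow's own file matching for these simple patterns. The helper names are hypothetical and not part of the Object Detection API.

```python
import glob


def count_matches(pattern):
    """Return how many files match a glob pattern such as Iframe_??????.tfrecord."""
    return len(glob.glob(pattern))


def check_input_paths(patterns):
    """Map each pattern to its match count so that empty patterns stand out."""
    return {pattern: count_matches(pattern) for pattern in patterns}


# Example call with the two patterns from the config above:
# check_input_paths([
#     "D:/gitcode/models/research/object_detection/idol/train/Iframe_??????.tfrecord",
#     "D:/gitcode/models/research/object_detection/idol/eval/Iframe_??????.tfrecord",
# ])
```

A count of zero for either pattern would point at the data paths rather than at the training loop itself. A non-zero count only rules that one cause out.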

Console output:
(default) D:\gitcode\models\research>python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1.
WARNING:tensorflow:Estimator's model_fn () includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.experimental.parallel_interleave(...).
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\core\preprocessor.py:196: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
seed2 arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.batch(..., drop_remainder=True).
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py:448: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-08-14 16:29:31.607841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:04:00.0
totalMemory: 6.00GiB freeMemory: 4.97GiB
2019-08-14 16:29:31.621836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-14 16:29:32.275712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-14 16:29:32.283072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-08-14 16:29:32.288675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-08-14 16:29:32.293514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1)
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\eval_util.py:796: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\visualization_utils.py:498: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.

2019-08-14 16:41:44.736212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-14 16:41:44.741242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-14 16:41:44.747522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-08-14 16:41:44.751256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-08-14 16:41:44.755548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1)
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=2.43s).
Accumulating evaluation results...
DONE (t=0.14s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.287
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.529
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.278
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.162
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.356
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.384
(default) D:\gitcode\models\research>
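
Because the log returns to the prompt without any message, the process exit code is one of the few remaining clues. The following is a minimal sketch, assuming the training command is launched from a small Python wrapper. The function names are hypothetical. A return value of 0 suggests the script believed it had finished, while a non-zero value (on Windows often a large number such as 0xC0000005) suggests the process was terminated abnormally.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments), let it print normally, and return its exit code."""
    completed = subprocess.run(cmd)
    return completed.returncode


def describe_exit(code):
    """Turn an exit code into a short, human-readable description."""
    if code == 0:
        return "exit code 0: the process finished normally"
    # Show the code in hex as well, since native crashes on Windows are
    # usually reported as 32-bit status values.
    return "exit code {} (0x{:08X}): abnormal termination".format(
        code, code & 0xFFFFFFFF
    )


# Example call (arguments shortened here; use the full command from the question):
# code = run_and_report(["python", "object_detection/model_main.py", "--alsologtostderr"])
# print(describe_exit(code))
```

From a plain Windows command prompt, running `echo %ERRORLEVEL%` right after the script returns gives the same number without a wrapper.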
