TensorFlow SSD训练自己的数据 checkpoint问题 5C

1、参考教程:https://blog.csdn.net/liuyan20062010/article/details/78905517
2、一直到训练成功!
3、导入模型测试,代码

# Restore SSD model.
ckpt_filename = '../train_model/model.ckpt-100'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)

4、错误信息:

图片说明

5、经网上查阅,说是修改了模型结构的问题,但是只是按教程上面修改了类别数,且训练成功了。其他均未修改。

2个回答

import os
import math
import random

import numpy as np
import tensorflow as tf
import cv2

slim = tf.contrib.slim
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import sys
sys.path.append('../')
from nets import ssd_vgg_300, ssd_common, np_methods
from preprocessing import ssd_vgg_preprocessing
from notebooks import visualization
# TensorFlow session: grow memory when needed. TF, DO NOT USE ALL MY GPU MEMORY!!!
gpu_options = tf.GPUOptions(allow_growth=True)
config = tf.ConfigProto(log_device_placement=False, gpu_options=gpu_options)
isess = tf.InteractiveSession(config=config)
# Input placeholder.
net_shape = (300, 300)
data_format = 'NHWC'
img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
# Evaluation pre-processing: resize to SSD net shape.
image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
    img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
image_4d = tf.expand_dims(image_pre, 0)

# Define the SSD model.
reuse = True if 'ssd_net' in locals() else None
ssd_net = ssd_vgg_300.SSDNet()
with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
    predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)

# Restore SSD model.
ckpt_filename = '../checkpoints/ssd_300_vgg.ckpt'
# ckpt_filename = '../checkpoints/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
isess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.restore(isess, ckpt_filename)

# SSD default anchor boxes.
ssd_anchors = ssd_net.anchors(net_shape)


# Main image processing routine.
def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
    # Run SSD network.
    rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions, localisations, bbox_img],
                                                              feed_dict={img_input: img})

    # Get classes and bboxes from the net outputs.
    rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
        rpredictions, rlocalisations, ssd_anchors,
        select_threshold=select_threshold, img_shape=net_shape, num_classes=21, decode=True)

    rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
    rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
    rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
    # Resize bboxes to original image shape. Note: useless for Resize.WARP!
    rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
    return rclasses, rscores, rbboxes
# Test on some demo image and visualize output.
#测试的文件夹
path = '../demo/'
image_names = sorted(os.listdir(path))
#文件夹中的第几张图,-1代表最后一张
img = mpimg.imread(path + image_names[-1])
rclasses, rscores, rbboxes =  process_image(img)

# visualization.bboxes_draw_on_img(img, rclasses, rscores, rbboxes, visualization.colors_plasma)
visualization.plt_bboxes(img, rclasses, rscores, rbboxes)

weixin_42019831
weixin_42019831 请问可以测试一整个demo所有图片吗?不按照一张一张来
大约一个月之前 回复

请问可以测试一整个demo所有图片吗?不按照一张一张来

Csdn user default icon
上传中...
上传图片
插入图片
抄袭、复制答案,以达到刷声望分或其他目的的行为,在CSDN问答是严格禁止的,一经发现立刻封号。是时候展现真正的技术了!
其他相关推荐
tensorflow 怎么预训练 微调自己的数据
我在github上找了一个代码:https://github.com/MrCPlusPlus/MobileFaceNet_Tensorflow_Pretrain/blob/master/train_nets.py 我想要尝试训练自己的数据集,我知道好像只要加几行代码就行, 但是我小白不知道怎么加,好像只要把最后一层输出层改下,其它层固定就可以了, 请大家帮忙,谢谢! 我自己瞎折腾报错 :Key Mean_1/avg not found in checkpoint, 不知道为什么?? 最后一层好像是这个: fc7/biases (DT_FLOAT) [85164] fc7/weights (DT_FLOAT) [128,85164] saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=args.saver_maxkeep) # init all variables sess.run(tf.global_variables_initializer()) sess.run(tf.local_variables_initializer()) # load pretrained model if pretrained_model: print('Restoring pretrained model: %s' % pretrained_model) ckpt = tf.train.get_checkpoint_state(pretrained_model) #tf.reset_default_graph() saver.restore(sess, ckpt.model_checkpoint_path) print('Successfully restored!') exclude = ['fc7'] variables_to_restore = slim.get_variables_to_restore(exclude=exclude) init_fn = slim.assign_from_checkpoint_fn('./output/ckpt_best/MobileFaceNet_pretrain.ckpt', variables_to_restore) init_fn(sess) 我加的代码是这样的,但是报错lhs=[1156], rhs=[85164 也就是说原来模型的输出 和自己训练的数据集类数不同, 但是我不是已经排除模型最后一层了吗 ? 我找不到的变量是这种形式的,好像是global variables中的: fc7/weights/ExponentialMovingAverage:0 fc7/biases/ExponentialMovingAverage:0 不同于网络结构中打出来的变量: fc7/biases (DT_FLOAT) [85164] fc7/weights (DT_FLOAT) [128,85164] 还有报错很多变量未初始化,但是我也添加了初始化代码啊 很郁闷. 请大家帮帮忙,谢谢了. 或者可以加我微信沟通:1886 2232 309 ,谢谢! 急 急急!
Tensorflow object detection api 训练自己数据 map一直是 -1
使用Tensorflow object detection api 训练自己的数据 map 一直是-1.loss一直也很低。 结果是这样的: ![图片说明](https://img-ask.csdn.net/upload/201910/14/1571048294_159662.png) loss: ![图片说明](https://img-ask.csdn.net/upload/201910/15/1571105856_282377.jpg) 使用的模型是:model zoo的这个![图片说明](https://img-ask.csdn.net/upload/201910/14/1571047856_17157.jpg) piplineConfig 如下: ``` model { faster_rcnn { num_classes: 25 image_resizer { keep_aspect_ratio_resizer { min_dimension: 720 max_dimension: 1280 } } feature_extractor { type: "faster_rcnn_resnet50" first_stage_features_stride: 16 } first_stage_anchor_generator { grid_anchor_generator { height_stride: 16 width_stride: 16 scales: 0.25 scales: 0.5 scales: 1.0 scales: 2.0 aspect_ratios: 0.5 aspect_ratios: 1.0 aspect_ratios: 2.0 } } first_stage_box_predictor_conv_hyperparams { op: CONV regularizer { l2_regularizer { weight: 0.0 } } initializer { truncated_normal_initializer { stddev: 0.00999999977648 } } } first_stage_nms_score_threshold: 0.0 first_stage_nms_iou_threshold: 0.699999988079 first_stage_max_proposals: 100 first_stage_localization_loss_weight: 2.0 first_stage_objectness_loss_weight: 1.0 initial_crop_size: 14 maxpool_kernel_size: 2 maxpool_stride: 2 second_stage_box_predictor { mask_rcnn_box_predictor { fc_hyperparams { op: FC regularizer { l2_regularizer { weight: 0.0 } } initializer { variance_scaling_initializer { factor: 1.0 uniform: true mode: FAN_AVG } } } use_dropout: false dropout_keep_probability: 1.0 } } second_stage_post_processing { batch_non_max_suppression { score_threshold: 0.300000011921 iou_threshold: 0.600000023842 max_detections_per_class: 100 max_total_detections: 100 } score_converter: SOFTMAX } second_stage_localization_loss_weight: 2.0 second_stage_classification_loss_weight: 1.0 } } train_config { batch_size: 1 data_augmentation_options { random_horizontal_flip { } } optimizer { momentum_optimizer { learning_rate { manual_step_learning_rate { initial_learning_rate: 0.000300000014249 schedule { step: 900000 learning_rate: 2.99999992421e-05 } schedule { step: 1200000 learning_rate: 3.00000010611e-06 } } } momentum_optimizer_value: 0.899999976158 } use_moving_average: false } gradient_clipping_by_norm: 10.0 fine_tune_checkpoint: "/home/yons/code/自动驾驶视觉综合感知/faster_rcnn_resnet50_coco_2018_01_28/model.ckpt" from_detection_checkpoint: true num_steps: 200000 } train_input_reader { label_map_path: "/home/yons/code/自动驾驶视觉综合感知/pascal_label_map.pbtxt" tf_record_input_reader { input_path: "/home/yons/data/自动驾驶视觉综合感知/train_dataset/tfRecord/train/coco_train.record" } } eval_config { num_examples: 200 max_evals: 10 use_moving_averages: false metrics_set: "coco_detection_metrics" } eval_input_reader { label_map_path: "/home/yons/code/自动驾驶视觉综合感知/pascal_label_map.pbtxt" shuffle: false num_readers: 1 tf_record_input_reader { input_path: "/home/yons/data/自动驾驶视觉综合感知/train_dataset/tfRecord/val/coco_val.record" } } ``` label_map配置: ``` item { id: 1 name: 'red' } item { id: 2 name: 'green' } item { id: 3 name: 'yellow' } item { id: 4 name: 'red_left' } item { id: 5 name: 'red_right' } item { id: 6 name: 'yellow_left' } item { id: 7 name: 'yellow_right' } item { id: 8 name: 'green_left' } item { id: 9 name: 'green_right' } item { id: 10 name: 'red_forward' } item { id: 11 name: 'green_forward' } item { id: 12 name: 'yellow_forward' } item { id:13 name: 'horizon_red' } item { id: 14 name: 'horizon_green' } item { id: 15 name: 'horizon_yellow' } item { id: 16 name: 'off' } item { id: 17 name: 'traffic_sign' } item { id: 18 name: 'car' } item { id: 19 name: 'motor' } item { id: 20 name: 'bike' } item { id: 21 name: 'bus' } item { id: 22 name: 'truck' } item { id: 23 name: 'suv' } item { id: 24 name: 'express' } item { id: 25 name: 'person' } ``` 自己解析数据tfrecord: ![图片说明](https://img-ask.csdn.net/upload/201910/14/1571048123_764545.png) ![图片说明](https://img-ask.csdn.net/upload/201910/14/1571048152_258259.png)
tensorflow载入训练好的模型进行预测,同一张图片预测的结果却不一样????
最近在跑deeplabv1,在测试代码的时候,跑通了训练程序,但是用训练好的模型进行与测试却发现相同的图片预测的结果不一样??请问有大神知道怎么回事吗? 用的是saver.restore()方法载入模型。代码如下: ``` def main(): """Create the model and start the inference process.""" args = get_arguments() # Prepare image. img = tf.image.decode_jpeg(tf.read_file(args.img_path), channels=3) # Convert RGB to BGR. img_r, img_g, img_b = tf.split(value=img, num_or_size_splits=3, axis=2) img = tf.cast(tf.concat(axis=2, values=[img_b, img_g, img_r]), dtype=tf.float32) # Extract mean. img -= IMG_MEAN # Create network. net = DeepLabLFOVModel() # Which variables to load. trainable = tf.trainable_variables() # Predictions. pred = net.preds(tf.expand_dims(img, dim=0)) # Set up TF session and initialize variables. config = tf.ConfigProto() config.gpu_options.allow_growth = True sess = tf.Session(config=config) #init = tf.global_variables_initializer() sess.run(tf.global_variables_initializer()) # Load weights. saver = tf.train.Saver(var_list=trainable) load(saver, sess, args.model_weights) # Perform inference. preds = sess.run([pred]) print(preds) if not os.path.exists(args.save_dir): os.makedirs(args.save_dir) msk = decode_labels(np.array(preds)[0, 0, :, :, 0]) im = Image.fromarray(msk) im.save(args.save_dir + 'mask1.png') print('The output file has been saved to {}'.format( args.save_dir + 'mask.png')) if __name__ == '__main__': main() ``` 其中load是 ``` def load(saver, sess, ckpt_path): '''Load trained weights. Args: saver: TensorFlow saver object. sess: TensorFlow session. ckpt_path: path to checkpoint file with parameters. ''' ckpt = tf.train.get_checkpoint_state(ckpt_path) if ckpt and ckpt.model_checkpoint_path: saver.restore(sess, ckpt.model_checkpoint_path) print("Restored model parameters from {}".format(ckpt_path)) ``` DeepLabLFOVMode类如下: ``` class DeepLabLFOVModel(object): """DeepLab-LargeFOV model with atrous convolution and bilinear upsampling. This class implements a multi-layer convolutional neural network for semantic image segmentation task. This is the same as the model described in this paper: https://arxiv.org/abs/1412.7062 - please look there for details. """ def __init__(self, weights_path=None): """Create the model. Args: weights_path: the path to the cpkt file with dictionary of weights from .caffemodel. """ self.variables = self._create_variables(weights_path) def _create_variables(self, weights_path): """Create all variables used by the network. This allows to share them between multiple calls to the loss function. Args: weights_path: the path to the ckpt file with dictionary of weights from .caffemodel. If none, initialise all variables randomly. Returns: A dictionary with all variables. """ var = list() index = 0 if weights_path is not None: with open(weights_path, "rb") as f: weights = cPickle.load(f) # Load pre-trained weights. for name, shape in net_skeleton: var.append(tf.Variable(weights[name], name=name)) del weights else: # Initialise all weights randomly with the Xavier scheme, # and # all biases to 0's. for name, shape in net_skeleton: if "/w" in name: # Weight filter. w = create_variable(name, list(shape)) var.append(w) else: b = create_bias_variable(name, list(shape)) var.append(b) return var def _create_network(self, input_batch, keep_prob): """Construct DeepLab-LargeFOV network. Args: input_batch: batch of pre-processed images. keep_prob: probability of keeping neurons intact. Returns: A downsampled segmentation mask. """ current = input_batch v_idx = 0 # Index variable. # Last block is the classification layer. for b_idx in xrange(len(dilations) - 1): for l_idx, dilation in enumerate(dilations[b_idx]): w = self.variables[v_idx * 2] b = self.variables[v_idx * 2 + 1] if dilation == 1: conv = tf.nn.conv2d(current, w, strides=[ 1, 1, 1, 1], padding='SAME') else: conv = tf.nn.atrous_conv2d( current, w, dilation, padding='SAME') current = tf.nn.relu(tf.nn.bias_add(conv, b)) v_idx += 1 # Optional pooling and dropout after each block. if b_idx < 3: current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 2, 2, 1], padding='SAME') elif b_idx == 3: current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME') elif b_idx == 4: current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME') current = tf.nn.avg_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME') elif b_idx <= 6: current = tf.nn.dropout(current, keep_prob=keep_prob) # Classification layer; no ReLU. # w = self.variables[v_idx * 2] w = create_variable(name='w', shape=[1, 1, 1024, n_classes]) # b = self.variables[v_idx * 2 + 1] b = create_bias_variable(name='b', shape=[n_classes]) conv = tf.nn.conv2d(current, w, strides=[1, 1, 1, 1], padding='SAME') current = tf.nn.bias_add(conv, b) return current def prepare_label(self, input_batch, new_size): """Resize masks and perform one-hot encoding. Args: input_batch: input tensor of shape [batch_size H W 1]. new_size: a tensor with new height and width. Returns: Outputs a tensor of shape [batch_size h w 18] with last dimension comprised of 0's and 1's only. """ with tf.name_scope('label_encode'): # As labels are integer numbers, need to use NN interp. input_batch = tf.image.resize_nearest_neighbor( input_batch, new_size) # Reducing the channel dimension. input_batch = tf.squeeze(input_batch, squeeze_dims=[3]) input_batch = tf.one_hot(input_batch, depth=n_classes) return input_batch def preds(self, input_batch): """Create the network and run inference on the input batch. Args: input_batch: batch of pre-processed images. Returns: Argmax over the predictions of the network of the same shape as the input. """ raw_output = self._create_network( tf.cast(input_batch, tf.float32), keep_prob=tf.constant(1.0)) raw_output = tf.image.resize_bilinear( raw_output, tf.shape(input_batch)[1:3, ]) raw_output = tf.argmax(raw_output, dimension=3) raw_output = tf.expand_dims(raw_output, dim=3) # Create 4D-tensor. return tf.cast(raw_output, tf.uint8) def loss(self, img_batch, label_batch): """Create the network, run inference on the input batch and compute loss. Args: input_batch: batch of pre-processed images. Returns: Pixel-wise softmax loss. """ raw_output = self._create_network( tf.cast(img_batch, tf.float32), keep_prob=tf.constant(0.5)) prediction = tf.reshape(raw_output, [-1, n_classes]) # Need to resize labels and convert using one-hot encoding. label_batch = self.prepare_label( label_batch, tf.stack(raw_output.get_shape()[1:3])) gt = tf.reshape(label_batch, [-1, n_classes]) # Pixel-wise softmax loss. loss = tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=gt) reduced_loss = tf.reduce_mean(loss) return reduced_loss ``` 按理说载入模型应该没有问题,可是不知道为什么结果却不一样? 图片:![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810836_83106.jpg) ![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810850_924663.png) 预测的结果: ![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810884_985680.png) ![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810904_577649.png) 两次结果不一样,与保存的模型算出来的结果也不一样。 我用的是GitHub上这个人的代码: https://github.com/minar09/DeepLab-LFOV-TensorFlow 急急急,请问有大神知道吗???
在训练Tensorflow模型(object_detection)时,训练在第一次评估后退出,怎么使训练继续下去?
当我进行ssd模型训练时,训练进行了10分钟,然后进入评估阶段,评估之后程序就自动退出了,没有看到误和警告,这是为什么,怎么让程序一直训练下去? 训练命令: ``` python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr ``` 配置文件: ``` training exit after the first evaluation(only one evaluation) in Tensorflow model(object_detection) without error and waring System information What is the top-level directory of the model you are using:models/research/object_detection/ Have I written custom code (as opposed to using a stock example script provided in TensorFlow):NO OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Windows-10(64bit) TensorFlow installed from (source or binary):conda install tensorflow-gpu TensorFlow version (use command below):1.13.1 Bazel version (if compiling from source):N/A CUDA/cuDNN version:cudnn-7.6.0 GPU model and memory:GeForce GTX 1060 6GB Exact command to reproduce:See below my command for training : python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr This is my config : train_config { batch_size: 24 data_augmentation_options { random_horizontal_flip { } } data_augmentation_options { ssd_random_crop { } } optimizer { rms_prop_optimizer { learning_rate { exponential_decay_learning_rate { initial_learning_rate: 0.00400000018999 decay_steps: 800720 decay_factor: 0.949999988079 } } momentum_optimizer_value: 0.899999976158 decay: 0.899999976158 epsilon: 1.0 } } fine_tune_checkpoint: "D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt" from_detection_checkpoint: true num_steps: 200000 train_input_reader { label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt" tf_record_input_reader { input_path: "D:/gitcode/models/research/object_detection/idol/train/Iframe_??????.tfrecord" } } eval_config { num_examples: 8000 max_evals: 10 use_moving_averages: false } eval_input_reader { label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt" shuffle: false num_readers: 1 tf_record_input_reader { input_path: "D:/gitcode/models/research/object_detection/idol/eval/Iframe_??????.tfrecord" } ``` 窗口输出: (default) D:\gitcode\models\research>python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue. WARNING:tensorflow:Forced number of epochs for all eval validations to be 1. WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1. WARNING:tensorflow:Estimator's model_fn (<function create_model_fn..model_fn at 0x0000027CBAB7BB70>) includes params argument, but params are not passed to Estimator. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.experimental.parallel_interleave(...). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\core\preprocessor.py:196: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version. Instructions for updating: seed2 arg is deprecated.Use sample_distorted_bounding_box_v2 instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py:448: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. 2019-08-14 16:29:31.607841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845 pciBusID: 0000:04:00.0 totalMemory: 6.00GiB freeMemory: 4.97GiB 2019-08-14 16:29:31.621836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:29:32.275712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:29:32.283072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:29:32.288675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:29:32.293514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\eval_util.py:796: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\visualization_utils.py:498: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape. 2019-08-14 16:41:44.736212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:41:44.741242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:41:44.747522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:41:44.751256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:41:44.755548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. creating index... index created! creating index... index created! Running per image evaluation... Evaluate annotation type bbox DONE (t=2.43s). Accumulating evaluation results... DONE (t=0.14s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.287 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.529 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.278 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.312 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.162 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.356 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.384 (default) D:\gitcode\models\research>
用tensorflow做机器翻译时训练代码有问题
``` # -*- coding:UTF-8 -*- import tensorflow as tf src_path = 'D:/Python37/untitled1/train.tags.en-zh.en.deletehtml' trg_path = 'D:/Python37/untitled1/train.tags.en-zh.zh.deletehtml' SRC_TRAIN_DATA = 'D:/Python37/untitled1/train.tags.en-zh.en.deletehtml.segment' # 源语言输入文件 TRG_TRAIN_DATA = 'D:/Python37/untitled1/train.tags.en-zh.zh.deletehtml.segment' # 目标语言输入文件 CHECKPOINT_PATH = './model/seq2seq_ckpt' # checkpoint保存路径 HIDDEN_SIZE = 1024 # LSTM的隐藏层规模 NUM_LAYERS = 2 # 深层循环神经网络中LSTM结构的层数 SRC_VOCAB_SIZE = 10000 # 源语言词汇表大小 TRG_VOCAB_SIZE = 4000 # 目标语言词汇表大小 BATCH_SIZE = 100 # 训练数据batch的大小 NUM_EPOCH = 5 # 使用训练数据的轮数 KEEP_PROB = 0.8 # 节点不被dropout的概率 MAX_GRAD_NORM = 5 # 用于控制梯度膨胀的梯度大小上限 SHARE_EMB_AND_SOFTMAX = True # 在softmax层和词向量层之间共享参数 MAX_LEN = 50 # 限定句子的最大单词数量 SOS_ID = 1 # 目标语言词汇表中<sos>的ID """ function: 数据batching,产生最后输入数据格式 Parameters: file_path-数据路径 Returns: dataset- 每个句子-对应的长度组成的TextLineDataset类的数据集对应的张量 """ def MakeDataset(file_path): dataset = tf.data.TextLineDataset(file_path) # map(function, sequence[, sequence, ...]) -> list # 通过定义可以看到,这个函数的第一个参数是一个函数,剩下的参数是一个或多个序列,返回值是一个集合。 # function可以理解为是一个一对一或多对一函数,map的作用是以参数序列中的每一个元素调用function函数,返回包含每次function函数返回值的list。 # lambda argument_list: expression # 其中lambda是Python预留的关键字,argument_list和expression由用户自定义 # argument_list参数列表, expression 为函数表达式 # 根据空格将单词编号切分开并放入一个一维向量 dataset = dataset.map(lambda string: tf.string_split([string]).values) # 将字符串形式的单词编号转化为整数 dataset = dataset.map(lambda string: tf.string_to_number(string, tf.int32)) # 统计每个句子的单词数量,并与句子内容一起放入Dataset dataset = dataset.map(lambda x: (x, tf.size(x))) return dataset """ function: 从源语言文件src_path和目标语言文件trg_path中分别读取数据,并进行填充和batching操作 Parameters: src_path-源语言,即被翻译的语言,英语. trg_path-目标语言,翻译之后的语言,汉语. batch_size-batch的大小 Returns: dataset- 每个句子-对应的长度 组成的TextLineDataset类的数据集 """ def MakeSrcTrgDataset(src_path, trg_path, batch_size): # 首先分别读取源语言数据和目标语言数据 src_data = MakeDataset(src_path) trg_data = MakeDataset(trg_path) # 通过zip操作将两个Dataset合并为一个Dataset,现在每个Dataset中每一项数据ds由4个张量组成 # ds[0][0]是源句子 # ds[0][1]是源句子长度 # ds[1][0]是目标句子 # ds[1][1]是目标句子长度 #https://blog.csdn.net/qq_32458499/article/details/78856530这篇博客看一下可以细致了解一下Dataset这个库,以及.map和.zip的用法 dataset = tf.data.Dataset.zip((src_data, trg_data)) # 删除内容为空(只包含<eos>)的句子和长度过长的句子 def FilterLength(src_tuple, trg_tuple): ((src_input, src_len), (trg_label, trg_len)) = (src_tuple, trg_tuple) # tf.logical_and 相当于集合中的and做法,后面两个都为true最终结果才会为true,否则为false # tf.greater Returns the truth value of (x > y),所以以下所说的是句子长度必须得大于一也就是不能为空的句子 # tf.less_equal Returns the truth value of (x <= y),所以所说的是长度要小于最长长度 src_len_ok = tf.logical_and(tf.greater(src_len, 1), tf.less_equal(src_len, MAX_LEN)) trg_len_ok = tf.logical_and(tf.greater(trg_len, 1), tf.less_equal(trg_len, MAX_LEN)) return tf.logical_and(src_len_ok, trg_len_ok) #两个都满足才返回true # filter接收一个函数Func并将该函数作用于dataset的每个元素,根据返回值True或False保留或丢弃该元素,True保留该元素,False丢弃该元素 # 最后得到的就是去掉空句子和过长的句子的数据集 dataset = dataset.filter(FilterLength) # 解码器需要两种格式的目标句子: # 1.解码器的输入(trg_input), 形式如同'<sos> X Y Z' # 2.解码器的目标输出(trg_label), 形式如同'X Y Z <eos>' # 上面从文件中读到的目标句子是'X Y Z <eos>'的形式,我们需要从中生成'<sos> X Y Z'形式并加入到Dataset # 编码器只有输入,没有输出,而解码器有输入也有输出,输入为<sos>+(除去最后一位eos的label列表) # 例如train.en最后都为2,id为2就是eos def MakeTrgInput(src_tuple, trg_tuple): ((src_input, src_len), (trg_label, trg_len)) = (src_tuple, trg_tuple) # tf.concat用法 https://blog.csdn.net/qq_33431368/article/details/79429295 trg_input = tf.concat([[SOS_ID], trg_label[:-1]], axis=0) return ((src_input, src_len), (trg_input, trg_label, trg_len)) dataset = dataset.map(MakeTrgInput) # 随机打乱训练数据 dataset = dataset.shuffle(10000) # 规定填充后的输出的数据维度 padded_shapes = ( (tf.TensorShape([None]), # 源句子是长度未知的向量 tf.TensorShape([])), # 源句子长度是单个数字 (tf.TensorShape([None]), # 目标句子(解码器输入)是长度未知的向量 tf.TensorShape([None]), # 目标句子(解码器目标输出)是长度未知的向量 tf.TensorShape([])) # 目标句子长度(输出)是单个数字 ) # 调用padded_batch方法进行padding 和 batching操作 batched_dataset = dataset.padded_batch(batch_size, padded_shapes) return batched_dataset """ function: seq2seq模型 Parameters: Returns: """ class NMTModel(object): """ function: 模型初始化 Parameters: Returns: """ def __init__(self): # 定义编码器和解码器所使用的LSTM结构 self.enc_cell = tf.nn.rnn_cell.MultiRNNCell( [tf.nn.rnn_cell.LSTMCell(HIDDEN_SIZE) for _ in range(NUM_LAYERS)]) self.dec_cell = tf.nn.rnn_cell.MultiRNNCell( [tf.nn.rnn_cell.LSTMCell(HIDDEN_SIZE) for _ in range(NUM_LAYERS)]) # 为源语言和目标语言分别定义词向量 self.src_embedding = tf.get_variable('src_emb', [SRC_VOCAB_SIZE, HIDDEN_SIZE]) self.trg_embedding = tf.get_variable('trg_emb', [TRG_VOCAB_SIZE, HIDDEN_SIZE]) # 定义softmax层的变量 if SHARE_EMB_AND_SOFTMAX: self.softmax_weight = tf.transpose(self.trg_embedding) else: self.softmax_weight = tf.get_variable('weight', [HIDDEN_SIZE, TRG_VOCAB_SIZE]) self.softmax_bias = tf.get_variable('softmax_loss', [TRG_VOCAB_SIZE]) """ function: 在forward函数中定义模型的前向计算图 Parameters:   MakeSrcTrgDataset函数产生的五种张量如下(全部为张量) src_input: 编码器输入(源数据) src_size : 输入大小 trg_input:解码器输入(目标数据) trg_label:解码器输出(目标数据) trg_size: 输出大小 Returns: """ def forward(self, src_input, src_size, trg_input, trg_label, trg_size): batch_size = tf.shape(src_input)[0] # 将输入和输出单词转为词向量(rnn中输入数据都要转换成词向量) # 相当于input中的每个id对应的embedding中的向量转换 src_emb = tf.nn.embedding_lookup(self.src_embedding, src_input) trg_emb = tf.nn.embedding_lookup(self.trg_embedding, trg_input) # 在词向量上进行dropout src_emb = tf.nn.dropout(src_emb, KEEP_PROB) trg_emb = tf.nn.dropout(trg_emb, KEEP_PROB) # 使用dynamic_rnn构造编码器 # 编码器读取源句子每个位置的词向量,输出最后一步的隐藏状态enc_state # 因为编码器是一个双层LSTM,因此enc_state是一个包含两个LSTMStateTuple类的tuple, # 每个LSTMStateTuple对应编码器中一层的状态 # enc_outputs是顶层LSTM在每一步的输出,它的维度是[batch_size, max_time, HIDDEN_SIZE] # seq2seq模型中不需要用到enc_outputs,而attention模型会用到它 with tf.variable_scope('encoder'): enc_outputs, enc_state = tf.nn.dynamic_rnn(self.enc_cell, src_emb, src_size, dtype=tf.float32) # 使用dynamic_rnn构造解码器 # 解码器读取目标句子每个位置的词向量,输出的dec_outputs为每一步顶层LSTM的输出 # dec_outputs的维度是[batch_size, max_time, HIDDEN_SIZE] # initial_state=enc_state表示用编码器的输出来初始化第一步的隐藏状态 # 编码器最后编码结束最后的状态为解码器初始化的状态 with tf.variable_scope('decoder'): dec_outputs, _ = tf.nn.dynamic_rnn(self.dec_cell, trg_emb, trg_size, initial_state=enc_state) # 计算解码器每一步的log perplexity # 输出重新转换成shape为[,HIDDEN_SIZE] output = tf.reshape(dec_outputs, [-1, HIDDEN_SIZE]) # 计算解码器每一步的softmax概率值 logits = tf.matmul(output, self.softmax_weight) + self.softmax_bias # 交叉熵损失函数,算loss loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.reshape(trg_label, [-1]), logits=logits) # 在计算平均损失时,需要将填充位置的权重设置为0,以避免无效位置的预测干扰模型的训练 label_weights = tf.sequence_mask(trg_size, maxlen=tf.shape(trg_label)[1], dtype=tf.float32) label_weights = tf.reshape(label_weights, [-1]) cost = tf.reduce_sum(loss * label_weights) cost_per_token = cost / tf.reduce_sum(label_weights) # 定义反向传播操作 trainable_variables = tf.trainable_variables() # 控制梯度大小,定义优化方法和训练步骤 # 算出每个需要更新的值的梯度,并对其进行控制 grads = tf.gradients(cost / tf.to_float(batch_size), trainable_variables) grads, _ = tf.clip_by_global_norm(grads, MAX_GRAD_NORM) # 利用梯度下降优化算法进行优化.学习率为1.0 optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0) # 相当于minimize的第二步,正常来讲所得到的list[grads,vars]由compute_gradients得到,返回的是执行对应变量的更新梯度操作的op train_op = optimizer.apply_gradients(zip(grads, trainable_variables)) return cost_per_token, train_op """ function: 使用给定的模型model上训练一个epoch,并返回全局步数,每训练200步便保存一个checkpoint Parameters: session : 会议 cost_op : 计算loss的操作op train_op: 训练的操作op saver:  保存model的类 step:   训练步数 Returns: """ def run_epoch(session, cost_op, train_op, saver, step): # 训练一个epoch # 重复训练步骤直至遍历完Dataset中所有数据 while True: try: # 运行train_op并计算cost_op的结果也就是损失值,训练数据在main()函数中以Dataset方式提供 cost, _ = session.run([cost_op, train_op]) # 步数为10的倍数进行打印 if step % 10 == 0: print('After %d steps, per token cost is %.3f' % (step, cost)) # 每200步保存一个checkpoint if step % 200 == 0: saver.save(session, CHECKPOINT_PATH, global_step=step) step += 1 except tf.errors.OutOfRangeError: break return step """ function: 主函数 Parameters: Returns: """ def main(): # 定义初始化函数 initializer = tf.random_uniform_initializer(-0.05, 0.05) # 定义训练用的循环神经网络模型 with tf.variable_scope('nmt_model', reuse=None, initializer=initializer): train_model = NMTModel() # 定义输入数据 data = MakeSrcTrgDataset(SRC_TRAIN_DATA, TRG_TRAIN_DATA, BATCH_SIZE) iterator = data.make_initializable_iterator() (src, src_size), (trg_input, trg_label, trg_size) = iterator.get_next() # 定义前向计算图,输入数据以张量形式提供给forward函数 cost_op, train_op = train_model.forward(src, src_size, trg_input, trg_label, trg_size) # 训练模型 # 保存模型 saver = tf.train.Saver() step = 0 with tf.Session() as sess: # 初始化全部变量 tf.global_variables_initializer().run() # 进行NUM_EPOCH轮数 for i in range(NUM_EPOCH): print('In iteration: %d' % (i + 1)) sess.run(iterator.initializer) step = run_epoch(sess, cost_op, train_op, saver, step) if __name__ == '__main__': main() ``` 问题如下,不知道怎么解决,谢谢! Traceback (most recent call last): File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call return fn(*args) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: This [[{{node StringToNumber}}]] [[{{node IteratorGetNext}}]] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "D:/Python37/untitled1/train_model.py", line 277, in <module> main() File "D:/Python37/untitled1/train_model.py", line 273, in main step = run_epoch(sess, cost_op, train_op, saver, step) File "D:/Python37/untitled1/train_model.py", line 231, in run_epoch cost, _ = session.run([cost_op, train_op]) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 929, in run run_metadata_ptr) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run run_metadata) File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: This [[{{node StringToNumber}}]] [[node IteratorGetNext (defined at D:/Python37/untitled1/train_model.py:259) ]]
tensorflow训练完模型直接测试和导入模型进行测试的结果不同,一个很好,一个略差,这是为什么?
在tensorflow训练完模型,我直接采用同一个session进行测试,得到结果较好,但是采用训练完保存的模型,进行重新载入进行测试,结果较差,不懂是为什么会出现这样的结果。注:测试数据是一样的。以下是模型结果: 训练集:loss:0.384,acc:0.931. 验证集:loss:0.212,acc:0.968. 训练完在同一session内的测试集:acc:0.96。导入保存的模型进行测试:acc:0.29 ``` def create_model(hps): global_step = tf.Variable(tf.zeros([], tf.float64), name = 'global_step', trainable = False) scale = 1.0 / math.sqrt(hps.num_embedding_size + hps.num_lstm_nodes[-1]) / 3.0 print(type(scale)) gru_init = tf.random_normal_initializer(-scale, scale) with tf.variable_scope('Bi_GRU_nn', initializer = gru_init): for i in range(hps.num_lstm_layers): cell_bw = tf.contrib.rnn.GRUCell(hps.num_lstm_nodes[i], activation = tf.nn.relu, name = 'cell-bw') cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, output_keep_prob = dropout_keep_prob) cell_fw = tf.contrib.rnn.GRUCell(hps.num_lstm_nodes[i], activation = tf.nn.relu, name = 'cell-fw') cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, output_keep_prob = dropout_keep_prob) rnn_outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_bw, cell_fw, inputs, dtype=tf.float32) embeddedWords = tf.concat(rnn_outputs, 2) finalOutput = embeddedWords[:, -1, :] outputSize = hps.num_lstm_nodes[-1] * 2 # 因为是双向LSTM,最终的输出值是fw和bw的拼接,因此要乘以2 last = tf.reshape(finalOutput, [-1, outputSize]) # reshape成全连接层的输入维度 last = tf.layers.batch_normalization(last, training = is_training) fc_init = tf.uniform_unit_scaling_initializer(factor = 1.0) with tf.variable_scope('fc', initializer = fc_init): fc1 = tf.layers.dense(last, hps.num_fc_nodes, name = 'fc1') fc1_batch_normalization = tf.layers.batch_normalization(fc1, training = is_training) fc_activation = tf.nn.relu(fc1_batch_normalization) logits = tf.layers.dense(fc_activation, hps.num_classes, name = 'fc2') with tf.name_scope('metrics'): softmax_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits = logits, labels = tf.argmax(outputs, 1)) loss = tf.reduce_mean(softmax_loss) # [0, 1, 5, 4, 2] ->argmax:2 因为在第二个位置上是最大的 y_pred = tf.argmax(tf.nn.softmax(logits), 1, output_type = tf.int64, name = 'y_pred') # 计算准确率,看看算对多少个 correct_pred = tf.equal(tf.argmax(outputs, 1), y_pred) # tf.cast 将数据转换成 tf.float32 类型 accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) with tf.name_scope('train_op'): tvar = tf.trainable_variables() for var in tvar: print('variable name: %s' % (var.name)) grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvar), hps.clip_lstm_grads) optimizer = tf.train.AdamOptimizer(hps.learning_rate) train_op = optimizer.apply_gradients(zip(grads, tvar), global_step) # return((inputs, outputs, is_training), (loss, accuracy, y_pred), (train_op, global_step)) return((inputs, outputs), (loss, accuracy, y_pred), (train_op, global_step)) placeholders, metrics, others = create_model(hps) content, labels = placeholders loss, accuracy, y_pred = metrics train_op, global_step = others def val_steps(sess, x_batch, y_batch, writer = None): loss_val, accuracy_val = sess.run([loss,accuracy], feed_dict = {inputs: x_batch, outputs: y_batch, is_training: hps.val_is_training, dropout_keep_prob: 1.0}) return loss_val, accuracy_val loss_summary = tf.summary.scalar('loss', loss) accuracy_summary = tf.summary.scalar('accuracy', accuracy) # 将所有的变量都集合起来 merged_summary = tf.summary.merge_all() # 用于test测试的summary merged_summary_test = tf.summary.merge([loss_summary, accuracy_summary]) LOG_DIR = '.' run_label = 'run_Bi-GRU_Dropout_tensorboard' run_dir = os.path.join(LOG_DIR, run_label) if not os.path.exists(run_dir): os.makedirs(run_dir) train_log_dir = os.path.join(run_dir, timestamp, 'train') test_los_dir = os.path.join(run_dir, timestamp, 'test') if not os.path.exists(train_log_dir): os.makedirs(train_log_dir) if not os.path.join(test_los_dir): os.makedirs(test_los_dir) # saver得到的文件句柄,可以将文件训练的快照保存到文件夹中去 saver = tf.train.Saver(tf.global_variables(), max_to_keep = 5) # train 代码 init_op = tf.global_variables_initializer() train_keep_prob_value = 0.2 test_keep_prob_value = 1.0 # 由于如果按照每一步都去计算的话,会很慢,所以我们规定每100次存储一次 output_summary_every_steps = 100 num_train_steps = 1000 # 每隔多少次保存一次 output_model_every_steps = 500 # 测试集测试 test_model_all_steps = 4000 i = 0 session_conf = tf.ConfigProto( gpu_options = tf.GPUOptions(allow_growth=True), allow_soft_placement = True, log_device_placement = False) with tf.Session(config = session_conf) as sess: sess.run(init_op) # 将训练过程中,将loss,accuracy写入文件里,后面是目录和计算图,如果想要在tensorboard中显示计算图,就想sess.graph加上 train_writer = tf.summary.FileWriter(train_log_dir, sess.graph) # 同样将测试的结果保存到tensorboard中,没有计算图 test_writer = tf.summary.FileWriter(test_los_dir) batches = batch_iter(list(zip(x_train, y_train)), hps.batch_size, hps.num_epochs) for batch in batches: train_x, train_y = zip(*batch) eval_ops = [loss, accuracy, train_op, global_step] should_out_summary = ((i + 1) % output_summary_every_steps == 0) if should_out_summary: eval_ops.append(merged_summary) # 那三个占位符输进去 # 计算loss, accuracy, train_op, global_step的图 eval_ops.append(merged_summary) outputs_train = sess.run(eval_ops, feed_dict={ inputs: train_x, outputs: train_y, dropout_keep_prob: train_keep_prob_value, is_training: hps.train_is_training }) loss_train, accuracy_train = outputs_train[0:2] if should_out_summary: # 由于我们想在100steps之后计算summary,所以上面 should_out_summary = ((i + 1) % output_summary_every_steps == 0)成立, # 即为真True,那么我们将训练的内容放入eval_ops的最后面了,因此,我们想获得summary的结果得在eval_ops_results的最后一个 train_summary_str = outputs_train[-1] # 将获得的结果写训练tensorboard文件夹中,由于训练从0开始,所以这里加上1,表示第几步的训练 train_writer.add_summary(train_summary_str, i + 1) test_summary_str = sess.run([merged_summary_test], feed_dict = {inputs: x_dev, outputs: y_dev, dropout_keep_prob: 1.0, is_training: hps.val_is_training })[0] test_writer.add_summary(test_summary_str, i + 1) current_step = tf.train.global_step(sess, global_step) if (i + 1) % 100 == 0: print("Step: %5d, loss: %3.3f, accuracy: %3.3f" % (i + 1, loss_train, accuracy_train)) # 500个batch校验一次 if (i + 1) % 500 == 0: loss_eval, accuracy_eval = val_steps(sess, x_dev, y_dev) print("Step: %5d, val_loss: %3.3f, val_accuracy: %3.3f" % (i + 1, loss_eval, accuracy_eval)) if (i + 1) % output_model_every_steps == 0: path = saver.save(sess,os.path.join(out_dir, 'ckp-%05d' % (i + 1))) print("Saved model checkpoint to {}\n".format(path)) print('model saved to ckp-%05d' % (i + 1)) if (i + 1) % test_model_all_steps == 0: # test_loss, test_acc, all_predictions= sess.run([loss, accuracy, y_pred], feed_dict = {inputs: x_test, outputs: y_test, dropout_keep_prob: 1.0}) test_loss, test_acc, all_predictions= sess.run([loss, accuracy, y_pred], feed_dict = {inputs: x_test, outputs: y_test, is_training: hps.val_is_training, dropout_keep_prob: 1.0}) print("test_loss: %3.3f, test_acc: %3.3d" % (test_loss, test_acc)) batches = batch_iter(list(x_test), 128, 1, shuffle=False) # Collect the predictions here all_predictions = [] for x_test_batch in batches: batch_predictions = sess.run(y_pred, {inputs: x_test_batch, is_training: hps.val_is_training, dropout_keep_prob: 1.0}) all_predictions = np.concatenate([all_predictions, batch_predictions]) correct_predictions = float(sum(all_predictions == y.flatten())) print("Total number of test examples: {}".format(len(y_test))) print("Accuracy: {:g}".format(correct_predictions/float(len(y_test)))) test_y = y_test.argmax(axis = 1) #生成混淆矩阵 conf_mat = confusion_matrix(test_y, all_predictions) fig, ax = plt.subplots(figsize = (4,2)) sns.heatmap(conf_mat, annot=True, fmt = 'd', xticklabels = cat_id_df.category_id.values, yticklabels = cat_id_df.category_id.values) font_set = FontProperties(fname = r"/usr/share/fonts/truetype/wqy/wqy-microhei.ttc", size=15) plt.ylabel(u'实际结果',fontsize = 18,fontproperties = font_set) plt.xlabel(u'预测结果',fontsize = 18,fontproperties = font_set) plt.savefig('./test.png') print('accuracy %s' % accuracy_score(all_predictions, test_y)) print(classification_report(test_y, all_predictions,target_names = cat_id_df['category_name'].values)) print(classification_report(test_y, all_predictions)) i += 1 ``` 以上的模型代码,请求各位大神帮我看看,为什么出现这样的结果?
数据库恢复中关于 redo checkpoint 的提问
小弟最近在学习数据库,对数据库recovery的两个个知识点理解不是很清楚,就从下面的这个题开始提问吧。 ![图片说明](https://img-ask.csdn.net/upload/201911/22/1574405094_616132.png) 我们课件关于redo的执行规则如下: ![图片说明](https://img-ask.csdn.net/upload/201911/22/1574405213_566290.png) 我的疑问是:find last <checkpoint L> record 找到最后一个 checkpoint的记录是啥意思,应该怎么理解?意思是如果有多个checkpoint记录,只找最近的一个吗? 另外一个疑问是 scan forward from above 《checkpoint L》 record这个above 该怎么理解。 该题的答案如下: ![图片说明](https://img-ask.csdn.net/upload/201911/22/1574405392_974909.png) 那么我找到checkpoint过后应该从 《T5,C,300,100》开始搜索还是从 《T4 start》开始搜索 谢谢大佬的解答!
tensorflow重载模型继续训练得到的loss比原模型继续训练得到的loss大,是什么原因??
我使用tensorflow训练了一个模型,在第10个epoch时保存模型,然后在一个新的文件里重载模型继续训练,结果我发现重载的模型在第一个epoch的loss比原模型在epoch=11的loss要大,我感觉既然是重载了原模型,那么重载模型训练的第一个epoch应该是和原模型训练的第11个epoch相等的,一直找不到问题或者自己逻辑的问题,希望大佬能指点迷津。源代码和重载模型的代码如下: ``` 原代码: from tensorflow.examples.tutorials.mnist import input_data import tensorflow as tf import os import numpy as np os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' mnist = input_data.read_data_sets("./",one_hot=True) tf.reset_default_graph() ###定义数据和标签 n_inputs = 784 n_classes = 10 X = tf.placeholder(tf.float32,[None,n_inputs],name='X') Y = tf.placeholder(tf.float32,[None,n_classes],name='Y') ###定义网络结构 n_hidden_1 = 256 n_hidden_2 = 256 layer_1 = tf.layers.dense(inputs=X,units=n_hidden_1,activation=tf.nn.relu,kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01)) layer_2 = tf.layers.dense(inputs=layer_1,units=n_hidden_2,activation=tf.nn.relu,kernel_regularizer=tf.contrib.layers.l2_regularizer(0.01)) outputs = tf.layers.dense(inputs=layer_2,units=n_classes,name='outputs') pred = tf.argmax(tf.nn.softmax(outputs,axis=1),axis=1) print(pred.name) err = tf.count_nonzero((pred - tf.argmax(Y,axis=1))) cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=outputs,labels=Y),name='cost') print(cost.name) ###定义优化器 learning_rate = 0.001 optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost,name='OP') saver = tf.train.Saver() checkpoint = 'softmax_model/dense_model.cpkt' ###训练 batch_size = 100 training_epochs = 11 with tf.Session() as sess: sess.run(tf.global_variables_initializer()) for epoch in range(training_epochs): batch_num = int(mnist.train.num_examples / batch_size) epoch_cost = 0 sumerr = 0 for i in range(batch_num): batch_x,batch_y = mnist.train.next_batch(batch_size) c,e = sess.run([cost,err],feed_dict={X:batch_x,Y:batch_y}) _ = sess.run(optimizer,feed_dict={X:batch_x,Y:batch_y}) epoch_cost += c / batch_num sumerr += e / mnist.train.num_examples if epoch == (training_epochs - 1): print('batch_cost = ',c) if epoch == (training_epochs - 2): saver.save(sess, checkpoint) print('test_error = ',sess.run(cost, feed_dict={X: mnist.test.images, Y: mnist.test.labels})) ``` ``` 重载模型的代码: from tensorflow.examples.tutorials.mnist import input_data import tensorflow as tf import os os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' mnist = input_data.read_data_sets("./",one_hot=True) #one_hot=True指对样本标签进行独热编码 file_path = 'softmax_model/dense_model.cpkt' saver = tf.train.import_meta_graph(file_path + '.meta') graph = tf.get_default_graph() X = graph.get_tensor_by_name('X:0') Y = graph.get_tensor_by_name('Y:0') cost = graph.get_operation_by_name('cost').outputs[0] train_op = graph.get_operation_by_name('OP') training_epoch = 10 learning_rate = 0.001 batch_size = 100 with tf.Session() as sess: saver.restore(sess,file_path) print('test_cost = ',sess.run(cost, feed_dict={X: mnist.test.images, Y: mnist.test.labels})) for epoch in range(training_epoch): batch_num = int(mnist.train.num_examples / batch_size) epoch_cost = 0 for i in range(batch_num): batch_x, batch_y = mnist.train.next_batch(batch_size) c = sess.run(cost, feed_dict={X: batch_x, Y: batch_y}) _ = sess.run(train_op, feed_dict={X: batch_x, Y: batch_y}) epoch_cost += c / batch_num print(epoch_cost) ``` 值得注意的是,我在原模型和重载模型里都计算了测试集的cost,两者的结果是一致的。说明参数载入应该是对的
神经网络模型加载后测试效果不对
tensorflow框架训练好的神经网络模型,加载之后再去测试准确率特别低 图中是我的加载方法 麻烦大神帮忙指正,是不是网络加载出现问题 首先手动重新构建了模型:以下代码省略了权值、偏置和网络搭建 ``` # 构建模型 pred = alex_net(x, weights, biases, keep_prob) # 定义损失函数和优化器 cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y,logits=pred))#softmax和交叉熵结合 optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) # 评估函数 correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) # 3.训练模型和评估模型 # 初始化变量 init = tf.global_variables_initializer() saver = tf.train.Saver() with tf.Session() as sess: # 初始化变量 sess.run(init) saver.restore(sess, tf.train.latest_checkpoint(model_dir)) pred_test=sess.run(pred,{x:test_x, keep_prob:1.0}) result=sess.run(tf.argmax(pred_test, 1)) ```
flume使用lzo压缩问题
目前使用flume抽取日志数据使用flume拦截器将日志数据发送到不同的 kafka的topic中,然后使用flume将kafka的topic中的数据使用LZO压缩 发送到hdfs中,在lzo压缩这里flume出现了问题,报错信息如下: ``` 2020-01-30 19:38:12,842 (conf-file-poller-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:62)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2020-01-30 19:38:13,294 (conf-file-poller-0) [ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:426)] Sink k1 has been removed due to an error during configuration java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found. at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139) at org.apache.flume.sink.hdfs.HDFSEventSink.getCodec(HDFSEventSink.java:313) at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:237) at org.apache.flume.conf.Configurables.configure(Configurables.java:41) at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411) at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102) at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101) at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132) ... 13 more 2020-01-30 19:38:13,297 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k2, type: hdfs 2020-01-30 19:38:13,356 (conf-file-poller-0) [ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:426)] Sink k2 has been removed due to an error during configuration java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found. at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139) at org.apache.flume.sink.hdfs.HDFSEventSink.getCodec(HDFSEventSink.java:313) at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:237) at org.apache.flume.conf.Configurables.configure(Configurables.java:41) at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:411) at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102) at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found ``` kafka配置文件如下: ``` ## 组件 a1.sources=r1 r2 a1.channels=c1 c2 a1.sinks=k1 k2 ## source1 a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource a1.sources.r1.batchSize = 5000 a1.sources.r1.batchDurationMillis = 2000 a1.sources.r1.kafka.bootstrap.servers = bigdata01:9092,bigdata02:9092,bigdata03:9092 a1.sources.r1.kafka.topics=topic_start ## source2 a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource a1.sources.r2.batchSize = 5000 a1.sources.r2.batchDurationMillis = 2000 a1.sources.r2.kafka.bootstrap.servers = bigdata01:9092,bigdata02:9092,bigdata03:9092 a1.sources.r2.kafka.topics=topic_event ## channel1 a1.channels.c1.type = file a1.channels.c1.checkpointDir = /opt/modules/apache-flume-1.7.0-bin/checkpoint/behavior1 a1.channels.c1.dataDirs = /opt/modules/apache-flume-1.7.0-bin/data/behavior1/ a1.channels.c1.maxFileSize = 2146435071 a1.channels.c1.capacity = 1000000 a1.channels.c1.keep-alive = 6 ## channel2 a1.channels.c2.type = file a1.channels.c2.checkpointDir = /opt/modules/apache-flume-1.7.0-bin/checkpoint/behavior2 a1.channels.c2.dataDirs = /opt/modules/apache-flume-1.7.0-bin/data/behavior2/ a1.channels.c2.maxFileSize = 2146435071 a1.channels.c2.capacity = 1000000 a1.channels.c2.keep-alive = 6 ## sink1 a1.sinks.k1.type = hdfs a1.sinks.k1.hdfs.path = hdfs://bigdata01:8020/origin_data/gmall/log/topic_start/%Y-%m-%d a1.sinks.k1.hdfs.filePrefix = logstart- a1.sinks.k1.hdfs.round = true a1.sinks.k1.hdfs.roundValue = 10 a1.sinks.k1.hdfs.roundUnit = second ##sink2 a1.sinks.k2.type = hdfs a1.sinks.k2.hdfs.path = hdfs://bigdata01:8020/origin_data/gmall/log/topic_event/%Y-%m-%d a1.sinks.k2.hdfs.filePrefix = logevent- a1.sinks.k2.hdfs.round = true a1.sinks.k2.hdfs.roundValue = 10 a1.sinks.k2.hdfs.roundUnit = second ## 不要产生大量小文件 a1.sinks.k1.hdfs.rollInterval = 10 a1.sinks.k1.hdfs.rollSize = 134217728 a1.sinks.k1.hdfs.rollCount = 0 a1.sinks.k2.hdfs.rollInterval = 10 a1.sinks.k2.hdfs.rollSize = 134217728 a1.sinks.k2.hdfs.rollCount = 0 ## 控制输出文件是原生文件。 a1.sinks.k1.hdfs.fileType = CompressedStream a1.sinks.k2.hdfs.fileType = CompressedStream a1.sinks.k1.hdfs.codeC = lzop a1.sinks.k2.hdfs.codeC = lzop ## 拼装 a1.sources.r1.channels = c1 a1.sinks.k1.channel= c1 a1.sources.r2.channels = c2 a1.sinks.k2.channel= c2 ``` hadoop中的core-site.xml配置文件如下: ``` <property> <name>io.compression.codecs</name> <value> org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.BZip2Codec, org.apache.hadoop.io.compress.SnappyCodec, com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec </value> </property> <property> <name>io.compression.codec.lzo.class</name> <value>com.hadoop.compression.lzo.LzoCodec</value> </property> ``` lzo的包已经放入到hadoop对应的目录下: ``` /opt/modules/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar ``` 不知道是不是环境变量问题,急急急,在线等。。。。。。。
tensorflow yolov3把ckpt文件转成pb文件时怎么知道输入输出节点是什么(我从tensorboard里面找不到最终的输出节点)
def freeze_graph(input_checkpoint, output_graph): ''' :param input_checkpoint: :param output_graph: PB模型保存路径 ''' # 输出节点名称 output_node_names = ''
tensorflow中checkpoint文件夹中只保存了最近5个模型文件。请问怎么破?tensorflow版本1.4.1。
![图片说明](https://img-ask.csdn.net/upload/201901/23/1548246752_610153.png)哪位英雄好汉帮下忙。
如何可视化tensorflow版的fater rcnn的训练过程?
小白一枚,github下的faster rcnn tf版在win10系统上,代码里没有输出训练日志的语句无log文件, 想问一下怎么添加语句可以通过tensorboard显示?下为train代码 import time import tensorflow as tf import numpy as np from tensorflow.python import pywrap_tensorflow import lib.config.config as cfg from lib.datasets import roidb as rdl_roidb from lib.datasets.factory import get_imdb from lib.datasets.imdb import imdb as imdb2 from lib.layer_utils.roi_data_layer import RoIDataLayer from lib.nets.vgg16 import vgg16 from lib.utils.timer import Timer try: import cPickle as pickle except ImportError: import pickle import os def get_training_roidb(imdb): """Returns a roidb (Region of Interest database) for use in training.""" if True: print('Appending horizontally-flipped training examples...') imdb.append_flipped_images() print('done') print('Preparing training data...') rdl_roidb.prepare_roidb(imdb) print('done') return imdb.roidb def combined_roidb(imdb_names): """ Combine multiple roidbs """ def get_roidb(imdb_name): imdb = get_imdb(imdb_name) print('Loaded dataset `{:s}` for training'.format(imdb.name)) imdb.set_proposal_method("gt") print('Set proposal method: {:s}'.format("gt")) roidb = get_training_roidb(imdb) return roidb roidbs = [get_roidb(s) for s in imdb_names.split('+')] roidb = roidbs[0] if len(roidbs) > 1: for r in roidbs[1:]: roidb.extend(r) tmp = get_imdb(imdb_names.split('+')[1]) imdb = imdb2(imdb_names, tmp.classes) else: imdb = get_imdb(imdb_names) return imdb, roidb class Train: def __init__(self): # Create network if cfg.FLAGS.net == 'vgg16': self.net = vgg16(batch_size=cfg.FLAGS.ims_per_batch) else: raise NotImplementedError self.imdb, self.roidb = combined_roidb("voc_2007_trainval") self.data_layer = RoIDataLayer(self.roidb, self.imdb.num_classes) self.output_dir = cfg.get_output_dir(self.imdb, 'default') def train(self): # Create session tfconfig = tf.ConfigProto(allow_soft_placement=True) tfconfig.gpu_options.allow_growth = True sess = tf.Session(config=tfconfig) with sess.graph.as_default(): tf.set_random_seed(cfg.FLAGS.rng_seed) layers = self.net.create_architecture(sess, "TRAIN", self.imdb.num_classes, tag='default') loss = layers['total_loss'] lr = tf.Variable(cfg.FLAGS.learning_rate, trainable=False) momentum = cfg.FLAGS.momentum optimizer = tf.train.MomentumOptimizer(lr, momentum) gvs = optimizer.compute_gradients(loss) # Double bias # Double the gradient of the bias if set if cfg.FLAGS.double_bias: final_gvs = [] with tf.variable_scope('Gradient_Mult'): for grad, var in gvs: scale = 1. if cfg.FLAGS.double_bias and '/biases:' in var.name: scale *= 2. if not np.allclose(scale, 1.0): grad = tf.multiply(grad, scale) final_gvs.append((grad, var)) train_op = optimizer.apply_gradients(final_gvs) else: train_op = optimizer.apply_gradients(gvs) # We will handle the snapshots ourselves self.saver = tf.train.Saver(max_to_keep=100000) # Write the train and validation information to tensorboard # writer = tf.summary.FileWriter(self.tbdir, sess.graph) # valwriter = tf.summary.FileWriter(self.tbvaldir) # Load weights # Fresh train directly from ImageNet weights print('Loading initial model weights from {:s}'.format(cfg.FLAGS.pretrained_model)) variables = tf.global_variables() # Initialize all variables first sess.run(tf.variables_initializer(variables, name='init')) var_keep_dic = self.get_variables_in_checkpoint_file(cfg.FLAGS.pretrained_model) # Get the variables to restore, ignorizing the variables to fix variables_to_restore = self.net.get_variables_to_restore(variables, var_keep_dic) restorer = tf.train.Saver(variables_to_restore) restorer.restore(sess, cfg.FLAGS.pretrained_model) print('Loaded.') # Need to fix the variables before loading, so that the RGB weights are changed to BGR # For VGG16 it also changes the convolutional weights fc6 and fc7 to # fully connected weights self.net.fix_variables(sess, cfg.FLAGS.pretrained_model) print('Fixed.') sess.run(tf.assign(lr, cfg.FLAGS.learning_rate)) last_snapshot_iter = 0 timer = Timer() iter = last_snapshot_iter + 1 last_summary_time = time.time() while iter < cfg.FLAGS.max_iters + 1: # Learning rate if iter == cfg.FLAGS.step_size + 1: # Add snapshot here before reducing the learning rate # self.snapshot(sess, iter) sess.run(tf.assign(lr, cfg.FLAGS.learning_rate * cfg.FLAGS.gamma)) timer.tic() # Get training data, one batch at a time blobs = self.data_layer.forward() # Compute the graph without summary rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss = self.net.train_step(sess, blobs, train_op) timer.toc() iter += 1 # Display training information if iter % (cfg.FLAGS.display) == 0: print('iter: %d / %d, total loss: %.6f\n >>> rpn_loss_cls: %.6f\n ' '>>> rpn_loss_box: %.6f\n >>> loss_cls: %.6f\n >>> loss_box: %.6f\n ' % \ (iter, cfg.FLAGS.max_iters, total_loss, rpn_loss_cls, rpn_loss_box, loss_cls, loss_box)) print('speed: {:.3f}s / iter'.format(timer.average_time)) if iter % cfg.FLAGS.snapshot_iterations == 0: self.snapshot(sess, iter ) def get_variables_in_checkpoint_file(self, file_name): try: reader = pywrap_tensorflow.NewCheckpointReader(file_name) var_to_shape_map = reader.get_variable_to_shape_map() return var_to_shape_map except Exception as e: # pylint: disable=broad-except print(str(e)) if "corrupted compressed block contents" in str(e): print("It's likely that your checkpoint file has been compressed " "with SNAPPY.") def snapshot(self, sess, iter): net = self.net if not os.path.exists(self.output_dir): os.makedirs(self.output_dir) # Store the model snapshot filename = 'vgg16_faster_rcnn_iter_{:d}'.format(iter) + '.ckpt' filename = os.path.join(self.output_dir, filename) self.saver.save(sess, filename) print('Wrote snapshot to: {:s}'.format(filename)) # Also store some meta information, random state, etc. nfilename = 'vgg16_faster_rcnn_iter_{:d}'.format(iter) + '.pkl' nfilename = os.path.join(self.output_dir, nfilename) # current state of numpy random st0 = np.random.get_state() # current position in the database cur = self.data_layer._cur # current shuffled indeces of the database perm = self.data_layer._perm # Dump the meta info with open(nfilename, 'wb') as fid: pickle.dump(st0, fid, pickle.HIGHEST_PROTOCOL) pickle.dump(cur, fid, pickle.HIGHEST_PROTOCOL) pickle.dump(perm, fid, pickle.HIGHEST_PROTOCOL) pickle.dump(iter, fid, pickle.HIGHEST_PROTOCOL) return filename, nfilename if __name__ == '__main__': train = Train() train.train()
deeplab v3+训练loss不收敛问题
* 我使用的是官网的代码https://github.com/tensorflow/models/tree/master/research/deeplab 复现deeplab v3+; * 训练数据就是标准的Pascal voc2012。训练之前已经按照官网上的说法,通过运行脚本download_and_convert_voc2012.sh下载voc2012数据、并将label转换为单通道、并将数据转换为需要的tfrecord格式; * 训练模型也是从提供的model_zoo下载的https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md; * 学习率保持默认,即learning rate=0.0001; * Linux Ubuntu 16.04;TensorFlow1.6.0 installed from Anaconda;CUDA9.0/cudnn7.0.5;GeForce GTX 1080 Ti; * 具体训练代码是: ``` python deeplab/train.py \ --logtostderr \ --training_number_of_steps=30000 \ --train_split="train" \ --model_variant="xception_65" \ --atrous_rates=6 \ --atrous_rates=12 \ --atrous_rates=18 \ --output_stride=16 \ --decoder_output_stride=4 \ --train_crop_size=513 \ --train_crop_size=513 \ --train_batch_size=2 \ --dataset="pascal_voc_seg" \ --fine_tune_batch_norm = False \ --tf_initial_checkpoint="{下载的checkpoint路径}/deeplabv3_pascal_train_aug/model.ckpt.index" \ --train_logdir="{要写入路径}/exp/train_on_train_set/train" \ --dataset_dir="{数据集路径}/pascal_voc_seg/tfrecord" ``` * 然而loss一直不收敛:![图片说明](https://img-ask.csdn.net/upload/201812/25/1545727654_85154.png) * 最终出现nan值错误![图片说明](https://img-ask.csdn.net/upload/201812/25/1545727689_745383.png) * 如果训练的次数少一点,验证一下结果,发现miou只有零点零几:![图片说明](https://img-ask.csdn.net/upload/201812/26/1545785941_997452.jpg) * 一直没有找到原因,感觉步骤没有问题,也参照过各种博客,大家似乎都没有出现这种情况,希望大佬们可以帮忙
tensorflow,python运行时报错在reshape上,求大神解答
代码来自于一篇博客,用tensorflow判断拨号图标和短信图标的分类,训练已经成功运行,以下为测试代码,错误出现在38行 image = tf.reshape(image, [1, 208, 208, 3]) 我的测试图片是256*256的,也测试了48*48的 ``` #!/usr/bin/python # -*- coding:utf-8 -*- # @Time : 2018/3/31 0031 17:50 # @Author : scw # @File : main.py # 进行图片预测方法调用的文件 import numpy as np from PIL import Image import tensorflow as tf import matplotlib.pyplot as plt from shenjingwangluomodel import inference # 定义需要进行分类的种类,这里我是进行分两种,因为一种为是拨号键,另一种就是非拨号键 CallPhoneStyle = 2 # 进行测试的操作处理========================== # 加载要进行测试的图片 def get_one_image(img_dir): image = Image.open(img_dir) # Image.open() # 好像一次只能打开一张图片,不能一次打开一个文件夹,这里大家可以去搜索一下 plt.imshow(image) image = image.resize([208, 208]) image_arr = np.array(image) return image_arr # 进行测试处理------------------------------------------------------- def test(test_file): # 设置加载训练结果的文件目录(这个是需要之前就已经训练好的,别忘记) log_dir = '/home/administrator/test_system/calldata2/' # 打开要进行测试的图片 image_arr = get_one_image(test_file) with tf.Graph().as_default(): # 把要进行测试的图片转为tensorflow所支持的格式 image = tf.cast(image_arr, tf.float32) # 将图片进行格式化的处理 image = tf.image.per_image_standardization(image) # 将tensorflow的图片的格式参数,转变为shape格式的,好像就是去掉-1这样的列表 image = tf.reshape(image, [1, 208, 208, 3]) # print(image.shape) # 参数CallPhoneStyle:表示的是分为两类 p = inference(image, 1, CallPhoneStyle) # 这是训练出一个神经网络的模型 # 这里用到了softmax这个逻辑回归模型的处理 logits = tf.nn.softmax(p) x = tf.placeholder(tf.float32, shape=[208, 208, 3]) saver = tf.train.Saver() with tf.Session() as sess: # 对tensorflow的训练参数进行初始化,使用默认的方式 sess.run(tf.global_variables_initializer()) # 判断是否有进行训练模型的设置,所以一定要之前就进行了模型的训练 ckpt = tf.train.get_checkpoint_state(log_dir) if ckpt and ckpt.model_checkpoint_path: # global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] saver.restore(sess, ckpt.model_checkpoint_path) # 调用saver.restore()函数,加载训练好的网络模型 print('Loading success') else: print('No checkpoint') prediction = sess.run(logits, feed_dict={x: image_arr}) max_index = np.argmax(prediction) print('预测的标签为:') if max_index == 0: print("是拨号键图片") else: print("是短信图片") # print(max_index) print('预测的分类结果每种的概率为:') print(prediction) # 我用0,1表示两种分类,这也是我在训练的时候就设置好的 if max_index == 0: print('图片是拨号键图标的概率为 %.6f' %prediction[:, 0]) elif max_index == 1: print('图片是短信它图标的概率为 %.6f' %prediction[:, 1]) # 进行图片预测 test('/home/administrator/Downloads/def.png') ''' # 测试自己的训练集的图片是不是已经加载成功(因为这个是进行训练的第一步) train_dir = 'E:/tensorflowdata/calldata/' BATCH_SIZE = 5 # 生成批次队列中的容量缓存的大小 CAPACITY = 256 # 设置我自己要对图片进行统一大小的高和宽 IMG_W = 208 IMG_H = 208 image_list,label_list = get_files(train_dir) # 加载训练集的图片和对应的标签 image_batch,label_batch = get_batch(image_list,label_list,IMG_W,IMG_H,BATCH_SIZE,CAPACITY) # 进行批次图片加载到内存中 # 这是打开一个session,主要是用于进行图片的显示效果的测试 with tf.Session() as sess: i = 0 coord = tf.train.Coordinator() threads = tf.train.start_queue_runners(coord=coord) try: while not coord.should_stop() and i < 2: # 提取出两个batch的图片并可视化。 img, label = sess.run([image_batch, label_batch]) for j in np.arange(BATCH_SIZE): print('label: %d' % label[j]) plt.imshow(img[j, :, :, :]) plt.show() i += 1 except tf.errors.OutOfRangeError: print('done!') finally: coord.request_stop() coord.join(threads) ''' ```
tensorflow模型推理,两个列表串行,输出结果是第一个列表的循环,新手求教
tensorflow模型推理,两个列表串行,输出结果是第一个列表的循环,新手求教 ``` from __future__ import print_function import argparse from datetime import datetime import os import sys import time import scipy.misc import scipy.io as sio import cv2 from glob import glob import multiprocessing os.environ["CUDA_VISIBLE_DEVICES"] = "0" import tensorflow as tf import numpy as np from PIL import Image from utils import * N_CLASSES = 20 DATA_DIR = './datasets/CIHP' LIST_PATH = './datasets/CIHP/list/val2.txt' DATA_ID_LIST = './datasets/CIHP/list/val_id2.txt' with open(DATA_ID_LIST, 'r') as f: NUM_STEPS = len(f.readlines()) RESTORE_FROM = './checkpoint/CIHP_pgn' # Load reader. with tf.name_scope("create_inputs") as scp1: reader = ImageReader(DATA_DIR, LIST_PATH, DATA_ID_LIST, None, False, False, False, None) image, label, edge_gt = reader.image, reader.label, reader.edge image_rev = tf.reverse(image, tf.stack([1])) image_list = reader.image_list image_batch = tf.stack([image, image_rev]) label_batch = tf.expand_dims(label, dim=0) # Add one batch dimension. edge_gt_batch = tf.expand_dims(edge_gt, dim=0) h_orig, w_orig = tf.to_float(tf.shape(image_batch)[1]), tf.to_float(tf.shape(image_batch)[2]) image_batch050 = tf.image.resize_images(image_batch, tf.stack([tf.to_int32(tf.multiply(h_orig, 0.50)), tf.to_int32(tf.multiply(w_orig, 0.50))])) image_batch075 = tf.image.resize_images(image_batch, tf.stack([tf.to_int32(tf.multiply(h_orig, 0.75)), tf.to_int32(tf.multiply(w_orig, 0.75))])) image_batch125 = tf.image.resize_images(image_batch, tf.stack([tf.to_int32(tf.multiply(h_orig, 1.25)), tf.to_int32(tf.multiply(w_orig, 1.25))])) image_batch150 = tf.image.resize_images(image_batch, tf.stack([tf.to_int32(tf.multiply(h_orig, 1.50)), tf.to_int32(tf.multiply(w_orig, 1.50))])) image_batch175 = tf.image.resize_images(image_batch, tf.stack([tf.to_int32(tf.multiply(h_orig, 1.75)), tf.to_int32(tf.multiply(w_orig, 1.75))])) ``` 新建网络 ``` # Create network. with tf.variable_scope('', reuse=False) as scope: net_100 = PGNModel({'data': image_batch}, is_training=False, n_classes=N_CLASSES) with tf.variable_scope('', reuse=True): net_050 = PGNModel({'data': image_batch050}, is_training=False, n_classes=N_CLASSES) with tf.variable_scope('', reuse=True): net_075 = PGNModel({'data': image_batch075}, is_training=False, n_classes=N_CLASSES) with tf.variable_scope('', reuse=True): net_125 = PGNModel({'data': image_batch125}, is_training=False, n_classes=N_CLASSES) with tf.variable_scope('', reuse=True): net_150 = PGNModel({'data': image_batch150}, is_training=False, n_classes=N_CLASSES) with tf.variable_scope('', reuse=True): net_175 = PGNModel({'data': image_batch175}, is_training=False, n_classes=N_CLASSES) # parsing net parsing_out1_050 = net_050.layers['parsing_fc'] parsing_out1_075 = net_075.layers['parsing_fc'] parsing_out1_100 = net_100.layers['parsing_fc'] parsing_out1_125 = net_125.layers['parsing_fc'] parsing_out1_150 = net_150.layers['parsing_fc'] parsing_out1_175 = net_175.layers['parsing_fc'] parsing_out2_050 = net_050.layers['parsing_rf_fc'] parsing_out2_075 = net_075.layers['parsing_rf_fc'] parsing_out2_100 = net_100.layers['parsing_rf_fc'] parsing_out2_125 = net_125.layers['parsing_rf_fc'] parsing_out2_150 = net_150.layers['parsing_rf_fc'] parsing_out2_175 = net_175.layers['parsing_rf_fc'] # edge net edge_out2_100 = net_100.layers['edge_rf_fc'] edge_out2_125 = net_125.layers['edge_rf_fc'] edge_out2_150 = net_150.layers['edge_rf_fc'] edge_out2_175 = net_175.layers['edge_rf_fc'] # combine resize parsing_out1 = tf.reduce_mean(tf.stack([tf.image.resize_images(parsing_out1_050, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out1_075, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out1_100, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out1_125, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out1_150, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out1_175, tf.shape(image_batch)[1:3,])]), axis=0) parsing_out2 = tf.reduce_mean(tf.stack([tf.image.resize_images(parsing_out2_050, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out2_075, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out2_100, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out2_125, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out2_150, tf.shape(image_batch)[1:3,]), tf.image.resize_images(parsing_out2_175, tf.shape(image_batch)[1:3,])]), axis=0) edge_out2_100 = tf.image.resize_images(edge_out2_100, tf.shape(image_batch)[1:3,]) edge_out2_125 = tf.image.resize_images(edge_out2_125, tf.shape(image_batch)[1:3,]) edge_out2_150 = tf.image.resize_images(edge_out2_150, tf.shape(image_batch)[1:3,]) edge_out2_175 = tf.image.resize_images(edge_out2_175, tf.shape(image_batch)[1:3,]) edge_out2 = tf.reduce_mean(tf.stack([edge_out2_100, edge_out2_125, edge_out2_150, edge_out2_175]), axis=0) raw_output = tf.reduce_mean(tf.stack([parsing_out1, parsing_out2]), axis=0) head_output, tail_output = tf.unstack(raw_output, num=2, axis=0) tail_list = tf.unstack(tail_output, num=20, axis=2) tail_list_rev = [None] * 20 for xx in range(14): tail_list_rev[xx] = tail_list[xx] tail_list_rev[14] = tail_list[15] tail_list_rev[15] = tail_list[14] tail_list_rev[16] = tail_list[17] tail_list_rev[17] = tail_list[16] tail_list_rev[18] = tail_list[19] tail_list_rev[19] = tail_list[18] tail_output_rev = tf.stack(tail_list_rev, axis=2) tail_output_rev = tf.reverse(tail_output_rev, tf.stack([1])) raw_output_all = tf.reduce_mean(tf.stack([head_output, tail_output_rev]), axis=0) raw_output_all = tf.expand_dims(raw_output_all, dim=0) pred_scores = tf.reduce_max(raw_output_all, axis=3) raw_output_all = tf.argmax(raw_output_all, axis=3) pred_all = tf.expand_dims(raw_output_all, dim=3) # Create 4-d tensor. raw_edge = tf.reduce_mean(tf.stack([edge_out2]), axis=0) head_output, tail_output = tf.unstack(raw_edge, num=2, axis=0) tail_output_rev = tf.reverse(tail_output, tf.stack([1])) raw_edge_all = tf.reduce_mean(tf.stack([head_output, tail_output_rev]), axis=0) raw_edge_all = tf.expand_dims(raw_edge_all, dim=0) pred_edge = tf.sigmoid(raw_edge_all) res_edge = tf.cast(tf.greater(pred_edge, 0.5), tf.int32) # prepare ground truth preds = tf.reshape(pred_all, [-1,]) gt = tf.reshape(label_batch, [-1,]) weights = tf.cast(tf.less_equal(gt, N_CLASSES - 1), tf.int32) # Ignoring all labels greater than or equal to n_classes. mIoU, update_op_iou = tf.contrib.metrics.streaming_mean_iou(preds, gt, num_classes=N_CLASSES, weights=weights) macc, update_op_acc = tf.contrib.metrics.streaming_accuracy(preds, gt, weights=weights) # # Which variables to load. # restore_var = tf.global_variables() # # Set up tf session and initialize variables. # config = tf.ConfigProto() # config.gpu_options.allow_growth = True # # gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7) # # config=tf.ConfigProto(gpu_options=gpu_options) # init = tf.global_variables_initializer() # evaluate prosessing parsing_dir = './output' # Set up tf session and initialize variables. config = tf.ConfigProto() config.gpu_options.allow_growth = True ``` 以上是初始化网络和初始化参数载入模型,下面定义两个函数分别处理val1.txt和val2.txt两个列表内部的数据。 ``` # 处理第一个列表函数 def humanParsing1(): # Which variables to load. restore_var = tf.global_variables() init = tf.global_variables_initializer() with tf.Session(config=config) as sess: sess.run(init) sess.run(tf.local_variables_initializer()) # Load weights. loader = tf.train.Saver(var_list=restore_var) if RESTORE_FROM is not None: if load(loader, sess, RESTORE_FROM): print(" [*] Load SUCCESS") else: print(" [!] Load failed...") # Create queue coordinator. coord = tf.train.Coordinator() # Start queue threads. threads = tf.train.start_queue_runners(coord=coord, sess=sess) # Iterate over training steps. for step in range(NUM_STEPS): # parsing_, scores, edge_ = sess.run([pred_all, pred_scores, pred_edge])# , update_op parsing_, scores, edge_ = sess.run([pred_all, pred_scores, pred_edge]) # , update_op print('step {:d}'.format(step)) print(image_list[step]) img_split = image_list[step].split('/') img_id = img_split[-1][:-4] msk = decode_labels(parsing_, num_classes=N_CLASSES) parsing_im = Image.fromarray(msk[0]) parsing_im.save('{}/{}_vis.png'.format(parsing_dir, img_id)) coord.request_stop() coord.join(threads) # 处理第二个列表函数 def humanParsing2(): # Set up tf session and initialize variables. config = tf.ConfigProto() config.gpu_options.allow_growth = True # gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7) # config=tf.ConfigProto(gpu_options=gpu_options) # Which variables to load. restore_var = tf.global_variables() init = tf.global_variables_initializer() with tf.Session(config=config) as sess: # Create queue coordinator. coord = tf.train.Coordinator() sess.run(init) sess.run(tf.local_variables_initializer()) # Load weights. loader = tf.train.Saver(var_list=restore_var) if RESTORE_FROM is not None: if load(loader, sess, RESTORE_FROM): print(" [*] Load SUCCESS") else: print(" [!] Load failed...") LIST_PATH = './datasets/CIHP/list/val1.txt' DATA_ID_LIST = './datasets/CIHP/list/val_id1.txt' with open(DATA_ID_LIST, 'r') as f: NUM_STEPS = len(f.readlines()) # with tf.name_scope("create_inputs"): with tf.name_scope(scp1): tf.get_variable_scope().reuse_variables() reader = ImageReader(DATA_DIR, LIST_PATH, DATA_ID_LIST, None, False, False, False, coord) image, label, edge_gt = reader.image, reader.label, reader.edge image_rev = tf.reverse(image, tf.stack([1])) image_list = reader.image_list # Start queue threads. threads = tf.train.start_queue_runners(coord=coord, sess=sess) # Load weights. loader = tf.train.Saver(var_list=restore_var) if RESTORE_FROM is not None: if load(loader, sess, RESTORE_FROM): print(" [*] Load SUCCESS") else: print(" [!] Load failed...") # Iterate over training steps. for step in range(NUM_STEPS): # parsing_, scores, edge_ = sess.run([pred_all, pred_scores, pred_edge])# , update_op parsing_, scores, edge_ = sess.run([pred_all, pred_scores, pred_edge]) # , update_op print('step {:d}'.format(step)) print(image_list[step]) img_split = image_list[step].split('/') img_id = img_split[-1][:-4] msk = decode_labels(parsing_, num_classes=N_CLASSES) parsing_im = Image.fromarray(msk[0]) parsing_im.save('{}/{}_vis.png'.format(parsing_dir, img_id)) coord.request_stop() coord.join(threads) if __name__ == '__main__': humanParsing1() humanParsing2() ``` 最终输出结果一直是第一个列表里面的循环,代码上用了 self.queue = tf.train.slice_input_producer([self.images, self.labels, self.edges], shuffle=shuffle),队列的方式进行多线程推理。最终得到的结果一直是第一个列表的循环,求大神告诉问题怎么解决。
关于PGSQL每天晚上12点15卡死的问题
之前服务器用 了10来天后 每天12点15就死机 我以为是服务器问题,就换了台服务器,10来天后又出现同样的问题,晚上12点15就准时卡死 重装PGSQL还原之前备份后不再卡死(10天后不敢保证,date还原后小了10G。原来是27G还原后变17G) 发现每天12点15出现这样的日志 (重装PGSQL后不卡死排除服务器问题。服务器配置是独立服务器16核16G,我这开多个JAVA采集。正常情况是CPU5%以内,内存使用8-10GB,句柄有点高到10W,线程2000,进程250,PGSQl连接150) ``` 2019-12-15 00:15:39 HKT LOG: checkpoints are occurring too frequently (21 seconds apart) 2019-12-15 00:15:39 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-15 00:15:54 HKT LOG: checkpoints are occurring too frequently (15 seconds apart) 2019-12-15 00:15:54 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-15 00:16:09 HKT LOG: checkpoints are occurring too frequently (15 seconds apart) 2019-12-15 00:16:09 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". ``` ``` 2019-12-14 00:15:22 HKT LOG: checkpoints are occurring too frequently (22 seconds apart) 2019-12-14 00:15:22 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-14 00:15:38 HKT LOG: checkpoints are occurring too frequently (16 seconds apart) 2019-12-14 00:15:38 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-14 00:15:51 HKT LOG: checkpoints are occurring too frequently (13 seconds apart) 2019-12-14 00:15:51 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-14 00:23:26 HKT ERROR: value too long for type character varying(50) ``` ``` 2019-12-13 00:15:07 HKT LOG: checkpoints are occurring too frequently (7 seconds apart) 2019-12-13 00:15:07 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-13 00:15:18 HKT LOG: checkpoints are occurring too frequently (11 seconds apart) 2019-12-13 00:15:18 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-13 00:15:34 HKT LOG: checkpoints are occurring too frequently (16 seconds apart) 2019-12-13 00:15:34 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-13 00:15:40 HKT LOG: checkpoints are occurring too frequently (6 seconds apart) 2019-12-13 00:15:40 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2019-12-13 00:16:06 HKT LOG: checkpoints are occurring too frequently (26 seconds apart) 2019-12-13 00:16:06 HKT HINT: Consider increasing the configuration parameter "checkpoint_segments". ``` 下面是我的配置 ``` #wal_level = minimal # minimal, archive, or hot_standby # (change requires restart) #fsync = on # turns forced synchronization on or off #synchronous_commit = on # synchronization level; # off, local, remote_write, or on #wal_sync_method = fsync # the default is the first option # supported by the operating system: # open_datasync # fdatasync (default on Linux) # fsync # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes #wal_buffers = -1 # min 256kB, -1 sets based on shared_buffers # (change requires restart) #wal_writer_delay = 200ms # 1-10000 milliseconds #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000 # - Checkpoints - #checkpoint_segments = 32 # in logfile segments, min 1, 16MB each #checkpoint_timeout = 15min # range 30s-1h #checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0 #checkpoint_warning = 30s # 0 disables # - Archiving - #archive_mode = off # allows archiving to be done # (change requires restart) #archive_command = '' # command to use to archive a logfile segment # placeholders: %p = path of file to archive # %f = file name only # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' #archive_timeout = 0 # force a logfile segment switch after this # number of seconds; 0 disables #------------------------------------------------------------------------------ ``` ``` # QUERY TUNING #------------------------------------------------------------------------------ # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_indexonlyscan = on #enable_material = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on # - Planner Cost Constants - #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above #cpu_operator_cost = 0.0025 # same scale as above #effective_cache_size = 2GB # - Genetic Query Optimizer - #geqo = on #geqo_threshold = 12 #geqo_effort = 5 # range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 #geqo_seed = 0.0 # range 0.0-1.0 # - Other Planner Options - #default_statistics_target = 100 # range 1-10000 #constraint_exclusion = partition # on, off, or partition #cursor_tuple_fraction = 0.1 # range 0.0-1.0 #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of explicit # JOIN clauses #------ ``` 1:请问为什么每天晚上12点15会出现这样的日志(别的时段我查询了没有出现) 2:请大佬给我一个配置优化方案
tensorflow 在feed的时候报错could not convert string to float
报错ValueError: could not convert string to float: 'C:/Users/Administrator/Desktop/se//12.21/FV-USM/Published_database_FV-USM_Dec2013/1st_session/extractedvein/vein001_1/01.jpg' 以下是我的代码 ```#!/usr/bin/env python # -*- coding:utf-8 -*- import tensorflow as tf from PIL import Image import model import numpy as np import os def get_one_image(): i=1 n=1 k=1 i = str(i) n = str(n) k = str(k) a = i.zfill(3) b = n.zfill(1) c = k.zfill(2) files='C:/Users/Administrator/Desktop/se//12.21/FV-USM/Published_database_FV-USM_Dec2013/1st_session/extractedvein/vein' + a + '_' + b + '/' + c + '.jpg' # image=image.resize([60,175]) # image=np.array(image) return files def evaluate_one_image(): image_array = get_one_image() with tf.Graph().as_default(): BATCH_SIZE=1 N_CLASS=123 image = tf.image.decode_jpeg(image_array, channels=1) image = tf.image.resize_image_with_crop_or_pad(image,60, 175) image = tf.image.per_image_standardization(image) ###图片调整完成 # image=tf.cast(image,tf.string) # image1 = tf.image.decode_jpeg(image, channels=1) # image1 = tf.image.resize_image_with_crop_or_pad(image1, image_w, image_h) #image1 = tf.image.per_image_standardization(image) ###图片调整完成 #image=tf.image.per_image_standardization(image) image = tf.cast(image, tf.float32) image=tf.reshape(image,[1,60,175,1]) logit = model.inference(image, BATCH_SIZE, N_CLASS) logit=tf.nn.softmax(logit) # 因为 inference 的返回没有用激活函数,所以在这里对结果用softmax 激活。但是为什么要激活?? x=tf.placeholder(tf.float32,shape=[60,175,1]) logs_model_dir='C:/Users/Administrator/Desktop/python的神经网络/model/' saver=tf.train.Saver() with tf.Session() as sess: print("从指定路径加载模型。。。") ckpt=tf.train.get_checkpoint_state(logs_model_dir) if ckpt and ckpt.model_checkpoint_path: global_step=ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] saver.restore(sess,ckpt.model_checkpoint_path) print('模型加载成功,训练步数为 %s'%global_step) else: print('模型加载失败,,,,文件没有找到') ###IMG = sess.run(image) prediction=sess.run(logit,feed_dict={x:image_array}) max_index=np.argmax(prediction) print(max_index) evaluate_one_image() ```
终于明白阿里百度这样的大公司,为什么面试经常拿ThreadLocal考验求职者了
点击上面↑「爱开发」关注我们每晚10点,捕获技术思考和创业资源洞察什么是ThreadLocalThreadLocal是一个本地线程副本变量工具类,各个线程都拥有一份线程私...
程序员必须掌握的核心算法有哪些?
由于我之前一直强调数据结构以及算法学习的重要性,所以就有一些读者经常问我,数据结构与算法应该要学习到哪个程度呢?,说实话,这个问题我不知道要怎么回答你,主要取决于你想学习到哪些程度,不过针对这个问题,我稍微总结一下我学过的算法知识点,以及我觉得值得学习的算法。这些算法与数据结构的学习大多数是零散的,并没有一本把他们全部覆盖的书籍。下面是我觉得值得学习的一些算法以及数据结构,当然,我也会整理一些看过...
Linux(服务器编程):15---两种高效的事件处理模式(reactor模式、proactor模式)
前言 同步I/O模型通常用于实现Reactor模式 异步I/O模型则用于实现Proactor模式 最后我们会使用同步I/O方式模拟出Proactor模式 一、Reactor模式 Reactor模式特点 它要求主线程(I/O处理单元)只负责监听文件描述符上是否有事件发生,有的话就立即将时间通知工作线程(逻辑单元)。除此之外,主线程不做任何其他实质性的工作 读写数据,接受新的连接,以及处...
阿里面试官问我:如何设计秒杀系统?我的回答让他比起大拇指
你知道的越多,你不知道的越多 点赞再看,养成习惯 GitHub上已经开源 https://github.com/JavaFamily 有一线大厂面试点脑图和个人联系方式,欢迎Star和指教 前言 Redis在互联网技术存储方面使用如此广泛,几乎所有的后端技术面试官都要在Redis的使用和原理方面对小伙伴们进行360°的刁难。 作为一个在互联网公司面一次拿一次Offer的面霸,打败了...
五年程序员记流水账式的自白。
不知觉已中码龄已突破五年,一路走来从起初铁憨憨到现在的十九线程序员,一路成长,虽然不能成为高工,但是也能挡下一面,从15年很火的android开始入坑,走过java、.Net、QT,目前仍处于android和.net交替开发中。 毕业到现在一共就职过两家公司,目前是第二家,公司算是半个创业公司,所以基本上都会身兼多职。比如不光要写代码,还要写软著、软著评测、线上线下客户对接需求收集...
C语言魔塔游戏
很早就很想写这个,今天终于写完了。 游戏截图: 编译环境: VS2017 游戏需要一些图片,如果有想要的或者对游戏有什么看法的可以加我的QQ 2985486630 讨论,如果暂时没有回应,可以在博客下方留言,到时候我会看到。 下面我来介绍一下游戏的主要功能和实现方式 首先是玩家的定义,使用结构体,这个名字是可以自己改变的 struct gamerole { char n...
一文详尽系列之模型评估指标
点击上方“Datawhale”,选择“星标”公众号第一时间获取价值内容在机器学习领域通常会根据实际的业务场景拟定相应的不同的业务指标,针对不同机器学习问题如回归、分类、排...
究竟你适不适合买Mac?
我清晰的记得,刚买的macbook pro回到家,开机后第一件事情,就是上了淘宝网,花了500元钱,找了一个上门维修电脑的师傅,上门给我装了一个windows系统。。。。。。 表砍我。。。 当时买mac的初衷,只是想要个固态硬盘的笔记本,用来运行一些复杂的扑克软件。而看了当时所有的SSD笔记本后,最终决定,还是买个好(xiong)看(da)的。 已经有好几个朋友问我mba怎么样了,所以今天尽量客观...
程序员一般通过什么途径接私活?
二哥,你好,我想知道一般程序猿都如何接私活,我也想接,能告诉我一些方法吗? 上面是一个读者“烦不烦”问我的一个问题。其实不止是“烦不烦”,还有很多读者问过我类似这样的问题。 我接的私活不算多,挣到的钱也没有多少,加起来不到 20W。说实话,这个数目说出来我是有点心虚的,毕竟太少了,大家轻喷。但我想,恰好配得上“一般程序员”这个称号啊。毕竟苍蝇再小也是肉,我也算是有经验的人了。 唾弃接私活、做外...
压测学习总结(1)——高并发性能指标:QPS、TPS、RT、吞吐量详解
一、QPS,每秒查询 QPS:Queries Per Second意思是“每秒查询率”,是一台服务器每秒能够相应的查询次数,是对一个特定的查询服务器在规定时间内所处理流量多少的衡量标准。互联网中,作为域名系统服务器的机器的性能经常用每秒查询率来衡量。 二、TPS,每秒事务 TPS:是TransactionsPerSecond的缩写,也就是事务数/秒。它是软件测试结果的测量单位。一个事务是指一...
Python爬虫爬取淘宝,京东商品信息
小编是一个理科生,不善长说一些废话。简单介绍下原理然后直接上代码。 使用的工具(Python+pycharm2019.3+selenium+xpath+chromedriver)其中要使用pycharm也可以私聊我selenium是一个框架可以通过pip下载 pip installselenium -ihttps://pypi.tuna.tsinghua.edu.cn/simple/ ...
阿里程序员写了一个新手都写不出的低级bug,被骂惨了。
这种新手都不会范的错,居然被一个工作好几年的小伙子写出来,差点被当场开除了。
Java工作4年来应聘要16K最后没要,细节如下。。。
前奏: 今天2B哥和大家分享一位前几天面试的一位应聘者,工作4年26岁,统招本科。 以下就是他的简历和面试情况。 基本情况: 专业技能: 1、&nbsp;熟悉Sping了解SpringMVC、SpringBoot、Mybatis等框架、了解SpringCloud微服务 2、&nbsp;熟悉常用项目管理工具:SVN、GIT、MAVEN、Jenkins 3、&nbsp;熟悉Nginx、tomca...
2020年,冯唐49岁:我给20、30岁IT职场年轻人的建议
点击“技术领导力”关注∆每天早上8:30推送 作者|Mr.K 编辑| Emma 来源|技术领导力(ID:jishulingdaoli) 前天的推文《冯唐:职场人35岁以后,方法论比经验重要》,收到了不少读者的反馈,觉得挺受启发。其实,冯唐写了不少关于职场方面的文章,都挺不错的。可惜大家只记住了“春风十里不如你”、“如何避免成为油腻腻的中年人”等不那么正经的文章。 本文整理了冯...
程序员该看的几部电影
1、骇客帝国(1999) 概念:在线/离线,递归,循环,矩阵等 剧情简介: 不久的将来,网络黑客尼奥对这个看似正常的现实世界产生了怀疑。 他结识了黑客崔妮蒂,并见到了黑客组织的首领墨菲斯。 墨菲斯告诉他,现实世界其实是由一个名叫“母体”的计算机人工智能系统控制,人们就像他们饲养的动物,没有自由和思想,而尼奥就是能够拯救人类的救世主。 可是,救赎之路从来都不会一帆风顺,到底哪里才是真实的世界?如何...
Python绘图,圣诞树,花,爱心 | Turtle篇
每周每日,分享Python实战代码,入门资料,进阶资料,基础语法,爬虫,数据分析,web网站,机器学习,深度学习等等。 公众号回复【进群】沟通交流吧,QQ扫码进群学习吧 微信群 QQ群 1.画圣诞树 import turtle screen = turtle.Screen() screen.setup(800,600) circle = turtle.Turtle()...
作为一个程序员,CPU的这些硬核知识你必须会!
CPU对每个程序员来说,是个既熟悉又陌生的东西? 如果你只知道CPU是中央处理器的话,那可能对你并没有什么用,那么作为程序员的我们,必须要搞懂的就是CPU这家伙是如何运行的,尤其要搞懂它里面的寄存器是怎么一回事,因为这将让你从底层明白程序的运行机制。 随我一起,来好好认识下CPU这货吧 把CPU掰开来看 对于CPU来说,我们首先就要搞明白它是怎么回事,也就是它的内部构造,当然,CPU那么牛的一个东...
还记得那个提速8倍的IDEA插件吗?VS Code版本也发布啦!!
去年,阿里云发布了本地 IDE 插件 Cloud Toolkit,仅 IntelliJ IDEA 一个平台,就有 15 万以上的开发者进行了下载,体验了一键部署带来的开发便利。时隔一年的今天,阿里云正式发布了 Visual Studio Code 版本,全面覆盖前端开发者,帮助前端实现一键打包部署,让开发提速 8 倍。 VSCode 版本的插件,目前能做到什么? 安装插件之后,开发者可以立即体验...
破14亿,Python分析我国存在哪些人口危机!
一、背景 二、爬取数据 三、数据分析 1、总人口 2、男女人口比例 3、人口城镇化 4、人口增长率 5、人口老化(抚养比) 6、各省人口 7、世界人口 四、遇到的问题 遇到的问题 1、数据分页,需要获取从1949-2018年数据,观察到有近20年参数:LAST20,由此推测获取近70年的参数可设置为:LAST70 2、2019年数据没有放上去,可以手动添加上去 3、将数据进行 行列转换 4、列名...
2019年除夕夜的有感而发
天气:小雨(加小雪) 温度:3摄氏度 空气:严重污染(399) 风向:北风 风力:微风 现在是除夕夜晚上十点钟,再有两个小时就要新的一年了; 首先要说的是我没患病,至少现在是没有患病;但是心情确像患了病一样沉重; 现在这个时刻应该大部分家庭都在看春晚吧,或许一家人团团圆圆的坐在一起,或许因为某些特殊原因而不能团圆;但不管是身在何处,身处什么境地,我都想对每一个人说一句:新年快乐! 不知道csdn这...
听说想当黑客的都玩过这个Monyer游戏(1~14攻略)
第零关 进入传送门开始第0关(游戏链接) 请点击链接进入第1关: 连接在左边→ ←连接在右边 看不到啊。。。。(只能看到一堆大佬做完的留名,也能看到菜鸡的我,在后面~~) 直接fn+f12吧 &lt;span&gt;连接在左边→&lt;/span&gt; &lt;a href="first.php"&gt;&lt;/a&gt; &lt;span&gt;←连接在右边&lt;/span&gt; o...
在家远程办公效率低?那你一定要收好这个「在家办公」神器!
相信大家都已经收到国务院延长春节假期的消息,接下来,在家远程办公可能将会持续一段时间。 但是问题来了。远程办公不是人在电脑前就当坐班了,相反,对于沟通效率,文件协作,以及信息安全都有着极高的要求。有着非常多的挑战,比如: 1在异地互相不见面的会议上,如何提高沟通效率? 2文件之间的来往反馈如何做到及时性?如何保证信息安全? 3如何规划安排每天工作,以及如何进行成果验收? ...... ...
作为一个程序员,内存和磁盘的这些事情,你不得不知道啊!!!
截止目前,我已经分享了如下几篇文章: 一个程序在计算机中是如何运行的?超级干货!!! 作为一个程序员,CPU的这些硬核知识你必须会! 作为一个程序员,内存的这些硬核知识你必须懂! 这些知识可以说是我们之前都不太重视的基础知识,可能大家在上大学的时候都学习过了,但是嘞,当时由于老师讲解的没那么有趣,又加上这些知识本身就比较枯燥,所以嘞,大家当初几乎等于没学。 再说啦,学习这些,也看不出来有什么用啊!...
2020年的1月,我辞掉了我的第一份工作
其实,这篇文章,我应该早点写的,毕竟现在已经2月份了。不过一些其它原因,或者是我的惰性、还有一些迷茫的念头,让自己迟迟没有试着写一点东西,记录下,或者说是总结下自己前3年的工作上的经历、学习的过程。 我自己知道的,在写自己的博客方面,我的文笔很一般,非技术类的文章不想去写;另外我又是一个还比较热衷于技术的人,而平常复杂一点的东西,如果想写文章写的清楚点,是需要足够...
别低估自己的直觉,也别高估自己的智商
所有群全部吵翻天,朋友圈全部沦陷,公众号疯狂转发。这两周没怎么发原创,只发新闻,可能有人注意到了。我不是懒,是文章写了却没发,因为大家的关注力始终在这次的疫情上面,发了也没人看。当然,我...
这个世界上人真的分三六九等,你信吗?
偶然间,在知乎上看到一个问题 一时间,勾起了我深深的回忆。 以前在厂里打过两次工,做过家教,干过辅导班,做过中介。零下几度的晚上,贴过广告,满脸、满手地长冻疮。 再回首那段岁月,虽然苦,但让我学会了坚持和忍耐。让我明白了,在这个世界上,无论环境多么的恶劣,只要心存希望,星星之火,亦可燎原。 下文是原回答,希望能对你能有所启发。 如果我说,这个世界上人真的分三六九等,...
节后首个工作日,企业们集体开晨会让钉钉挂了
By 超神经场景描述:昨天 2 月 3 日,是大部分城市号召远程工作的第一天,全国有接近 2 亿人在家开始远程办公,钉钉上也有超过 1000 万家企业活跃起来。关键词:十一出行 人脸...
Java基础知识点梳理
虽然已经在实际工作中经常与java打交道,但是一直没系统地对java这门语言进行梳理和总结,掌握的知识也比较零散。恰好利用这段时间重新认识下java,并对一些常见的语法和知识点做个总结与回顾,一方面为了加深印象,方便后面查阅,一方面为了掌握好Android打下基础。
2020年全新Java学习路线图,含配套视频,学完即为中级Java程序员!!
新的一年来临,突如其来的疫情打破了平静的生活! 在家的你是否很无聊,如果无聊就来学习吧! 世上只有一种投资只赚不赔,那就是学习!!! 传智播客于2020年升级了Java学习线路图,硬核升级,免费放送! 学完你就是中级程序员,能更快一步找到工作! 一、Java基础 JavaSE基础是Java中级程序员的起点,是帮助你从小白到懂得编程的必经之路。 在Java基础板块中有6个子模块的学...
B 站上有哪些很好的学习资源?
哇说起B站,在小九眼里就是宝藏般的存在,放年假宅在家时一天刷6、7个小时不在话下,更别提今年的跨年晚会,我简直是跪着看完的!! 最早大家聚在在B站是为了追番,再后来我在上面刷欧美新歌和漂亮小姐姐的舞蹈视频,最近两年我和周围的朋友们已经把B站当作学习教室了,而且学习成本还免费,真是个励志的好平台ヽ(.◕ฺˇд ˇ◕ฺ;)ノ 下面我们就来盘点一下B站上优质的学习资源: 综合类 Oeasy: 综合...
如何优雅地打印一个Java对象?
你好呀,我是沉默王二,一个和黄家驹一样身高,和刘德华一样颜值的程序员。虽然已经写了十多年的 Java 代码,但仍然觉得自己是个菜鸟(请允许我惭愧一下)。 在一个月黑风高的夜晚,我思前想后,觉得再也不能这么蹉跎下去了。于是痛下决心,准备通过输出的方式倒逼输入,以此来修炼自己的内功,从而进阶成为一名真正意义上的大神。与此同时,希望这些文章能够帮助到更多的读者,让大家在学习的路上不再寂寞、空虚和冷。 ...
相关热词 c# 压缩图片好麻烦 c#计算数组中的平均值 c#获取路由参数 c#日期精确到分钟 c#自定义异常必须继承 c#查表并返回值 c# 动态 表达式树 c# 监控方法耗时 c# listbox c#chart显示滚动条
立即提问