tensorflow-gpu training: GPU compute0 usage is over 90% while compute1 stays at 0%

As the title says: when training with tensorflow-gpu, the Windows Task Manager shows GPU compute0 usage above 90% while compute1 usage is 0%. Screenshot below. What is the cause?
[screenshot]

3 answers

This is perfectly normal. In the Task Manager GPU view, compute0, compute1 and so on are separate command engines of the same card, not separate cores as on a multi-core CPU, so one engine being saturated while another sits at zero is expected.

qq_38419981: Could compute0 and compute1 be used at the same time? Would that be faster?
2 months ago · Reply

You need to specify which GPU to use in your code; the default is GPU 0.

Could you explain concretely how to select gpu0 and gpu1? Many thanks.
If possible, please add QQ 1138990957.
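Following up on the answer above: the two usual ways to choose GPUs are hiding devices through the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow initializes, and pinning ops with `tf.device`. A minimal sketch; the `visible_gpu_ids` helper and the commented `tf.device` lines are illustrative, not from the thread:

```python
import os

# Must be set before TensorFlow initializes CUDA. "0,1" exposes both cards;
# "1" would expose only the second card (which TensorFlow then sees as /gpu:0).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

def visible_gpu_ids():
    """Return the GPU ids this process is allowed to use."""
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [v for v in value.split(",") if v]

# With tensorflow-gpu installed, ops can then be pinned to a device:
# import tensorflow as tf
# with tf.device("/gpu:0"):
#     a = tf.constant([1.0])
# with tf.device("/gpu:1"):
#     b = tf.constant([2.0])
```

Note that exposing both cards does not by itself split one model's work across them; that requires explicit `tf.device` placement or a distribution strategy such as `tf.distribute.MirroredStrategy`.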

Other related questions
Code runs without errors on tensorflow-CPU, but on GPU it fails at 51% every time
![图片说明](https://img-ask.csdn.net/upload/201908/12/1565613050_565691.png) 51%|████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 199/391 [00:38<00:21, 8.81it/s]2019-08-12 20:20:04.963304: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000016EAC1D0A40 2019-08-12 20:20:05.763636: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd ** On entry to SGEMM parameter number 10 had an illegal value 2019-08-12 20:20:06.320473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 5236925 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:06.328931: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1871 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:06.838588: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 687520 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:06.850771: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:06.999345: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 42770 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:07.499292: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1497278 for batch index 0, expected info = 0. 
Debug_info = heevd 2019-08-12 20:20:07.510245: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:08.020011: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 256112 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:08.529828: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 341471 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:08.540870: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 16833 for batch index 0, expected info = 0. Debug_info = heevd 2019-08-12 20:20:08.697339: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1190 for batch index 0, expected info = 0. Debug_info = heevd Traceback (most recent call last): File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call return fn(*args) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. 
Debug_info = heevd [[{{node KFAC/SelfAdjointEigV2_10}}]] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "main.py", line 67, in <module> main() File "main.py", line 63, in main trainer.train() File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 16, in train self.train_epoch() File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 42, in train_epoch self.sess.run([self.model.inv_update_op, self.model.var_update_op], feed_dict=feed_dict) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run run_metadata_ptr) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run feed_dict_tensor, options, run_metadata) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run run_metadata) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0. 
Debug_info = heevd [[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]] Caused by op 'KFAC/SelfAdjointEigV2_10', defined at: File "main.py", line 67, in <module> main() File "main.py", line 60, in main model_ = Model(config, _INPUT_DIM[config.dataset], len(train_loader.dataset)) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 21, in __init__ self.init_optim() File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 70, in init_optim momentum=self.config.momentum) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\optimizer.py", line 66, in __init__ inv_devices=inv_devices) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 58, in __init__ setup = self._setup(cov_ema_decay) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in _setup inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()} File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in <dictcomp> inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()} File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 116, in _get_all_inverse_update_ops for op in factor.make_inverse_update_ops(): File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\fisher_factors.py", line 360, in make_inverse_update_ops ops.append(inv.assign(utils.posdef_inv(self._cov, damping))) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 144, in posdef_inv return posdef_inv_functions[POSDEF_INV_METHOD](tensor, identity, damping) File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 161, in posdef_inv_eig tensor + damping * identity) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\linalg_ops.py", line 328, in self_adjoint_eig e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\gen_linalg_ops.py", line 2016, in self_adjoint_eig_v2 "SelfAdjointEigV2", 
input=input, compute_v=compute_v, name=name) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func return func(*args, **kwargs) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op op_def=op_def) File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__ self._traceback = tf_stack.extract_stack() InvalidArgumentError (see above for traceback): Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd [[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]
Help: Cuda compute capability problem with Tensorflow
I installed tensorflow-gpu under Python, with CUDA cuda_8.0.61_windows and cuDNN cudnn-8.0-windows7-x64-v5.1. The installation went fine and TensorFlow runs, but when training on the MNIST handwritten-digit dataset I hit the following: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:948] Ignoring visible gpu device (device: 0, name: GeForce GT 630M, pci bus id: 0000:01:00.0) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0. The root cause is that the card does not support Cuda compute capability 3.0. For caffe, the fix suggested online for this kind of problem is to comment out USE_CUDNN in Makefile.config. Does anyone know how to solve this in Tensorflow? Thanks!
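There is no direct analogue of caffe's USE_CUDNN switch here: the prebuilt TensorFlow wheels are compiled for compute capability 3.0 and above, so with a CC 2.1 card the realistic options are installing the CPU-only `tensorflow` package, building TensorFlow from source for the older card, or hiding the GPU so that `tensorflow-gpu` falls back to the CPU. A minimal sketch of the last option (assumption: CPU-only execution is acceptable; the `gpu_hidden` helper is illustrative):

```python
import os

# An empty string (or "-1") hides every CUDA device from this process, so
# tensorflow-gpu falls back to the CPU instead of logging
# "Ignoring visible gpu device ... with Cuda compute capability 2.1".
# Set this before importing tensorflow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def gpu_hidden():
    """True when no CUDA device is visible to this process."""
    return os.environ.get("CUDA_VISIBLE_DEVICES", "") in ("", "-1")
```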
When training a Tensorflow model (object_detection), training exits after the first evaluation. How do I make it continue?
When I train the SSD model, training runs for about 10 minutes and then enters the evaluation phase; after evaluation the program exits on its own, with no error or warning shown. Why is that, and how can I make it keep training? Training command: ``` python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr ``` Config file: ```
training exit after the first evaluation (only one evaluation) in Tensorflow model (object_detection) without error and warning
System information
What is the top-level directory of the model you are using: models/research/object_detection/
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): NO
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows-10 (64bit)
TensorFlow installed from (source or binary): conda install tensorflow-gpu
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: cudnn-7.6.0
GPU model and memory: GeForce GTX 1060 6GB
Exact command to reproduce: See below
my command for training :
python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr
This is my config :
train_config {
  batch_size: 24
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}
train_input_reader {
  label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "D:/gitcode/models/research/object_detection/idol/train/Iframe_??????.tfrecord"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "D:/gitcode/models/research/object_detection/idol/eval/Iframe_??????.tfrecord"
  }
}
``` Console output: (default) D:\gitcode\models\research>python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see: https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue. WARNING:tensorflow:Forced number of epochs for all eval validations to be 1. WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1. WARNING:tensorflow:Estimator's model_fn (<function create_model_fn..model_fn at 0x0000027CBAB7BB70>) includes params argument, but params are not passed to Estimator. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.experimental.parallel_interleave(...). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\core\preprocessor.py:196: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version. Instructions for updating: seed2 arg is deprecated.Use sample_distorted_bounding_box_v2 instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py:448: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. 
2019-08-14 16:29:31.607841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845 pciBusID: 0000:04:00.0 totalMemory: 6.00GiB freeMemory: 4.97GiB 2019-08-14 16:29:31.621836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:29:32.275712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:29:32.283072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:29:32.288675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:29:32.293514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\eval_util.py:796: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\visualization_utils.py:498: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape. 
2019-08-14 16:41:44.736212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:41:44.741242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:41:44.747522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:41:44.751256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:41:44.755548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. creating index... index created! creating index... index created! Running per image evaluation... Evaluate annotation type bbox DONE (t=2.43s). Accumulating evaluation results... DONE (t=0.14s). 
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.287
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.529
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.278
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.162
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.356
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.384
(default) D:\gitcode\models\research>
Tensorflow finishes running without any error
TensorFlow finishes running without reporting any error. This is the output of a prediction program that used to run fine; the code has not been changed. The final console output is below. How can I solve this?
> 2019-12-07 01:57:11.506176: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
> 2019-12-07 01:57:11.510682: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
> 2019-12-07 01:57:11.811459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545 pciBusID: 0000:65:00.0
> 2019-12-07 01:57:11.811883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8095 pciBusID: 0000:b3:00.0
> 2019-12-07 01:57:11.812080: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
> 2019-12-07 01:57:11.813215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1
> 2019-12-07 01:57:12.694203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
> 2019-12-07 01:57:12.694360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1
> 2019-12-07 01:57:12.694449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N N
> 2019-12-07 01:57:12.694538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: N N
> 2019-12-07 01:57:12.695690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8694 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5)
> 2019-12-07 01:57:12.697395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 6358 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080, pci bus id: 0000:b3:00.0, compute capability: 6.1)
> Process finished with exit code 0
Running tensorflow fails with tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed
Running tensorflow fails with tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed. Searching online suggests the GPU is occupied by another process. The problem starts from here:
```
2019-10-17 09:28:49.495166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6382 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
(60000, 28, 28) (60000, 10)
2019-10-17 09:28:51.275415: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_100.dll'; dlerror: cublas64_100.dll not found
```
![screenshot](https://img-ask.csdn.net/upload/201910/17/1571277238_292620.png)
The error finally shown:
![screenshot](https://img-ask.csdn.net/upload/201910/17/1571277311_655722.png)
I tried the fixes found online, such as adding:
```
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```
But then it reports:
![screenshot](https://img-ask.csdn.net/upload/201910/17/1571277460_72752.png)
Now I do not know how to fix this. I am a beginner trying simple digit recognition, following a tutorial step by step; the versions I use may differ from the tutorial's. I am on a freshly installed tensorflow 2.0 with the following:
![screenshot](https://img-ask.csdn.net/upload/201910/17/1571277627_439100.png)
I am not sure whether this is a version problem, and I urgently need suggestions for other things to try. A simple addition test program runs fine, but while the digit-recognition script runs, GPU utilization peaks at only 0.2%. The complete digit-recognition code:
```
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, datasets

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
#gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)
#sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

(x, y), (x_val, y_val) = datasets.mnist.load_data()
x = tf.convert_to_tensor(x, dtype=tf.float32) / 255.
y = tf.convert_to_tensor(y, dtype=tf.int32)
y = tf.one_hot(y, depth=10)
print(x.shape, y.shape)
train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.batch(200)

model = keras.Sequential([
    layers.Dense(512, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(10)])

optimizer = optimizers.SGD(learning_rate=0.001)

def train_epoch(epoch):
    # Step4.loop
    for step, (x, y) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28 * 28))
            # Step1. compute output
            # [b, 784] => [b, 10]
            out = model(x)
            # Step2. compute loss
            loss = tf.reduce_sum(tf.square(out - y)) / x.shape[0]
        # Step3. optimize and update w1, w2, w3, b1, b2, b3
        grads = tape.gradient(loss, model.trainable_variables)
        # w' = w - lr * grad
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 100 == 0:
            print(epoch, step, 'loss:', loss.numpy())

def train():
    for epoch in range(30):
        train_epoch(epoch)

if __name__ == '__main__':
    train()
```
Hoping someone can offer a suggestion or a fix. Many thanks!
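One likely culprit in the code above: `tf.GPUOptions` and `tf.Session` are TensorFlow 1.x APIs, while the asker runs TensorFlow 2.0, where the equivalent knobs live under `tf.config`. A hedged sketch of the migration; the `fraction_to_mb` helper and the 8 GB figure are illustrative, and the 0.333 fraction is taken from the question:

```python
def fraction_to_mb(total_mb, fraction):
    """Express TF1's per_process_gpu_memory_fraction as the explicit
    memory_limit (in MB) that TF2's virtual-device API expects."""
    return int(total_mb * fraction)

# TF2 replacement for GPUOptions(per_process_gpu_memory_fraction=0.333):
try:
    import tensorflow as tf
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        # Option A: grow the allocation on demand instead of grabbing it all.
        tf.config.experimental.set_memory_growth(gpu, True)
        # Option B (use instead of A): hard-cap the process, e.g. a third
        # of a hypothetical 8 GB card:
        # tf.config.experimental.set_virtual_device_configuration(
        #     gpu, [tf.config.experimental.VirtualDeviceConfiguration(
        #         memory_limit=fraction_to_mb(8192, 0.333))])
except (ImportError, AttributeError, RuntimeError):
    pass  # tensorflow unavailable or already initialized; helper still works
```

Also, given the `Could not load dynamic library 'cublas64_100.dll'` line in the log, the Blas GEMM failure may simply be a missing or mismatched CUDA 10.0 runtime, which is worth checking before tuning memory settings.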
OpenStack: instance is created successfully, but launching it fails
Here are the nova error details: ``` /var/log/nova/nova-compute.log:2020-02-11 08:43:18.893 1327 ERROR nova.virt.libvirt.guest [req-ecfa9ec9-82fd-42dd-838f-ff5938af32e7 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] Error launching a defined domain with XML: <domain type='kvm'> /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [req-ecfa9ec9-82fd-42dd-838f-ff5938af32e7 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] Instance failed to spawn /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] Traceback (most recent call last): /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2218, in _build_resources /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] except Exception: /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2064, in _build_and_run_instance /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] # saved in that function to prevent races. 
/var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2779, in spawn /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] block_device_info=block_device_info) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4923, in _create_domain_and_network /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] xml, pause=pause, power_on=power_on) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4854, in _create_domain /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] guest.launch(pause=pause) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 142, in launch /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] self._encoded_xml, errors='ignore') /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] self.force_reraise() 
/var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] six.reraise(self.type_, self.value, self.tb) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 137, in launch /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] return self._domain.createWithFlags(flags) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] result = proxy_call(self._autowrap, f, *args, **kwargs) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] rv = execute(f, *args, **kwargs) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] six.reraise(c, e, tb) 
/var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] rv = meth(*args, **kwargs) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self) /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] libvirtError: 内部错误:qemu unexpectedly closed the monitor: 2020-02-11T00:43:18.668387Z qemu-kvm: -drive file=/var/lib/nova/instances/a26fb462-a721-45eb-8eaa-776ec5da3b23/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Could not open '/var/lib/nova/instances/a26fb462-a721-45eb-8eaa-776ec5da3b23/disk': Permission denied /var/log/nova/nova-compute.log:2020-02-11 08:43:18.895 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [req-ecfa9ec9-82fd-42dd-838f-ff5938af32e7 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] Unexpected build failure, not rescheduling build. 
/var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] Traceback (most recent call last): /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] exception.ImageNotActive, /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128) /var/log/nova/nova-compute.log:2020-02-11 08:43:19.145 1327 ERROR nova.compute.manager [instance: a26fb462-a721-45eb-8eaa-776ec5da3b23] ``` Dashboard error: ![screenshot](https://img-ask.csdn.net/upload/202002/11/1581385833_715215.png) ![screenshot](https://img-ask.csdn.net/upload/202002/11/1581385890_574951.png) I have tried many of the fixes suggested online, with no luck. I am a beginner; could someone passing by offer a pointer or two? Many thanks!
TensorFlow code runs fine on CPU, but on GPU it fails at 51% every time; I couldn't find the same problem anywhere online
```
51%|████████████▎                | 199/391 [00:38<00:21,  8.81it/s]
2019-08-12 20:20:04.963304: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0000016EAC1D0A40
2019-08-12 20:20:05.763636: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 85505 for batch index 0, expected info = 0. Debug_info = heevd
** On entry to SGEMM parameter number 10 had an illegal value
2019-08-12 20:20:06.320473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 5236925 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.328931: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1871 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.838588: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 687520 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.850771: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:06.999345: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 42770 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.499292: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1497278 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:07.510245: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 321 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.020011: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 256112 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.529828: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 341471 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.540870: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 16833 for batch index 0, expected info = 0. Debug_info = heevd
2019-08-12 20:20:08.697339: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at cuda_solvers.cc:260 : Invalid argument: Got info = 1190 for batch index 0, expected info = 0. Debug_info = heevd
Traceback (most recent call last):
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0.
Debug_info = heevd
	 [[{{node KFAC/SelfAdjointEigV2_10}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 67, in <module>
    main()
  File "main.py", line 63, in main
    trainer.train()
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 16, in train
    self.train_epoch()
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\train.py", line 42, in train_epoch
    self.sess.run([self.model.inv_update_op, self.model.var_update_op], feed_dict=feed_dict)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Got info = 85505 for batch index 0, expected info = 0.
Debug_info = heevd
	 [[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]

Caused by op 'KFAC/SelfAdjointEigV2_10', defined at:
  File "main.py", line 67, in <module>
    main()
  File "main.py", line 60, in main
    model_ = Model(config, _INPUT_DIM[config.dataset], len(train_loader.dataset))
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 21, in __init__
    self.init_optim()
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\core\model.py", line 70, in init_optim
    momentum=self.config.momentum)
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\optimizer.py", line 66, in __init__
    inv_devices=inv_devices)
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 58, in __init__
    setup = self._setup(cov_ema_decay)
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in _setup
    inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 108, in <dictcomp>
    inv_updates = {op.name: op for op in self._get_all_inverse_update_ops()}
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\estimator.py", line 116, in _get_all_inverse_update_ops
    for op in factor.make_inverse_update_ops():
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\fisher_factors.py", line 360, in make_inverse_update_ops
    ops.append(inv.assign(utils.posdef_inv(self._cov, damping)))
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 144, in posdef_inv
    return posdef_inv_functions[POSDEF_INV_METHOD](tensor, identity, damping)
  File "E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py", line 161, in posdef_inv_eig
    tensor + damping * identity)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\linalg_ops.py", line 328, in self_adjoint_eig
    e, v = gen_linalg_ops.self_adjoint_eig_v2(tensor, compute_v=True, name=name)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\ops\gen_linalg_ops.py", line 2016, in self_adjoint_eig_v2
    "SelfAdjointEigV2", input=input, compute_v=compute_v, name=name)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "D:\softAPP\python\anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Got info = 85505 for batch index 0, expected info = 0.
Debug_info = heevd
	 [[node KFAC/SelfAdjointEigV2_10 (defined at E:\python代码\noisy-K-FAC\noisy-K-FAC\ops\utils.py:161) ]]
```
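The `Debug_info = heevd` failures above mean cuSOLVER's symmetric eigendecomposition returned a nonzero `info`, which typically happens when the covariance matrix fed to `SelfAdjointEigV2` is numerically degenerate or contains NaN/Inf (a loss blowup can surface on GPU while the CPU solver quietly copes). A minimal NumPy sketch of the damping trick that `posdef_inv_eig` in the traceback applies (`tensor + damping * identity`); the matrix and damping value here are made up for illustration:

```python
import numpy as np

# A nearly singular covariance matrix, like the ones K-FAC accumulates.
cov = np.array([[1e-20, 0.0],
                [0.0,  1e-20]])

# The same stabilisation posdef_inv_eig uses: add damping to the diagonal
# before the symmetric eigendecomposition.
damping = 1e-3
w, v = np.linalg.eigh(cov + damping * np.eye(2))

assert np.all(w > 0)  # damping keeps the matrix positive definite
```

If raising the damping in the config does not help, it may be worth printing the covariance tensors right before step 199 to check for NaNs.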
OpenStack instance launch fails with "No valid host was found"
Nova error log on the compute node:
```
2020-02-11 17:05:07.817 1295 ERROR nova.virt.libvirt.guest [req-740339df-c3c4-450b-9787-9c7336c461a3 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] Error launching a defined domain with XML: <domain type='kvm'>
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [req-740339df-c3c4-450b-9787-9c7336c461a3 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] [instance: 8020d30b-33ec-4836-8f61-f23516aa2508] Instance failed to spawn
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508] Traceback (most recent call last):
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2229, in _build_resources
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     yield resources
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2075, in _build_and_run_instance
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     block_device_info=block_device_info)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2779, in spawn
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     block_device_info=block_device_info)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4923, in _create_domain_and_network
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     xml, pause=pause, power_on=power_on)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4854, in _create_domain
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     guest.launch(pause=pause)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 142, in launch
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     self._encoded_xml, errors='ignore')
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     self.force_reraise()
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     six.reraise(self.type_, self.value, self.tb)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 137, in launch
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     return self._domain.createWithFlags(flags)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     rv = execute(f, *args, **kwargs)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     six.reraise(c, e, tb)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     rv = meth(*args, **kwargs)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508] libvirtError: internal error: process exited while connecting to monitor: 2020-02-11T09:05:07.623239Z qemu-kvm: -drive file=/var/lib/nova/instances/8020d30b-33ec-4836-8f61-f23516aa2508/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Could not open '/var/lib/nova/instances/8020d30b-33ec-4836-8f61-f23516aa2508/disk': Permission denied
2020-02-11 17:05:07.818 1295 ERROR nova.compute.manager [instance: 8020d30b-33ec-4836-8f61-f23516aa2508]
```
nova-conductor.log on the controller node:
```
2020-02-11 17:05:08.365 958 ERROR nova.scheduler.utils [req-740339df-c3c4-450b-9787-9c7336c461a3 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] [instance: 8020d30b-33ec-4836-8f61-f23516aa2508] Error from last host: compute (node compute): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1937, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2127, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 8020d30b-33ec-4836-8f61-f23516aa2508 was re-scheduled: internal error: process exited while connecting to monitor: 2020-02-11T09:05:07.623239Z qemu-kvm: -drive file=/var/lib/nova/instances/8020d30b-33ec-4836-8f61-f23516aa2508/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Could not open '/var/lib/nova/instances/8020d30b-33ec-4836-8f61-f23516aa2508/disk': Permission denied\n"]
2020-02-11 17:05:08.450 958 WARNING nova.scheduler.utils [req-740339df-c3c4-450b-9787-9c7336c461a3 a7756266208f439bbb8324fb22853932 e7af6fe8f68647ab8010beaa7cb440ed - - -] [instance: 8020d30b-33ec-4836-8f61-f23516aa2508] Setting instance to ERROR state.
```
Errors on the dashboard:
![图片说明](https://img-ask.csdn.net/upload/202002/11/1581412785_788625.png)
![图片说明](https://img-ask.csdn.net/upload/202002/11/1581413291_261638.png)
My network configuration:
![图片说明](https://img-ask.csdn.net/upload/202002/11/1581412842_430142.png)
![图片说明](https://img-ask.csdn.net/upload/202002/11/1581412854_59896.png)
Could anyone passing by explain what's causing this? I've tried many approaches from the web without success. Many thanks!
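Note that the root cause in the compute-node log is the qemu-kvm `Permission denied` on the instance disk; "No valid host was found" only appears afterwards because the single compute node failed and the build was not rescheduled. A hedged diagnostic sketch (the `/var/lib/nova/instances` path and the `nova`/`qemu` user follow a stock RDO-style layout and are assumptions; it runs against a temp directory here so it is self-contained):

```python
import os
import pwd
import tempfile

def describe_ownership(path):
    """Return (owner, mode) for path. On the real compute node, point this
    at /var/lib/nova/instances/<uuid>/disk to see who owns the image."""
    st = os.stat(path)
    return pwd.getpwuid(st.st_uid).pw_name, st.st_mode & 0o777

# Self-contained stand-in for the instance directory:
inst_dir = tempfile.mkdtemp()
owner, mode = describe_ownership(inst_dir)
print(owner, oct(mode))

# If the owner does not match the user qemu runs as (the user/group set in
# /etc/libvirt/qemu.conf), a typical fix on the compute node is:
#   chown -R nova:nova /var/lib/nova/instances
```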
Questions about the TensorFlow Object Detection API training process
```
2019-03-22 11:47:37.264972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:08:00.0, compute capability: 6.1)
```
TensorFlow itself works fine. After launching model_main.py, training hangs here with no further output, and no .ckpt files appear in the model directory.
![图片说明](https://img-ask.csdn.net/upload/201903/22/1553233226_55747.jpg)
Is my graphics card simply too weak and slow, or is something else wrong? Any help appreciated.
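One thing worth ruling out before blaming the card, offered as a guess: recent versions of `model_main.py` print a loss line only every 100 steps and suppress most messages unless INFO-level logging is enabled, so a slow GPU can look frozen for minutes. The TF 1.x fix would be adding `tf.logging.set_verbosity(tf.logging.INFO)` at the top of `model_main.py`; in plain-stdlib terms the mechanism looks like this:

```python
import logging

logging.basicConfig(format="%(levelname)s: %(message)s")
log = logging.getLogger("trainer")

log.info("step 100: loss = 1.23")   # dropped: the default level is WARNING

log.setLevel(logging.INFO)          # equivalent of raising TF's verbosity
log.info("step 200: loss = 0.98")   # now printed
```

Checking GPU utilisation with `nvidia-smi` while it "hangs" also tells you whether training is actually running.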
TensorFlow: loading a trained model for prediction gives different results for the same image?
I've been running DeepLab v1 recently. Training runs through fine, but when I use the trained model for inference, the same image gives different predictions each run. Does anyone know what's going on? The model is loaded with saver.restore(). Code below:
```
def main():
    """Create the model and start the inference process."""
    args = get_arguments()

    # Prepare image.
    img = tf.image.decode_jpeg(tf.read_file(args.img_path), channels=3)
    # Convert RGB to BGR.
    img_r, img_g, img_b = tf.split(value=img, num_or_size_splits=3, axis=2)
    img = tf.cast(tf.concat(axis=2, values=[img_b, img_g, img_r]), dtype=tf.float32)
    # Extract mean.
    img -= IMG_MEAN

    # Create network.
    net = DeepLabLFOVModel()

    # Which variables to load.
    trainable = tf.trainable_variables()

    # Predictions.
    pred = net.preds(tf.expand_dims(img, dim=0))

    # Set up TF session and initialize variables.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    #init = tf.global_variables_initializer()
    sess.run(tf.global_variables_initializer())

    # Load weights.
    saver = tf.train.Saver(var_list=trainable)
    load(saver, sess, args.model_weights)

    # Perform inference.
    preds = sess.run([pred])
    print(preds)

    if not os.path.exists(args.save_dir):
        os.makedirs(args.save_dir)
    msk = decode_labels(np.array(preds)[0, 0, :, :, 0])
    im = Image.fromarray(msk)
    im.save(args.save_dir + 'mask1.png')
    print('The output file has been saved to {}'.format(args.save_dir + 'mask.png'))


if __name__ == '__main__':
    main()
```
where load is:
```
def load(saver, sess, ckpt_path):
    '''Load trained weights.

    Args:
      saver: TensorFlow saver object.
      sess: TensorFlow session.
      ckpt_path: path to checkpoint file with parameters.
    '''
    ckpt = tf.train.get_checkpoint_state(ckpt_path)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Restored model parameters from {}".format(ckpt_path))
```
The DeepLabLFOVModel class is as follows:
```
class DeepLabLFOVModel(object):
    """DeepLab-LargeFOV model with atrous convolution and bilinear upsampling.

    This class implements a multi-layer convolutional neural network
    for semantic image segmentation task. This is the same as the model
    described in this paper: https://arxiv.org/abs/1412.7062 - please look
    there for details.
    """

    def __init__(self, weights_path=None):
        """Create the model.

        Args:
          weights_path: the path to the cpkt file with dictionary of weights from .caffemodel.
        """
        self.variables = self._create_variables(weights_path)

    def _create_variables(self, weights_path):
        """Create all variables used by the network.

        This allows to share them between multiple calls to the loss function.

        Args:
          weights_path: the path to the ckpt file with dictionary of weights
                        from .caffemodel. If none, initialise all variables randomly.

        Returns:
          A dictionary with all variables.
        """
        var = list()
        index = 0

        if weights_path is not None:
            with open(weights_path, "rb") as f:
                weights = cPickle.load(f)  # Load pre-trained weights.
                for name, shape in net_skeleton:
                    var.append(tf.Variable(weights[name], name=name))
                del weights
        else:
            # Initialise all weights randomly with the Xavier scheme,
            # and all biases to 0's.
            for name, shape in net_skeleton:
                if "/w" in name:  # Weight filter.
                    w = create_variable(name, list(shape))
                    var.append(w)
                else:
                    b = create_bias_variable(name, list(shape))
                    var.append(b)
        return var

    def _create_network(self, input_batch, keep_prob):
        """Construct DeepLab-LargeFOV network.

        Args:
          input_batch: batch of pre-processed images.
          keep_prob: probability of keeping neurons intact.

        Returns:
          A downsampled segmentation mask.
        """
        current = input_batch

        v_idx = 0  # Index variable.
        # Last block is the classification layer.
        for b_idx in xrange(len(dilations) - 1):
            for l_idx, dilation in enumerate(dilations[b_idx]):
                w = self.variables[v_idx * 2]
                b = self.variables[v_idx * 2 + 1]
                if dilation == 1:
                    conv = tf.nn.conv2d(current, w, strides=[1, 1, 1, 1], padding='SAME')
                else:
                    conv = tf.nn.atrous_conv2d(current, w, dilation, padding='SAME')
                current = tf.nn.relu(tf.nn.bias_add(conv, b))
                v_idx += 1
            # Optional pooling and dropout after each block.
            if b_idx < 3:
                current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 2, 2, 1], padding='SAME')
            elif b_idx == 3:
                current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME')
            elif b_idx == 4:
                current = tf.nn.max_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME')
                current = tf.nn.avg_pool(current, ksize=[1, ks, ks, 1], strides=[1, 1, 1, 1], padding='SAME')
            elif b_idx <= 6:
                current = tf.nn.dropout(current, keep_prob=keep_prob)

        # Classification layer; no ReLU.
        # w = self.variables[v_idx * 2]
        w = create_variable(name='w', shape=[1, 1, 1024, n_classes])
        # b = self.variables[v_idx * 2 + 1]
        b = create_bias_variable(name='b', shape=[n_classes])
        conv = tf.nn.conv2d(current, w, strides=[1, 1, 1, 1], padding='SAME')
        current = tf.nn.bias_add(conv, b)

        return current

    def prepare_label(self, input_batch, new_size):
        """Resize masks and perform one-hot encoding.

        Args:
          input_batch: input tensor of shape [batch_size H W 1].
          new_size: a tensor with new height and width.

        Returns:
          Outputs a tensor of shape [batch_size h w 18]
          with last dimension comprised of 0's and 1's only.
        """
        with tf.name_scope('label_encode'):
            # As labels are integer numbers, need to use NN interp.
            input_batch = tf.image.resize_nearest_neighbor(input_batch, new_size)
            # Reducing the channel dimension.
            input_batch = tf.squeeze(input_batch, squeeze_dims=[3])
            input_batch = tf.one_hot(input_batch, depth=n_classes)
        return input_batch

    def preds(self, input_batch):
        """Create the network and run inference on the input batch.

        Args:
          input_batch: batch of pre-processed images.

        Returns:
          Argmax over the predictions of the network of the same shape as the input.
        """
        raw_output = self._create_network(tf.cast(input_batch, tf.float32), keep_prob=tf.constant(1.0))
        raw_output = tf.image.resize_bilinear(raw_output, tf.shape(input_batch)[1:3, ])
        raw_output = tf.argmax(raw_output, dimension=3)
        raw_output = tf.expand_dims(raw_output, dim=3)  # Create 4D-tensor.
        return tf.cast(raw_output, tf.uint8)

    def loss(self, img_batch, label_batch):
        """Create the network, run inference on the input batch and compute loss.

        Args:
          input_batch: batch of pre-processed images.

        Returns:
          Pixel-wise softmax loss.
        """
        raw_output = self._create_network(tf.cast(img_batch, tf.float32), keep_prob=tf.constant(0.5))
        prediction = tf.reshape(raw_output, [-1, n_classes])

        # Need to resize labels and convert using one-hot encoding.
        label_batch = self.prepare_label(label_batch, tf.stack(raw_output.get_shape()[1:3]))
        gt = tf.reshape(label_batch, [-1, n_classes])

        # Pixel-wise softmax loss.
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=gt)
        reduced_loss = tf.reduce_mean(loss)

        return reduced_loss
```
Loading the model should be fine in theory, so why do the results differ?
Input image:
![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810836_83106.jpg)
![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810850_924663.png)
Predicted results:
![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810884_985680.png)
![图片说明](https://img-ask.csdn.net/upload/201911/15/1573810904_577649.png)
The two runs disagree with each other, and also with the results computed from the saved model. I'm using this person's code from GitHub: https://github.com/minar09/DeepLab-LFOV-TensorFlow
It's urgent, does anyone know why?
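A likely culprit in the code above: in `_create_network` the classification layer's `w`/`b` are re-created with `create_variable`/`create_bias_variable` (the `self.variables[v_idx * 2]` lines are commented out), so if those freshly created variables are not matched by names in the checkpoint, every new graph starts from a different random classifier and the predictions change run to run. A minimal, pure-Python sketch of that failure mode (names hypothetical):

```python
import random

SAVED_CKPT = {"conv1/w": 0.5, "conv1/b": 0.1}  # the classifier weights were never saved

def restore(ckpt, names):
    # Names present in the checkpoint are restored; anything missing keeps
    # its fresh random initialisation, different on every process start.
    return {n: ckpt.get(n, random.random()) for n in names}

run1 = restore(SAVED_CKPT, ["conv1/w", "conv1/b", "classifier/w"])
run2 = restore(SAVED_CKPT, ["conv1/w", "conv1/b", "classifier/w"])

assert run1["conv1/w"] == run2["conv1/w"]            # restored weights agree
assert run1["classifier/w"] != run2["classifier/w"]  # unrestored ones do not
```

If that is the cause here, the fix would be to use `self.variables[v_idx * 2]` for the classification layer again, or to verify (e.g. with `tf.train.list_variables`) that the new `w`/`b` names actually exist in the checkpoint handed to `tf.train.Saver`.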
Problem with the training code for machine translation with TensorFlow
```
# -*- coding:UTF-8 -*-
import tensorflow as tf

src_path = 'D:/Python37/untitled1/train.tags.en-zh.en.deletehtml'
trg_path = 'D:/Python37/untitled1/train.tags.en-zh.zh.deletehtml'

SRC_TRAIN_DATA = 'D:/Python37/untitled1/train.tags.en-zh.en.deletehtml.segment'  # source-language input file
TRG_TRAIN_DATA = 'D:/Python37/untitled1/train.tags.en-zh.zh.deletehtml.segment'  # target-language input file
CHECKPOINT_PATH = './model/seq2seq_ckpt'  # checkpoint save path

HIDDEN_SIZE = 1024            # LSTM hidden-layer size
NUM_LAYERS = 2                # number of LSTM layers in the deep RNN
SRC_VOCAB_SIZE = 10000        # source vocabulary size
TRG_VOCAB_SIZE = 4000         # target vocabulary size
BATCH_SIZE = 100              # training batch size
NUM_EPOCH = 5                 # number of passes over the training data
KEEP_PROB = 0.8               # probability that a node is NOT dropped out
MAX_GRAD_NORM = 5             # upper bound on the gradient norm, to control gradient explosion
SHARE_EMB_AND_SOFTMAX = True  # share parameters between the softmax layer and the embedding layer
MAX_LEN = 50                  # maximum number of words per sentence
SOS_ID = 1                    # ID of <sos> in the target vocabulary


def MakeDataset(file_path):
    """Produce the (sentence, length) dataset for one language.

    Args:
        file_path: path to the data file.
    Returns:
        A TextLineDataset of (sentence tensor, length) pairs.
    """
    dataset = tf.data.TextLineDataset(file_path)
    # map applies the given function to every element of the dataset.
    # Split each line of word IDs on spaces into a 1-D string vector.
    dataset = dataset.map(lambda string: tf.string_split([string]).values)
    # Convert the string-typed word IDs to integers.
    dataset = dataset.map(lambda string: tf.string_to_number(string, tf.int32))
    # Count the words in each sentence and keep the count alongside the sentence.
    dataset = dataset.map(lambda x: (x, tf.size(x)))
    return dataset


def MakeSrcTrgDataset(src_path, trg_path, batch_size):
    """Read source (src_path) and target (trg_path) data, then pad and batch them.

    Args:
        src_path: source language (the language being translated), English.
        trg_path: target language (the translation), Chinese.
        batch_size: batch size.
    Returns:
        A padded, batched dataset of (sentence, length) structures.
    """
    src_data = MakeDataset(src_path)
    trg_data = MakeDataset(trg_path)
    # Zip the two Datasets; each element now holds 4 tensors:
    #   ds[0][0]: source sentence,  ds[0][1]: source length,
    #   ds[1][0]: target sentence,  ds[1][1]: target length.
    # See https://blog.csdn.net/qq_32458499/article/details/78856530 for a
    # closer look at Dataset, .map and .zip.
    dataset = tf.data.Dataset.zip((src_data, trg_data))

    # Drop empty sentences (containing only <eos>) and overly long sentences.
    def FilterLength(src_tuple, trg_tuple):
        ((src_input, src_len), (trg_label, trg_len)) = (src_tuple, trg_tuple)
        # Length must be greater than 1 (non-empty) and at most MAX_LEN.
        src_len_ok = tf.logical_and(tf.greater(src_len, 1), tf.less_equal(src_len, MAX_LEN))
        trg_len_ok = tf.logical_and(tf.greater(trg_len, 1), tf.less_equal(trg_len, MAX_LEN))
        return tf.logical_and(src_len_ok, trg_len_ok)  # keep only if both hold

    # filter keeps elements for which FilterLength returns True.
    dataset = dataset.filter(FilterLength)

    # The decoder needs the target sentence in two formats:
    #   1. decoder input (trg_input):  '<sos> X Y Z'
    #   2. decoder target (trg_label): 'X Y Z <eos>'
    # The file provides 'X Y Z <eos>' (train.en lines end with ID 2, i.e. <eos>);
    # build '<sos> X Y Z' from it.
    def MakeTrgInput(src_tuple, trg_tuple):
        ((src_input, src_len), (trg_label, trg_len)) = (src_tuple, trg_tuple)
        # tf.concat usage: https://blog.csdn.net/qq_33431368/article/details/79429295
        trg_input = tf.concat([[SOS_ID], trg_label[:-1]], axis=0)
        return ((src_input, src_len), (trg_input, trg_label, trg_len))

    dataset = dataset.map(MakeTrgInput)

    # Shuffle the training data.
    dataset = dataset.shuffle(10000)

    # Output shapes after padding.
    padded_shapes = (
        (tf.TensorShape([None]),    # source sentence: vector of unknown length
         tf.TensorShape([])),       # source length: scalar
        (tf.TensorShape([None]),    # decoder input: vector of unknown length
         tf.TensorShape([None]),    # decoder target: vector of unknown length
         tf.TensorShape([])))       # target length: scalar
    # padded_batch performs the padding and batching.
    batched_dataset = dataset.padded_batch(batch_size, padded_shapes)
    return batched_dataset


class NMTModel(object):
    """seq2seq model."""

    def __init__(self):
        # LSTM cells used by the encoder and the decoder.
        self.enc_cell = tf.nn.rnn_cell.MultiRNNCell(
            [tf.nn.rnn_cell.LSTMCell(HIDDEN_SIZE) for _ in range(NUM_LAYERS)])
        self.dec_cell = tf.nn.rnn_cell.MultiRNNCell(
            [tf.nn.rnn_cell.LSTMCell(HIDDEN_SIZE) for _ in range(NUM_LAYERS)])

        # Separate word embeddings for source and target languages.
        self.src_embedding = tf.get_variable('src_emb', [SRC_VOCAB_SIZE, HIDDEN_SIZE])
        self.trg_embedding = tf.get_variable('trg_emb', [TRG_VOCAB_SIZE, HIDDEN_SIZE])

        # Softmax-layer variables.
        if SHARE_EMB_AND_SOFTMAX:
            self.softmax_weight = tf.transpose(self.trg_embedding)
        else:
            self.softmax_weight = tf.get_variable('weight', [HIDDEN_SIZE, TRG_VOCAB_SIZE])
        self.softmax_bias = tf.get_variable('softmax_loss', [TRG_VOCAB_SIZE])

    def forward(self, src_input, src_size, trg_input, trg_label, trg_size):
        """Build the forward graph.

        All five arguments are the tensors produced by MakeSrcTrgDataset:
        encoder input and size, decoder input, decoder target and size.
        """
        batch_size = tf.shape(src_input)[0]

        # Look up the embedding vector for every input/output word ID.
        src_emb = tf.nn.embedding_lookup(self.src_embedding, src_input)
        trg_emb = tf.nn.embedding_lookup(self.trg_embedding, trg_input)

        # Dropout on the embeddings.
        src_emb = tf.nn.dropout(src_emb, KEEP_PROB)
        trg_emb = tf.nn.dropout(trg_emb, KEEP_PROB)

        # Encoder: reads the source sentence and outputs the final hidden state
        # enc_state. Since the encoder is a two-layer LSTM, enc_state is a tuple
        # of two LSTMStateTuples, one per layer. enc_outputs is the top layer's
        # output at every step ([batch_size, max_time, HIDDEN_SIZE]); plain
        # seq2seq doesn't need it, only attention models do.
        with tf.variable_scope('encoder'):
            enc_outputs, enc_state = tf.nn.dynamic_rnn(self.enc_cell, src_emb, src_size, dtype=tf.float32)

        # Decoder: initial_state=enc_state initialises the first step with the
        # encoder's final state; dec_outputs is [batch_size, max_time, HIDDEN_SIZE].
        with tf.variable_scope('decoder'):
            dec_outputs, _ = tf.nn.dynamic_rnn(self.dec_cell, trg_emb, trg_size, initial_state=enc_state)

        # Log perplexity per decoder step: reshape to [-1, HIDDEN_SIZE],
        # project through the softmax layer and compute the cross-entropy loss.
        output = tf.reshape(dec_outputs, [-1, HIDDEN_SIZE])
        logits = tf.matmul(output, self.softmax_weight) + self.softmax_bias
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.reshape(trg_label, [-1]), logits=logits)

        # Zero out the weights of padded positions so they don't disturb training.
        label_weights = tf.sequence_mask(trg_size, maxlen=tf.shape(trg_label)[1], dtype=tf.float32)
        label_weights = tf.reshape(label_weights, [-1])
        cost = tf.reduce_sum(loss * label_weights)
        cost_per_token = cost / tf.reduce_sum(label_weights)

        # Backpropagation: compute gradients, clip them by global norm,
        # then apply plain SGD with learning rate 1.0.
        trainable_variables = tf.trainable_variables()
        grads = tf.gradients(cost / tf.to_float(batch_size), trainable_variables)
        grads, _ = tf.clip_by_global_norm(grads, MAX_GRAD_NORM)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0)
        # apply_gradients is the second half of minimize: it runs the update ops
        # for the (grad, var) pairs that compute_gradients would have produced.
        train_op = optimizer.apply_gradients(zip(grads, trainable_variables))
        return cost_per_token, train_op


def run_epoch(session, cost_op, train_op, saver, step):
    """Train one epoch on the given model; return the global step count.

    Prints the cost every 10 steps and saves a checkpoint every 200 steps.
    """
    # Repeat until the whole Dataset has been consumed.
    while True:
        try:
            # Run train_op and evaluate cost_op (the loss); the training data
            # is fed through the Dataset built in main().
            cost, _ = session.run([cost_op, train_op])
            if step % 10 == 0:
                print('After %d steps, per token cost is %.3f' % (step, cost))
            if step % 200 == 0:
                saver.save(session, CHECKPOINT_PATH, global_step=step)
            step += 1
        except tf.errors.OutOfRangeError:
            break
    return step


def main():
    # Initialiser for all model variables.
    initializer = tf.random_uniform_initializer(-0.05, 0.05)

    # Define the RNN model used for training.
    with tf.variable_scope('nmt_model', reuse=None, initializer=initializer):
        train_model = NMTModel()

    # Define the input data.
    data = MakeSrcTrgDataset(SRC_TRAIN_DATA, TRG_TRAIN_DATA, BATCH_SIZE)
    iterator = data.make_initializable_iterator()
    (src, src_size), (trg_input, trg_label, trg_size) = iterator.get_next()

    # Build the forward graph; inputs are provided to forward() as tensors.
    cost_op, train_op = train_model.forward(src, src_size, trg_input, trg_label, trg_size)

    # Train and save the model.
    saver = tf.train.Saver()
    step = 0
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(NUM_EPOCH):
            print('In iteration: %d' % (i + 1))
            sess.run(iterator.initializer)
            step = run_epoch(sess, cost_op, train_op, saver, step)


if __name__ == '__main__':
    main()
```
The error is below and I don't know how to fix it. Thanks!
```
Traceback (most recent call last):
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: This
	 [[{{node StringToNumber}}]]
	 [[{{node IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Python37/untitled1/train_model.py", line 277, in <module>
    main()
  File "D:/Python37/untitled1/train_model.py", line 273, in main
    step = run_epoch(sess, cost_op, train_op, saver, step)
  File "D:/Python37/untitled1/train_model.py", line 231, in run_epoch
    cost, _ = session.run([cost_op, train_op])
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "D:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: This
	 [[{{node StringToNumber}}]]
	 [[node IteratorGetNext (defined at D:/Python37/untitled1/train_model.py:259) ]]
```
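The traceback pinpoints the problem: `StringToNumberOp could not correctly convert string: This`. `MakeDataset` expects files that already contain space-separated word IDs (the `.segment` files produced by a separate preprocessing step), but the pipeline is evidently reading raw English text. A minimal sketch of the missing token-to-ID step; the vocabulary entries are made up apart from the `<sos>=1`, `<eos>=2` convention already used in the code:

```python
# Hypothetical vocabulary; real code would load it from a vocab file.
vocab = {"<unk>": 0, "<sos>": 1, "<eos>": 2, "this": 3, "is": 4, "a": 5, "test": 6}

def encode(line):
    """Turn one raw sentence into the 'id id ... 2' line MakeDataset expects."""
    ids = [vocab.get(tok.lower(), vocab["<unk>"]) for tok in line.split()]
    return " ".join(str(i) for i in ids + [vocab["<eos>"]])

assert encode("This is a test") == "3 4 5 6 2"
```

So the fix would be to run the preprocessing that produces `train.tags.en-zh.en.deletehtml.segment` (or encode the corpus as above) before pointing `SRC_TRAIN_DATA`/`TRG_TRAIN_DATA` at the files.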
Spark errors when reading Avro-serialized Parquet: Illegal Parquet type: FIXED_LEN_BYTE_ARRAY
The Avro schema is defined as in the image below:
![图片说明](https://img-ask.csdn.net/upload/202002/14/1581611055_583617.png)

Spark then fails when reading the generated Parquet with: Illegal Parquet type: FIXED_LEN_BYTE_ARRAY. How can this Parquet be read (not necessarily with Spark)? The full error:

```
org.apache.spark.sql.AnalysisException: Illegal Parquet type: FIXED_LEN_BYTE_ARRAY;
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:107)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:175)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:89)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:71)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:65)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:62)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:664)
	at scala.Option.getOrElse(Option.scala:138)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:664)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:621)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:801)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:801)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
```
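For reference, one likely cause: Spark can only map a Parquet FIXED_LEN_BYTE_ARRAY column when it carries a DECIMAL annotation, so an Avro `fixed` field, or a decimal written without that annotation, is rejected with exactly this message. Assuming the offending column is in fact an Avro decimal (the schema screenshot is not readable here, so this is an assumption), the payload is just the unscaled value as a big-endian two's-complement integer; once the raw bytes are obtained with another reader (e.g. parquet-tools or PyArrow), they can be decoded by hand:

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a decimal stored as FIXED_LEN_BYTE_ARRAY: the bytes hold the
    unscaled value as a big-endian two's-complement integer."""
    unscaled = int.from_bytes(raw, byteorder='big', signed=True)
    return Decimal(unscaled).scaleb(-scale)

print(decode_avro_decimal(b'\x30\x39', 2))   # 0x3039 = 12345 → 123.45
print(decode_avro_decimal(b'\xff\xff', 2))   # -1 → -0.01
```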
Pizza Pricing: a pricing problem
Problem Description

Pizza has always been a staple on college campuses. After the downturn in the economy, it is more important than ever to get the best deal, namely the lowest cost per square inch. Consider, for example, the following menu for a store selling circular pizzas of varying diameter and price:

One could actually compute the costs per square inch, which would be approximately 10.2¢, 7.6¢, and 7.1¢ respectively, so the 12-inch pizza is the best value. However, if the 10-inch had been sold for $5, it would have been the best value, at approximately 6.4¢ per square inch.

Your task is to analyze a menu and to report the diameter of the pizza that is the best value. Note that no two pizzas on a menu will have the same diameter or the same inherent cost per square inch.

Input

The input contains a series of one or more menus. Each menu starts with the number of options N, 1 ≤ N ≤ 10, followed by N lines, each containing two integers respectively designating a pizza's diameter D (in inches) and price P (in dollars), with 1 ≤ D ≤ 36 and 1 ≤ P ≤ 100. The end of the input will be designated with a line containing the number 0.

Output

For each menu, print a line identifying the menu number and the diameter D of the pizza with the best value, using the format shown below.

Sample Input

3
5 2
10 6
12 8
3
5 2
10 5
12 8
4
1 1
24 33
13 11
6 11
0

Sample Output

Menu 1: 12
Menu 2: 10
Menu 3: 24
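Since every pizza is a circle, cost per square inch is proportional to P / D², so the best value is simply the pizza that maximizes D² / P. A minimal sketch of a solution (function and variable names are mine, not from a reference solution):

```python
def best_value(menu):
    """menu is a list of (diameter, price) pairs; returns the diameter with
    the lowest cost per square inch, i.e. the largest area per dollar."""
    # cost/in² = P / (pi * (D/2)²); minimizing it means maximizing D² / P.
    return max(menu, key=lambda dp: dp[0] ** 2 / dp[1])[0]

menus = [
    [(5, 2), (10, 6), (12, 8)],
    [(5, 2), (10, 5), (12, 8)],
    [(1, 1), (24, 33), (13, 11), (6, 11)],
]
for i, menu in enumerate(menus, 1):
    print('Menu %d: %d' % (i, best_value(menu)))
# Prints Menu 1: 12, Menu 2: 10, Menu 3: 24, matching the sample output.
```

The problem guarantees no two pizzas share the same cost per square inch, so no tie-breaking is needed.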
Sunday Drive: a driving problem
Problem Description

After wracking your brains at a programming contest on Saturday, you'd like to relax by taking a leisurely Sunday drive. But, gasoline is so expensive nowadays! Maybe, by creatively changing lanes, you can minimize the distance you travel and save some money!

You will be given a description of several sections of a highway. All sections will have the same number of lanes. Think of your car as a point mass, moving down the center of the lane. Each lane will be 10 feet wide. There are two kinds of highway sections: curved and straight. You can only change lanes on straight sections, and it takes a minimum of 100 feet of the straight section to move over one lane. You can take longer than that, of course, if you choose. For example, to be used to cross 2 lanes, a straight section must be at least 200 feet long. All curve sections will make 90 degree turns. You cannot change lanes on a curve section. In addition, you must be driving along the exact middle of a lane during a turn, so during a turn your position will be 5 feet, or 15 feet, or 25 feet from the edge, etc.

Given a description of a highway, compute the minimum total distance required to travel the entire highway, including curves and lane changes. You can start, and end, in any lane you choose. Assume that your car is a point mass in the center of the lane. The highway may cross over/under itself, but the changes in elevation are miniscule, so you shouldn't worry about their impact on your distance traveled.

Input

There will be several test cases in the input. Each test case will begin with two integers

N M

where N (1 ≤ N ≤ 1,000) is the number of segments, and M (2 ≤ M ≤ 10) is the number of lanes. On each of the next N lines will be a description of a segment, consisting of a letter and a number, with a single space between them:

T K

The letter T is one of S, L, or R (always capital). This indicates the type of the section: a straight section (S), a left curve (L) or a right curve (R). If the section is a straight section, then the number K (10 ≤ K ≤ 10,000) is simply its length, in feet. If the section is a right or left curve, then the number K (10 ≤ K ≤ 10,000) is the radius of the inside edge of the highway, again in feet. There will never be consecutive straight sections in the input, but multiple consecutive turns are possible. The input will end with a line with two 0s.

Output

For each test case, print a single number on its own line, indicating the minimum distance (in feet) required to drive the entire highway. The number should be printed with exactly two decimal places, rounded. Output no extra spaces, and do not separate answers with blank lines.

Sample Input

3 3
R 100
S 1000
L 100
9 5
S 2500
L 500
S 2000
L 500
S 5000
L 500
S 2000
L 500
S 2500
5 4
L 100
L 100
L 100
L 100
L 100
0 0

Sample Output

1330.07
17173.01
824.67
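For reference, the geometry of each section boils down to two lengths. This is a sketch under assumptions the statement leaves open: lane 0 is taken as the innermost lane of a curve, and the car is assumed to drift along a straight diagonal while changing lanes (so the distance is the hypotenuse):

```python
import math

LANE_WIDTH = 10.0  # feet

def curve_length(r, lane):
    """Arc length of a 90-degree turn taken in `lane` (0 = innermost),
    where r is the radius of the inside edge of the highway: the car sits
    5 feet into its lane, hence the + LANE_WIDTH / 2 term."""
    return (math.pi / 2) * (r + LANE_WIDTH * lane + LANE_WIDTH / 2)

def straight_length(s, lanes_crossed):
    """Distance covered on a straight section of length s while drifting
    across `lanes_crossed` lanes; legal only if s >= 100 * lanes_crossed."""
    return math.hypot(s, LANE_WIDTH * lanes_crossed)
```

A full solution would then run a small dynamic program over the M lanes, choosing at each straight section how many lanes to drift and in which direction, and remembering that "inside" swaps sides between left and right curves.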
Urgent: in OpenCV, both the descriptors and the Keypoints extracted with ORB have size 0
I use ORB in OpenCV to extract feature descriptors and keypoints from a series of images, but after extraction I find that **Keypoints.size() = 0 and descriptors.size() = [0 x 0]**. What causes this, and how can it be fixed?

```
Ptr<ORB> orb = ORB::create();
vector<KeyPoint> Keypoints;
Mat descriptors;

Mat src = imread(files1, 0);
resize(src, src, Size(48, 48));
orb->detect(src, Keypoints);
orb->compute(src, Keypoints, descriptors);

cout << "key_size= " << Keypoints.size() << endl;
cout << "orb_size= " << descriptors.size() << endl;
```

Screenshot of the run:
![图片说明](https://img-ask.csdn.net/upload/202002/06/1580961368_120874.png)
Robot Navigation: counting shortest programs
Problem Description

A robot has been sent to explore a remote planet. To specify a path the robot should take, a program is sent each day. The program consists of a sequence of the following commands:

FORWARD X: move forward by X units.
TURN LEFT: turn left (in place) by 90 degrees.
TURN RIGHT: turn right (in place) by 90 degrees.

The robot also has sensor units which allow it to obtain a map of its surrounding area. The map is represented as a grid. Some grid points contain hazards (e.g. craters) and the program must avoid these points or risk losing the robot. Naturally, if the initial location of the robot, the direction it is facing, and its destination position are known, it is best to send the shortest program (one consisting of the fewest commands) to move the robot to its destination (we do not care which direction it faces at the destination). You are more interested in knowing the number of different shortest programs that can move the robot to its destination. However, the number of shortest programs can be very large, so you are satisfied to compute the number as a remainder modulo 1,000,000.

Input

There will be several test cases in the input. Each test case will begin with a line with two integers

N M

where N is the number of rows in the grid, and M is the number of columns in the grid (2 ≤ N, M ≤ 100). The next N lines of input will have M characters each. The characters will be one of the following:

‘.’ Indicating a navigable grid point.
‘*’ Indicating a crater (i.e. a non-navigable grid point).
‘X’ Indicating the target grid point. There will be exactly one ‘X’.
‘N’, ‘E’, ‘S’, or ‘W’ Indicating the starting point and initial heading of the robot. There will be exactly one of these.

Note that the directions mirror compass directions on a map: N is North (toward the top of the grid), E is East (toward the right of the grid), S is South (toward the bottom of the grid) and W is West (toward the left of the grid). There will be no spaces and no other characters in the description of the map. The input will end with a line with two 0s.

Output

For each test case, output two integers on a single line, with a single space between them. The first is the length of a shortest possible program to navigate the robot from its starting point to the target, and the second is the number of different programs of that length which will get the robot to the target (modulo 1,000,000). If there is no path from the robot to the target, output two zeros separated by a single space. Output no extra spaces, and do not separate answers with blank lines.

Sample Input

5 6
*....X
.....*
.....*
.....*
N....*
6 5
....X
.****
.****
.****
.****
N****
3 3
.E.
***
.X.
0 0

Sample Output

6 4
3 1
0 0
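Because every command costs exactly one — including FORWARD X for any X — shortest programs can be counted with a breadth-first search over (row, col, heading) states: each turn is a unit edge, and there is one FORWARD edge to every cell reachable in a straight, crater-free line. Counting ways is the standard trick of adding a predecessor's count whenever it relaxes a state at distance + 1. A sketch (names are mine):

```python
from collections import deque

MOD = 1_000_000
HEADINGS = {'N': (-1, 0), 'E': (0, 1), 'S': (1, 0), 'W': (0, -1)}
ORDER = 'NESW'

def solve(grid):
    n, m = len(grid), len(grid[0])
    start = target = None
    for r in range(n):
        for c in range(m):
            ch = grid[r][c]
            if ch in HEADINGS:
                start = (r, c, ORDER.index(ch))
            elif ch == 'X':
                target = (r, c)
    dist = {start: 0}
    ways = {start: 1}
    q = deque([start])
    while q:
        u = q.popleft()
        r, c, d = u
        # One command each: TURN LEFT, TURN RIGHT, or FORWARD X (any X).
        nbrs = [(r, c, (d + 1) % 4), (r, c, (d - 1) % 4)]
        dr, dc = HEADINGS[ORDER[d]]
        rr, cc = r + dr, c + dc
        while 0 <= rr < n and 0 <= cc < m and grid[rr][cc] != '*':
            nbrs.append((rr, cc, d))
            rr, cc = rr + dr, cc + dc
        for v in nbrs:
            if v not in dist:                # first time seen: next BFS layer
                dist[v] = dist[u] + 1
                ways[v] = ways[u]
                q.append(v)
            elif dist[v] == dist[u] + 1:     # another shortest way to reach v
                ways[v] = (ways[v] + ways[u]) % MOD
    tr, tc = target
    done = [(dist[(tr, tc, d)], ways[(tr, tc, d)])
            for d in range(4) if (tr, tc, d) in dist]
    if not done:
        return 0, 0
    best = min(d for d, _ in done)
    return best, sum(w for d, w in done if d == best) % MOD
```

Run against the three sample maps above, this returns (6, 4), (3, 1) and (0, 0), matching the sample output.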
Modified SSD-Tensorflow hits a loss input-dimension mismatch during training
I am studying object detection and recognition. Following some papers, I modified the original SSD; the network-model part of the changes is finished and raises no errors. But when I start training I get:

'ValueError: Dimension 0 in both shapes must be equal, but are 233920 and 251392. Shapes are [233920] and [251392]. for 'ssd_losses/Select' (op: 'Select') with input shapes: [251392], [233920], [251392].'

![图片说明](https://img-ask.csdn.net/upload/201904/06/1554539638_631515.png)
![图片说明](https://img-ask.csdn.net/upload/201904/06/1554539651_430990.png)

```
# =========================================================================== #
# SSD loss function.
# =========================================================================== #
def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        num_classes = lshape[-1]
        batch_size = lshape[0]

        # Flatten out all vectors!
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        for i in range(len(logits)):
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            fgscores.append(tf.reshape(gscores[i], [-1]))
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # And concat the crap!
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype

        # Compute positive matching mask...
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        n_positives = tf.reduce_sum(fpmask)

        # Hard negative mining...
        no_classes = tf.cast(pmask, tf.int32)
        predictions = slim.softmax(logits)
        nmask = tf.logical_and(tf.logical_not(pmask), gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        nvalues = tf.where(nmask, predictions[:, 0], 1. - fnmask)
        nvalues_flat = tf.reshape(nvalues, [-1])
        # Number of negative entries to select.
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)

        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # Final negative mask.
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)

        # Add cross-entropy loss.
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=gclasses)
            loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                  labels=no_classes)
            loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
            tf.losses.add_loss(loss)

        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value')
            tf.losses.add_loss(loss)
```

I have studied the source for a while. Since I only modified the network-definition part of ssd_vgg_300.py in SSD-Tensorflow-Master, and the loss-function code is untouched, I cannot locate the error, and I could not find a solution online either. Any help would be greatly appreciated.
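For what it's worth, the failing 'ssd_losses/Select' (the `tf.where` above) combines a tensor derived from the logits with tensors derived from the ground-truth encoding (`gscores`), so both sides must describe the same total number of anchors. If the feature-map shapes or per-layer anchor counts were changed in the network, but the matching configuration the ground-truth encoder uses (`feat_shapes`, `anchor_sizes`, `anchor_ratios` in the model's params) was not updated, the two totals diverge, which is exactly a mismatch like [233920] vs [251392]. A quick sanity check, using the stock SSD300 numbers purely for illustration:

```python
# Illustrative numbers: the stock SSD300 configuration in SSD-Tensorflow
# (feature-map shapes and anchors per cell from ssd_vgg_300.py).
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchors_per_cell = [4, 6, 6, 6, 4, 4]

# The network's prediction heads and the ground-truth encoder must both
# produce exactly this many anchors per image.
total_anchors = sum(h * w * k
                    for (h, w), k in zip(feat_shapes, anchors_per_cell))
print(total_anchors)  # 8732 for the stock network
```

Recomputing this total from the modified network and from the anchor config, and checking that they agree, should point at the layer whose entry was left stale.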