weixin_39773447
2020-12-01 11:37

Batchnorm: "could not create a primitive descriptor iterator"

I am having trouble implementing a simple benchmark to test batch normalization performance.

I have tried several versions of mkl-dnn with more or less success. With v0.16 the forward pass works, but I can't get the backward pass working. Currently I am testing with the latest master version (038c9c7e58311b414452b89b3f3bbc9566e8b7df).

I get "Error: could not create a primitive descriptor iterator". Digging deeper, the returned status of the iterator creation is "unimplemented".

I am pretty sure it's a problem on my side; maybe you can help me spot the mistake?

Here is my code:


std::shared_ptr<primitive> bn_fwd, bn_bwd;
        engine eng(engine::kind::cpu, 0);
        memory::data_type src_dt = memory::data_type::f32;

        using tag = memory::format_tag;


#ifdef FUSED_BN
        normalization_flags flags = normalization_flags::fuse_norm_relu;
#else
        normalization_flags flags = 0;
#endif

        auto tz_volume = [=](memory::dims tz_dims) {
            return std::accumulate(tz_dims.begin(), tz_dims.end(), 1,
                                   std::multiplies<int>());
        };
//desc/Init Memory
        //Set SRC Layer Dimension
        memory::dims src_layer_dims = {prob.minibatch, prob.oc,
                                       calc_out_dim(prob.h, prob.fh, prob.pad_h, prob.stride_h),
                                       calc_out_dim(prob.w, prob.fw, prob.pad_w, prob.stride_w)};
        printf("Example Dim: (%i,%i,%i,%i)\n", src_layer_dims[0], src_layer_dims[1], src_layer_dims[2], src_layer_dims[3]);
        //Example Dim: (4,32,79,341),(4,32,38,166)...

        std::vector<float> src_layer(tz_volume(src_layer_dims), 1.0f);
        memory::desc src_layer_md = { { src_layer_dims }, memory::data_type::f32, tag::nchw };
        auto src_memory = mkldnn::memory(src_layer_md, eng, src_layer.data());

        std::vector<float> dst_layer(tz_volume(src_layer_dims), 1.0f);
        memory::desc dst_layer_md = { { src_layer_dims }, memory::data_type::f32, tag::nchw };
        auto dst_memory = memory(dst_layer_md, eng, dst_layer.data());
//End Memory

        batch_normalization_forward::desc fwd_batch_norm_d = {prop_kind::forward_training, src_layer_md, (float)1E-7, flags};
        auto fwd_batch_norm_pd = batch_normalization_forward::primitive_desc(fwd_batch_norm_d, eng);

        auto mean_memory = memory(fwd_batch_norm_pd.mean_desc(), eng);
        auto variance_memory = memory(fwd_batch_norm_pd.variance_desc(), eng);

#ifdef FUSED_BN
        auto workspace_memory = memory(fwd_batch_norm_pd.workspace_desc(), eng);
#endif
        auto weights_memory = memory(fwd_batch_norm_pd.weights_desc(), eng);

        stream s(eng);

        bn_fwd.reset(new batch_normalization_forward(fwd_batch_norm_pd));

        std::unordered_map< int, memory > args_fwd = { { MKLDNN_ARG_SRC, src_memory},
                                                             { MKLDNN_ARG_DST, dst_memory},
#ifdef FUSED_BN
                { MKLDNN_ARG_WORKSPACE, workspace_memory},
#endif
                                                             { MKLDNN_ARG_MEAN, mean_memory},
                                                             { MKLDNN_ARG_VARIANCE, variance_memory},
                                                             { MKLDNN_ARG_WEIGHTS, weights_memory}};

        rand_fill(src_layer.data(), src_layer_md.get_size());

        //Warmup
        (*bn_fwd).execute(s, args_fwd);

        bench_result res_fwd = timeit(prob.iters,
                                   [&](){
                                       (*bn_fwd).execute(s, args_fwd);
                                   });
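
For reference, this is roughly how I read out that status outside the benchmark (a minimal sketch with one of the failing shapes hard-coded; I am assuming here that mkldnn::error exposes a status field next to what(), as in the C++ header):

    // Minimal repro of the failing primitive descriptor creation.
    #include <cstdio>
    #include "mkldnn.hpp"
    using namespace mkldnn;

    int main() try {
        engine eng(engine::kind::cpu, 0);
        // One of the shapes printed by the benchmark: (4,32,79,341)
        memory::desc src_md({4, 32, 79, 341}, memory::data_type::f32,
                            memory::format_tag::nchw);
        // FUSED_BN variant; drop fuse_norm_relu for the plain case
        batch_normalization_forward::desc d(prop_kind::forward_training,
                src_md, 1e-7f, normalization_flags::fuse_norm_relu);
        auto pd = batch_normalization_forward::primitive_desc(d, eng);
        printf("primitive descriptor created\n");
        return 0;
    } catch (const error &e) {
        // e.status is the raw mkldnn_status_t, e.what() the error text
        printf("status = %d%s: %s\n", (int)e.status,
               e.status == mkldnn_unimplemented ? " (unimplemented)" : "",
               e.what());
        return 1;
    }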

Environment

CPU and Flags:


Architecture:          x86_64
Core(s) per socket:    24
Socket(s):             2
Model name:            Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512

OS: Linux CentOS 7

Compiler: icpc (v19); mkl-dnn commit hash: 038c9c7e58311b414452b89b3f3bbc9566e8b7df

This question comes from the open-source project: oneapi-src/oneDNN

5 replies

  • weixin_39812039 5 months ago

    Hi,

    Could you please share your code for the backward pass as well?

    I don't see any issues with the code you posted. It compiles and runs fine with both -UFUSED_BN and -DFUSED_BN.
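
    For reference, I built it roughly like this (the source file name and install paths are placeholders for my local setup):

        icpc -std=c++11 -DFUSED_BN bn_bench.cpp -I<mkldnn>/include -L<mkldnn>/lib -lmkldnn -o bn_bench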

  • weixin_39773447 weixin_39773447 5 months ago

    This is the whole section with the backward pass included:

    
    std::shared_ptr<primitive> bn_fwd, bn_bwd;
            engine eng(engine::kind::cpu, 0);
            memory::data_type src_dt = memory::data_type::f32;
    
            using tag = memory::format_tag;
    
    
    #ifdef FUSED_BN
            normalization_flags flags = normalization_flags::fuse_norm_relu;
    #else
            normalization_flags flags = 0;
    #endif
    
            auto tz_volume = [=](memory::dims tz_dims) {
                return std::accumulate(tz_dims.begin(), tz_dims.end(), 1,
                                       std::multiplies<int>());
            };
    
            //Set SRC Layer Dimension
            memory::dims src_layer_dims = {prob.minibatch, prob.oc,
                                           calc_out_dim(prob.h, prob.fh, prob.pad_h, prob.stride_h),
                                           calc_out_dim(prob.w, prob.fw, prob.pad_w, prob.stride_w)};
            printf("Example Dim: (%i,%i,%i,%i)\n", src_layer_dims[0], src_layer_dims[1], src_layer_dims[2], src_layer_dims[3]);
            //Example Dim: (4,32,79,341),(4,32,38,166)...
    
            std::vector<float> src_layer(tz_volume(src_layer_dims), 1.0f);
            memory::desc src_layer_md = { { src_layer_dims }, memory::data_type::f32, tag::nchw };
            auto src_memory = mkldnn::memory(src_layer_md, eng, src_layer.data());
    
            std::vector<float> dst_layer(tz_volume(src_layer_dims), 1.0f);
            memory::desc dst_layer_md = { { src_layer_dims }, memory::data_type::f32, tag::nchw };
            auto dst_memory = memory(dst_layer_md, eng, dst_layer.data());
    
            batch_normalization_forward::desc fwd_batch_norm_d = {prop_kind::forward_training, src_layer_md, (float)1E-7, flags};
            auto fwd_batch_norm_pd = batch_normalization_forward::primitive_desc(fwd_batch_norm_d, eng);
    
            auto mean_memory = memory(fwd_batch_norm_pd.mean_desc(), eng);
            auto variance_memory = memory(fwd_batch_norm_pd.variance_desc(), eng);
    
    #ifdef FUSED_BN
            auto workspace_memory = memory(fwd_batch_norm_pd.workspace_desc(), eng);
    #endif
            auto weights_memory = memory(fwd_batch_norm_pd.weights_desc(), eng);
    
            stream s(eng);
    
            bn_fwd.reset(new batch_normalization_forward(fwd_batch_norm_pd));
    
            std::unordered_map< int, memory > args_fwd = { { MKLDNN_ARG_SRC, src_memory},
                                                                 { MKLDNN_ARG_DST, dst_memory},
    #ifdef FUSED_BN
                    { MKLDNN_ARG_WORKSPACE, workspace_memory},
    #endif
                                                                 { MKLDNN_ARG_MEAN, mean_memory},
                                                                 { MKLDNN_ARG_VARIANCE, variance_memory},
                                                                 { MKLDNN_ARG_WEIGHTS, weights_memory}};
    
            rand_fill(src_layer.data(), src_layer_md.get_size());
    
            //Warmup
            (*bn_fwd).execute(s, args_fwd);
    
            bench_result res_fwd = timeit(prob.iters,
                                       [&](){
                                           (*bn_fwd).execute(s, args_fwd);
                                       });
    
    #if 1
    
            std::vector<float> diff_dst_layer(tz_volume(src_layer_dims), 1.0f);
            auto diff_dst_md = memory::desc({src_layer_dims}, memory::data_type::f32, tag::nchw);
            auto diff_dst_memory = memory(diff_dst_md, eng, diff_dst_layer.data());
    
            std::vector<float> diff_src_layer(tz_volume(src_layer_dims), 1.0f);
            auto diff_src_md = memory::desc({src_layer_dims}, memory::data_type::f32, tag::nchw);
            auto diff_src_memory = memory(diff_src_md, eng, diff_src_layer.data());
    
            batch_normalization_backward::desc bwd_batch_norm_d = {prop_kind::backward,
                                                                   diff_src_md,
                                                                   src_layer_md,
                                                                   (float)1E-5,
                                                                   flags};
            auto bwd_batch_norm_pd = batch_normalization_backward::primitive_desc(bwd_batch_norm_d, eng, fwd_batch_norm_pd);
            bn_bwd.reset(new batch_normalization_backward(bwd_batch_norm_pd));
    
            auto diff_weights_memory = memory(bwd_batch_norm_pd.diff_weights_desc(), eng);
    
            std::unordered_map< int, memory >  args_bwd = { { MKLDNN_ARG_SRC, dst_memory},
                                                                   { MKLDNN_ARG_MEAN, mean_memory},
                                                                   { MKLDNN_ARG_DST, src_memory},
                                                                   { MKLDNN_ARG_VARIANCE, variance_memory},
                                                                   { MKLDNN_ARG_DIFF_SRC, diff_src_memory},
                                                                   { MKLDNN_ARG_DIFF_DST, diff_dst_memory},
                                                                   { MKLDNN_ARG_WEIGHTS, weights_memory},
                                                                   { MKLDNN_ARG_DIFF_WEIGHTS, diff_weights_memory}
    #ifdef FUSED_BN
                    ,{ MKLDNN_ARG_WORKSPACE, workspace_memory}
    #endif
            };
    
            //Warmup
            (*bn_bwd).execute(s, args_bwd);
    
            auto res_bwd = timeit(prob.iters,
                   [&](){
                       (*bn_bwd).execute(s, args_bwd);
                   });
    #else
            auto res_bwd = (bench_result){0.0,0.0};
    #endif
  • weixin_39773447 5 months ago

    Since it worked for you, I included my code in the examples and built it with the mkl-dnn build settings. Now the forward pass works, so my own build setup seems to have been skewed. Nevertheless, the backward pass does not work; the new error is "Error: could not execute a primitive". Thanks for the quick response!

  • weixin_39812039 5 months ago

    Thanks! Two things here:

    1. I see you use scale & shift (aka weights) but set flags = 0, which means you don't want to use scale and shift. If you change it to flags = normalization_flags::use_scale_shift, the example passes.
    2. If you indeed don't want to use scale & shift, please use prop_kind::backward_data instead of prop_kind::backward.
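
    A rough sketch of both options against the snippet you posted (variable names are taken from your code):

        // Option 1: keep prop_kind::backward, but request scale & shift so the
        // MKLDNN_ARG_WEIGHTS / MKLDNN_ARG_DIFF_WEIGHTS arguments match the descriptor:
        normalization_flags flags = normalization_flags::use_scale_shift;

        // Option 2: you really don't want scale & shift -> compute only data
        // gradients (diff_weights are then not needed in args_bwd):
        batch_normalization_backward::desc bwd_batch_norm_d = {prop_kind::backward_data,
                                                               diff_src_md,
                                                               src_layer_md,
                                                               (float)1E-5,
                                                               flags};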

  • weixin_39773447 5 months ago

    Thanks for the clarification, it works now.

    Thanks a lot for the quick help!

