7*4 2012-02-16 15:58 采纳率: 50%
浏览 513
已采纳

为什么将0.1 f 改为0会降低10倍的性能?

Why does this bit of code,

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0.1f; // <--
        y[i] = y[i] - 0.1f; // <--
    }
}

run more than 10 times faster than the following bit (identical except where noted)?

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0; // <--
        y[i] = y[i] - 0; // <--
    }
}

when compiling with Visual Studio 2010 SP1. (I haven't tested with other compilers.)

转载于:https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x

  • 写回答

5条回答 默认 最新

  • 狐狸.fox 2012-02-16 16:20
    关注

    Welcome to the world of denormalized floating-point! They can wreak havoc on performance!!!

    Denormal (or subnormal) numbers are kind of a hack to get some extra values very close to zero out of the floating point representation. Operations on denormalized floating-point can be tens to hundreds of times slower than on normalized floating-point. This is because many processors can't handle them directly and must trap and resolve them using microcode.

    If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether 0 or 0.1 is used.

    Here's the test code compiled on x64:

    int main() {
    
        double start = omp_get_wtime();
    
        const float x[16]={1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.3,2.4,2.5,2.6};
        const float z[16]={1.123,1.234,1.345,156.467,1.578,1.689,1.790,1.812,1.923,2.034,2.145,2.256,2.367,2.478,2.589,2.690};
        float y[16];
        for(int i=0;i<16;i++)
        {
            y[i]=x[i];
        }
        for(int j=0;j<9000000;j++)
        {
            for(int i=0;i<16;i++)
            {
                y[i]*=x[i];
                y[i]/=z[i];
    #ifdef FLOATING
                y[i]=y[i]+0.1f;
                y[i]=y[i]-0.1f;
    #else
                y[i]=y[i]+0;
                y[i]=y[i]-0;
    #endif
    
                if (j > 10000)
                    cout << y[i] << "  ";
            }
            if (j > 10000)
                cout << endl;
        }
    
        double end = omp_get_wtime();
        cout << end - start << endl;
    
        system("pause");
        return 0;
    }
    

    Output:

    #define FLOATING
    1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
    1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
    
    //#define FLOATING
    6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.46842e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
    6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.45208e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
    

    Note how in the second run the numbers are very close to zero.

    Denormalized numbers are generally rare and thus most processors don't try to handle them efficiently.


    To demonstrate that this has everything to do with denormalized numbers, if we flush denormals to zero by adding this to the start of the code:

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    

    Then the version with 0 is no longer 10x slower and actually becomes faster. (This requires that the code be compiled with SSE enabled.)

    This means that rather than using these weird lower precision almost-zero values, we just round to zero instead.

    Timings: Core i7 920 @ 3.5 GHz:

    //  Don't flush denormals to zero.
    0.1f: 0.564067
    0   : 26.7669
    
    //  Flush denormals to zero.
    0.1f: 0.587117
    0   : 0.341406
    

    In the end, this really has nothing to do with whether it's an integer or floating-point. The 0 or 0.1f is converted/stored into a register outside of both loops. So that has no effect on performance.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(4条)

报告相同问题?

悬赏问题

  • ¥50 求解vmware的网络模式问题 别拿AI回答
  • ¥24 EFS加密后,在同一台电脑解密出错,证书界面找不到对应指纹的证书,未备份证书,求在原电脑解密的方法,可行即采纳
  • ¥15 springboot 3.0 实现Security 6.x版本集成
  • ¥15 PHP-8.1 镜像无法用dockerfile里的CMD命令启动 只能进入容器启动,如何解决?(操作系统-ubuntu)
  • ¥30 请帮我解决一下下面六个代码
  • ¥15 关于资源监视工具的e-care有知道的嘛
  • ¥35 MIMO天线稀疏阵列排布问题
  • ¥60 用visual studio编写程序,利用间接平差求解水准网
  • ¥15 Llama如何调用shell或者Python
  • ¥20 谁能帮我挨个解读这个php语言编的代码什么意思?