7*4 2012-02-16 15:58

Why does changing 0.1f to 0 slow down performance by 10x?

Why does this bit of code,

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0.1f; // <--
        y[i] = y[i] - 0.1f; // <--
    }
}

run more than 10 times faster than the following bit (identical except where noted)?

const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                       1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                     1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
float y[16];
for (int i = 0; i < 16; i++)
{
    y[i] = x[i];
}

for (int j = 0; j < 9000000; j++)
{
    for (int i = 0; i < 16; i++)
    {
        y[i] *= x[i];
        y[i] /= z[i];
        y[i] = y[i] + 0; // <--
        y[i] = y[i] - 0; // <--
    }
}

when compiling with Visual Studio 2010 SP1. (I haven't tested with other compilers.)

Reposted from: https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x


5 answers

  • 狐狸.fox 2012-02-16 16:20

    Welcome to the world of denormalized floating-point! They can wreak havoc on performance!!!

    Denormal (or subnormal) numbers are kind of a hack to get some extra values very close to zero out of the floating point representation. Operations on denormalized floating-point can be tens to hundreds of times slower than on normalized floating-point. This is because many processors can't handle them directly and must trap and resolve them using microcode.
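
    As a concrete reference point, here is a minimal standalone sketch (not part of the original answer) showing where the subnormal range sits: it lies just below the smallest normal float, and simply halving that smallest normal value lands in it.

    #include <cmath>    // std::fpclassify, FP_SUBNORMAL
    #include <cstdio>
    #include <limits>

    int main() {
        // Smallest positive normal float: 2^-126, about 1.17549e-38.
        float min_normal = std::numeric_limits<float>::min();
        // Smallest positive subnormal float: 2^-149, about 1.4013e-45.
        float min_denorm = std::numeric_limits<float>::denorm_min();

        // Halving the smallest normal value produces a subnormal: the
        // implicit leading 1 bit is lost and precision starts to degrade.
        float half = min_normal / 2.0f;

        std::printf("min normal     = %g (%s)\n", min_normal,
                    std::fpclassify(min_normal) == FP_SUBNORMAL ? "subnormal" : "normal");
        std::printf("min normal / 2 = %g (%s)\n", half,
                    std::fpclassify(half) == FP_SUBNORMAL ? "subnormal" : "normal");
        std::printf("denorm_min     = %g\n", min_denorm);
        return 0;
    }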

    If you print out the numbers after 10,000 iterations, you will see that they have converged to different values depending on whether 0 or 0.1 is used.

    Here's the test code compiled on x64:

    #include <iostream>
    #include <cstdlib>   // system()
    #include <omp.h>     // omp_get_wtime(); compile with OpenMP enabled
    using namespace std;

    int main() {
        double start = omp_get_wtime();

        const float x[16] = {  1.1,   1.2,   1.3,     1.4,   1.5,   1.6,   1.7,   1.8,
                               1.9,   2.0,   2.1,     2.2,   2.3,   2.4,   2.5,   2.6};
        const float z[16] = {1.123, 1.234, 1.345, 156.467, 1.578, 1.689, 1.790, 1.812,
                             1.923, 2.034, 2.145,   2.256, 2.367, 2.478, 2.589, 2.690};
        float y[16];
        for (int i = 0; i < 16; i++)
        {
            y[i] = x[i];
        }

        for (int j = 0; j < 9000000; j++)
        {
            for (int i = 0; i < 16; i++)
            {
                // Every x[i] is smaller than the matching z[i], so this
                // pair of operations shrinks y[i] toward zero each pass.
                y[i] *= x[i];
                y[i] /= z[i];
    #ifdef FLOATING
                y[i] = y[i] + 0.1f;
                y[i] = y[i] - 0.1f;
    #else
                y[i] = y[i] + 0;
                y[i] = y[i] - 0;
    #endif

                if (j > 10000)
                    cout << y[i] << "  ";
            }
            if (j > 10000)
                cout << endl;
        }

        double end = omp_get_wtime();
        cout << end - start << endl;

        system("pause");
        return 0;
    }
    

    Output:

    #define FLOATING
    1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
    1.78814e-007  1.3411e-007  1.04308e-007  0  7.45058e-008  6.70552e-008  6.70552e-008  5.58794e-007  3.05474e-007  2.16067e-007  1.71363e-007  1.49012e-007  1.2666e-007  1.11759e-007  1.04308e-007  1.04308e-007
    
    //#define FLOATING
    6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.46842e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
    6.30584e-044  3.92364e-044  3.08286e-044  0  1.82169e-044  1.54143e-044  2.10195e-044  2.45208e-029  7.56701e-044  4.06377e-044  3.92364e-044  3.22299e-044  3.08286e-044  2.66247e-044  2.66247e-044  2.24208e-044
    

    Note how in the second run the numbers are very close to zero.
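
    The mechanism is easy to see in isolation. Below is a minimal sketch (not from the original answer): once a value has shrunk far below one ulp of 0.1f, adding 0.1f absorbs it completely, so the matching subtraction leaves exactly zero instead of a subnormal. (In the full loop the multiply/divide steps keep stirring in rounding residue, which is why the printed values settle near 1e-7 rather than at exactly zero, but they remain normal numbers, which is what matters for speed.)

    #include <cmath>   // std::fpclassify
    #include <cstdio>

    static const char* cls(float v) {
        switch (std::fpclassify(v)) {
            case FP_SUBNORMAL: return "subnormal";
            case FP_ZERO:      return "zero";
            case FP_NORMAL:    return "normal";
            default:           return "other";
        }
    }

    int main() {
        float tiny = 1e-44f;  // well inside float's subnormal range

        // Adding and subtracting 0 leaves the subnormal value untouched...
        float keep = (tiny + 0.0f) - 0.0f;

        // ...but 0.1f + 1e-44f rounds to exactly 0.1f, because tiny is far
        // below one ulp of 0.1f, so the subtraction then yields exactly 0.
        float flushed = (tiny + 0.1f) - 0.1f;

        std::printf("keep    = %g (%s)\n", keep, cls(keep));
        std::printf("flushed = %g (%s)\n", flushed, cls(flushed));
        return 0;
    }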

    Denormalized numbers are generally rare and thus most processors don't try to handle them efficiently.


    To demonstrate that this has everything to do with denormalized numbers, if we flush denormals to zero by adding this to the start of the code:

    #include <xmmintrin.h>  // declares _MM_SET_FLUSH_ZERO_MODE

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    

    Then the version with 0 is no longer 10x slower and actually becomes faster. (This requires that the code be compiled with SSE enabled.)

    This means that rather than using these weird lower precision almost-zero values, we just round to zero instead.
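
    As a side note beyond what the answer itself uses: SSE actually has two related MXCSR flags, sketched below. Flush-to-zero (FTZ) zeroes subnormal results; denormals-are-zero (DAZ, not available on the earliest SSE-only processors) zeroes subnormal inputs.

    #include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (FTZ)
    #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (DAZ)

    void enable_ftz_daz() {
        // FTZ: subnormal results of SSE arithmetic are replaced with 0.
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        // DAZ: subnormal inputs to SSE arithmetic are treated as 0.
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }

    Both flags only affect SSE math, which is why the code must be compiled with SSE code generation enabled for this to take effect.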

    Timings (in seconds) on a Core i7 920 @ 3.5 GHz:

    //  Don't flush denormals to zero.
    0.1f: 0.564067
    0   : 26.7669
    
    //  Flush denormals to zero.
    0.1f: 0.587117
    0   : 0.341406
    

    In the end, this really has nothing to do with whether the constant is an integer or floating-point. The 0 or 0.1f is converted and stored in a register outside of both loops, so by itself it has no effect on performance.
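
    A quick way to convince yourself of that last point (a small sketch, not from the original answer): the usual arithmetic conversions already turn the integer literal into a float at compile time, so the loop body is the same float addition either way; only the values flowing through it differ.

    #include <type_traits>

    float y = 1.0f;

    // The int literal 0 is converted to 0.0f before the addition, so
    // "y + 0" and "y + 0.1f" are both plain float additions.
    static_assert(std::is_same<decltype(y + 0),    float>::value, "float add");
    static_assert(std::is_same<decltype(y + 0.1f), float>::value, "float add");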

    This answer was accepted by the asker.
(4 more answers not shown.)

