lrony* 2010-03-05 12:48

What is the difference between float and double?

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?

Reposted from: https://stackoverflow.com/questions/2386772/what-is-the-difference-between-float-and-double


  • 零零乙 2010-03-05 13:06

    Huge difference.

    As the name implies, a double has roughly 2x the precision of a float[1]. In general a double has about 15 decimal digits of precision, while a float has about 7.

    Here's how the number of digits is calculated:

    double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits

    float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
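The digit counts above can be reproduced with a few lines of C. This is only a sketch: the function name `decimal_digits` and the constant `LOG10_OF_2` are made up for illustration, and the constant is hard-coded so the snippet does not depend on the math library.

```c
/* log10(2), hard-coded so the example needs no math library. */
static const double LOG10_OF_2 = 0.30102999566398120;

/* Decimal digits representable by a significand of `bits` binary digits
   (hidden bit included): log(2^bits) / log(10) = bits * log10(2). */
double decimal_digits(int bits)
{
    return bits * LOG10_OF_2;
}
```

Calling `decimal_digits(53)` and `decimal_digits(24)` yields the double and float figures quoted above, to two decimal places.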

    This loss of precision lets truncation errors accumulate much faster when an operation is repeated, e.g.

    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.7g\n", b); // prints 9.000023
    

    while

    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.15g\n", b); // prints 8.99999999999996
    

    Also, the maximum value of float is about 3e38, but double is about 1.7e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
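The factorial case can be tried directly. The sketch below assumes float and double are IEEE binary32 and binary64, as on mainstream platforms; the function names are hypothetical.

```c
#include <math.h>   /* isinf */

/* n! accumulated in single precision. The running product becomes
   +infinity as soon as it exceeds the largest finite float (about 3.4e38). */
float factorial_float(int n)
{
    float result = 1.0f;
    for (int i = 2; i <= n; ++i)
        result *= (float)i;
    return result;
}

/* Same computation in double precision, whose range reaches about 1.7e308. */
double factorial_double(int n)
{
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= (double)i;
    return result;
}
```

Under those assumptions, `isinf(factorial_float(60))` is true, while `factorial_double(60)` is still a finite number of roughly 8.3e81.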

    During testing, maybe a few test cases contain these huge numbers, which may cause your programs to fail if you use floats.


    Of course, even double is sometimes not accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac). All floating point types suffer from round-off errors, though, so if precision is very important (e.g. money processing) you should use int or a fraction class.
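To illustrate the money-processing advice, here is a minimal sketch that adds ten cents repeatedly, once as a float amount in dollars and once as an integer count of cents. The function names are invented for this example, and float is assumed to be IEEE single precision.

```c
/* Adds 0.10 dollars n times in single precision. 0.10 has no exact binary
   representation, so each addition may introduce a small rounding error. */
float add_dimes_float(int n)
{
    float total = 0.0f;
    for (int i = 0; i < n; ++i)
        total += 0.10f;
    return total;
}

/* Adds 10 cents n times using integers. Every step is exact as long as
   the total fits in a long. */
long add_dimes_cents(int n)
{
    long total = 0;
    for (int i = 0; i < n; ++i)
        total += 10;
    return total;
}
```

With these definitions, `add_dimes_cents(10)` is exactly 100, whereas `add_dimes_float(10)` does not compare equal to `1.0f` on a typical IEEE platform.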


    Furthermore, don't use += to sum lots of floating point numbers, as the errors accumulate quickly. If you're using Python, use fsum. Otherwise, try to implement the Kahan summation algorithm.
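A minimal sketch of Kahan summation in C follows, next to a plain `+=` loop for comparison. The function names are hypothetical. Note that aggressive floating-point optimizations (for example GCC's `-ffast-math`) may simplify the compensation term away, so this assumes the code is compiled without them.

```c
#include <stddef.h>

/* Plain accumulation with +=: each addition can drop low-order bits. */
float naive_sum(const float *values, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += values[i];
    return sum;
}

/* Kahan (compensated) summation: `c` tracks the rounding error of the
   previous addition and feeds it back into the next one. */
float kahan_sum(const float *values, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;                 /* running compensation */
    for (size_t i = 0; i < n; ++i) {
        float y = values[i] - c;    /* apply the correction to the next term */
        float t = sum + y;          /* low-order bits of y may be lost here */
        c = (t - sum) - y;          /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

Applied to an array holding 729 copies of `1.f / 81`, as in the earlier float example, `kahan_sum` should land much closer to 9 than `naive_sum` does.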


    [1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, with most compilers and architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating point number (binary32), and double is an IEEE double-precision floating point number (binary64).
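Since the representation is implementation-defined, it can be worth checking what a given compiler actually provides. The sketch below compares the standard `<float.h>` macros against the binary32 and binary64 parameters; the function names are hypothetical, and matching these parameters is a strong hint rather than proof of full IEEE conformance.

```c
#include <float.h>
#include <stdbool.h>

/* True when float has the significand size and exponent range of binary32. */
bool float_looks_like_binary32(void)
{
    return FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128;
}

/* True when double has the significand size and exponent range of binary64. */
bool double_looks_like_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024;
}

/* Significand bits of long double, which varies between platforms
   and is never smaller than that of double. */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}
```

On the mainstream platforms listed in the footnote, both checks are expected to return true, while the long double width differs from one compiler and architecture to another.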

    This answer was selected as the best answer by the asker.