ℙℕℤℝ 2012-09-04 13:51
浏览 192
已采纳

为什么当循环遍历整个8192元素时,我的程序会变慢?

Here is the extract from the program in question. The matrix img[][] has the size SIZE×SIZE, and is initialized at:

img[j][i] = 2 * j + i

Then, you make a matrix res[][], and each field in here is made to be the average of the 9 fields around it in the img matrix. The border is left at 0 for simplicity.

for(i=1;i<SIZE-1;i++) 
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        for(k=-1;k<2;k++) 
            for(l=-1;l<2;l++) 
                res[j][i] += img[j+l][i+k];
        res[j][i] /= 9;
}

That's all there's to the program. For completeness' sake, here is what comes before. No code comes after. As you can see, it's just initialization.

#define SIZE 8192
float img[SIZE][SIZE]; // input image
float res[SIZE][SIZE]; //result of mean filter
int i,j,k,l;
for(i=0;i<SIZE;i++) 
    for(j=0;j<SIZE;j++) 
        img[j][i] = (2*j+i)%8196;

Basically, this program is slow when SIZE is a multiple of 2048, e.g. the execution times:

SIZE = 8191: 3.44 secs
SIZE = 8192: 7.20 secs
SIZE = 8193: 3.18 secs

The compiler is GCC. From what I know, this is because of memory management, but I don't really know too much about that subject, which is why I'm asking here.

Also how to fix this would be nice, but if someone could explain these execution times I'd already be happy enough.

I already know of malloc/free, but the problem is not amount of memory used, it's merely execution time, so I don't know how that would help.

转载于:https://stackoverflow.com/questions/12264970/why-is-my-program-slow-when-looping-over-exactly-8192-elements

  • 写回答

3条回答 默认 最新

  • YaoRaoLov 2012-09-04 14:43
    关注

    The difference is caused by the same super-alignment issue from the following related questions:

    But that's only because there's one other problem with the code.

    Starting from the original loop:

    for(i=1;i<SIZE-1;i++) 
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            for(k=-1;k<2;k++) 
                for(l=-1;l<2;l++) 
                    res[j][i] += img[j+l][i+k];
            res[j][i] /= 9;
    }
    

    First notice that the two inner loops are trivial. They can be unrolled as follows:

    for(i=1;i<SIZE-1;i++) {
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    So that leaves the two outer-loops that we're interested in.

    Now we can see the problem is the same in this question: Why does the order of the loops affect performance when iterating over a 2D array?

    You are iterating the matrix column-wise instead of row-wise.


    To solve this problem, you should interchange the two loops.

    for(j=1;j<SIZE-1;j++) {
        for(i=1;i<SIZE-1;i++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    This eliminates all the non-sequential access completely so you no longer get random slow-downs on large powers-of-two.


    Core i7 920 @ 3.5 GHz

    Original code:

    8191: 1.499 seconds
    8192: 2.122 seconds
    8193: 1.582 seconds
    

    Interchanged Outer-Loops:

    8191: 0.376 seconds
    8192: 0.357 seconds
    8193: 0.351 seconds
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥30 关于#微信#的问题:微信实名不绑卡 可以实现吗 有没有专家 可以解决
  • ¥15 (标签-考研|关键词-set)
  • ¥15 求修改代码,图书管理系统
  • ¥15 请问有没求偏多标签数据集yeast,reference,recreation,scene,health数据集。
  • ¥15 传感网应用开发单片机实训
  • ¥15 Delphi 关于sAlphaImageList使用问题
  • ¥15 寻找将CAJ格式文档转txt文本的方案
  • ¥15 shein测试开发会问些啥我是写java的
  • ¥15 关于#单片机#的问题:我有个课程项目设计,我想在STM32F103veTX单片机,M3主控模块上设计一个程序,在Keil uVision5(C语言)上代码该怎么编译?(嫌钱少我可以加钱,急急急)
  • ¥15 opnet仿真网络协议遇到问题