7*4 2012-09-04 13:51 采纳率: 0%
浏览 245
已采纳

为什么当循环遍历整个8192元素时,我的程序会变慢?

Here is the extract from the program in question. The matrix img[][] has the size SIZE×SIZE, and is initialized at:

img[j][i] = 2 * j + i

Then, you make a matrix res[][], and each field in here is made to be the average of the 9 fields around it in the img matrix. The border is left at 0 for simplicity.

for(i=1;i<SIZE-1;i++) 
    for(j=1;j<SIZE-1;j++) {
        res[j][i]=0;
        for(k=-1;k<2;k++) 
            for(l=-1;l<2;l++) 
                res[j][i] += img[j+l][i+k];
        res[j][i] /= 9;
}

That's all there's to the program. For completeness' sake, here is what comes before. No code comes after. As you can see, it's just initialization.

#define SIZE 8192
float img[SIZE][SIZE]; // input image
float res[SIZE][SIZE]; //result of mean filter
int i,j,k,l;
for(i=0;i<SIZE;i++) 
    for(j=0;j<SIZE;j++) 
        img[j][i] = (2*j+i)%8196;

Basically, this program is slow when SIZE is a multiple of 2048, e.g. the execution times:

SIZE = 8191: 3.44 secs
SIZE = 8192: 7.20 secs
SIZE = 8193: 3.18 secs

The compiler is GCC. From what I know, this is because of memory management, but I don't really know too much about that subject, which is why I'm asking here.

Also how to fix this would be nice, but if someone could explain these execution times I'd already be happy enough.

I already know of malloc/free, but the problem is not amount of memory used, it's merely execution time, so I don't know how that would help.

转载于:https://stackoverflow.com/questions/12264970/why-is-my-program-slow-when-looping-over-exactly-8192-elements

  • 写回答

3条回答 默认 最新

  • 旧行李 2012-09-04 14:43
    关注

    The difference is caused by the same super-alignment issue from the following related questions:

    But that's only because there's one other problem with the code.

    Starting from the original loop:

    for(i=1;i<SIZE-1;i++) 
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            for(k=-1;k<2;k++) 
                for(l=-1;l<2;l++) 
                    res[j][i] += img[j+l][i+k];
            res[j][i] /= 9;
    }
    

    First notice that the two inner loops are trivial. They can be unrolled as follows:

    for(i=1;i<SIZE-1;i++) {
        for(j=1;j<SIZE-1;j++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    So that leaves the two outer-loops that we're interested in.

    Now we can see the problem is the same in this question: Why does the order of the loops affect performance when iterating over a 2D array?

    You are iterating the matrix column-wise instead of row-wise.


    To solve this problem, you should interchange the two loops.

    for(j=1;j<SIZE-1;j++) {
        for(i=1;i<SIZE-1;i++) {
            res[j][i]=0;
            res[j][i] += img[j-1][i-1];
            res[j][i] += img[j  ][i-1];
            res[j][i] += img[j+1][i-1];
            res[j][i] += img[j-1][i  ];
            res[j][i] += img[j  ][i  ];
            res[j][i] += img[j+1][i  ];
            res[j][i] += img[j-1][i+1];
            res[j][i] += img[j  ][i+1];
            res[j][i] += img[j+1][i+1];
            res[j][i] /= 9;
        }
    }
    

    This eliminates all the non-sequential access completely so you no longer get random slow-downs on large powers-of-two.


    Core i7 920 @ 3.5 GHz

    Original code:

    8191: 1.499 seconds
    8192: 2.122 seconds
    8193: 1.582 seconds
    

    Interchanged Outer-Loops:

    8191: 0.376 seconds
    8192: 0.357 seconds
    8193: 0.351 seconds
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥100 关于python拓展屏使用pyautogui.screenshot无法截图的问题
  • ¥15 有偿求答 无导出权限怎么快速导出企业微信微文档
  • ¥15 求解答,怎么数码管中这么加入闹钟或者传感器,这应该怎么加入相应的代码
  • ¥15 scottplot5
  • ¥30 想问问这个建模怎么编程没有思路
  • ¥15 关于imageENview(ImageEN)中新建图层并根据鼠标位置添加图标
  • ¥100 用两台电脑局域联网进行MT5的EA参数优化,但是连接不上,日志提示:
  • ¥15 FastAPI报错: AsyncSession不是有效Pydantic类型
  • ¥50 这Mac系统提示虚拟内存不足,怎么解决
  • ¥15 Rs232电路无法收发数据,求帮助