Page 155 of 《CUDA C编程权威指南》 (the Chinese edition of Professional CUDA C Programming) gives the following CPU-side matrix transpose code:
void transposeHost(float* out, float* in, const int nx, const int ny) {
    for (int iy = 0; iy < ny; ++iy) {
        for (int ix = 0; ix < nx; ++ix) {
            out[ix * ny + iy] = in[iy * nx + ix];
        }
    }
}
However, I found that this code only seems to work for transposing square matrices. My test is as follows:
#include <iostream>

int main() {
    const int nx = 3;
    const int ny = 4;
    float in[12] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
    float out[12];
    transposeHost(out, in, nx, ny);
    // Print the transposed buffer in flat (1D) order.
    std::cout << "Output after transpose: ";
    for (int i = 0; i < 12; ++i) {
        std::cout << out[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}
The 3-row, 4-column matrix above (nx = 3, ny = 4) is not transposed correctly, whereas the following 2x2 square matrix is:
int main() {
    const int nx = 2;
    const int ny = 2;
    float in[4] = { 0, 1, 2, 3 };
    float out[4];
    transposeHost(out, in, nx, ny);
    // Print the transposed buffer in flat (1D) order.
    std::cout << "Output after transpose: ";
    for (int i = 0; i < 4; ++i) {
        std::cout << out[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}
Why is this?
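One way to investigate is to print both buffers as 2D grids instead of flat lists. The sketch below assumes, reading the index expressions `in[iy * nx + ix]` and `out[ix * ny + iy]`, that `nx` is the number of columns and `ny` the number of rows of the input, so `nx = 3, ny = 4` would describe a 4-row, 3-column input rather than a 3-row, 4-column one. `formatGrid` is a hypothetical helper introduced here for illustration; it is not from the book.

```cpp
#include <sstream>
#include <string>

// Copy of the transposeHost routine quoted above: `in` is read as ny rows by
// nx columns (row-major), and `out` is written as nx rows by ny columns.
void transposeHost(float* out, float* in, const int nx, const int ny) {
    for (int iy = 0; iy < ny; ++iy) {
        for (int ix = 0; ix < nx; ++ix) {
            out[ix * ny + iy] = in[iy * nx + ix];
        }
    }
}

// Hypothetical helper (not from the book): formats a flat row-major buffer
// as `rows` lines of `cols` space-separated values, so the 2D shape is visible.
std::string formatGrid(const float* data, int rows, int cols) {
    std::ostringstream os;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (c > 0) os << ' ';
            os << data[r * cols + c];
        }
        os << '\n';
    }
    return os.str();
}
```

With the 12-element test data, calling `transposeHost(out, in, 3, 4)` and then `formatGrid(in, 4, 3)` shows the input rows as `0 1 2`, `3 4 5`, `6 7 8`, `9 10 11`, while `formatGrid(out, 3, 4)` shows `0 3 6 9`, `1 4 7 10`, `2 5 8 11`, which is the transpose under that reading. If the input is meant to be 3 rows by 4 columns, the call under this assumption would be `transposeHost(out, in, 4, 3)`.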