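What the code does: it uses CUDA to compute the pixels of a 2D image, assigning one thread to each pixel.
In principle, then, the running time should not depend on the image size, since every pixel gets its own thread, right?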
// Round the grid up so that every pixel is covered by a thread.
dim3 blocks((WIDTH + THREAD_NUM - 1) / THREAD_NUM, (LENGTH + THREAD_NUM - 1) / THREAD_NUM);
dim3 threads(THREAD_NUM, THREAD_NUM);
global_findTheClosestSite<<<blocks, threads>>>(dthis);
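To make the launch configuration above concrete, here is a small host-side sketch of the same ceiling-division arithmetic. The post does not state THREAD_NUM, so the value 16 used in the comments is only an assumption, and `blocksPerAxis` and `launchedThreads` are hypothetical helper names, not part of the original code.

```cpp
// Host-side sketch of the grid arithmetic used in the launch above.
// Helper names are hypothetical; THREAD_NUM = 16 is an assumed value.

// Ceiling division: blocks needed to cover n pixels along one axis.
constexpr int blocksPerAxis(int n, int threadNum) {
    return (n + threadNum - 1) / threadNum;
}

// Total threads launched for a width x length image with
// threadNum x threadNum threads per block.
constexpr long long launchedThreads(int width, int length, int threadNum) {
    return 1LL * blocksPerAxis(width, threadNum)
               * blocksPerAxis(length, threadNum)
               * threadNum * threadNum;
}

// With threadNum = 16:
//   200 x 200 image: 13 x 13 blocks, 43264 threads for 40000 pixels
//   50 x 50 image:    4 x 4 blocks,   4096 threads for  2500 pixels
```

Under that assumption, the larger image launches roughly ten times as many threads as the smaller one, and in both cases a few surplus threads fall outside the image and must be skipped by the bounds check in the kernel.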
__device__ void CPolyVoronoi::device_findTheClosestSite()
{
    // Global 2D coordinates of this thread.
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Flatten to a linear index over the whole launched grid.
    int offset = x + y * blockDim.x * gridDim.x;
    float minDis;
    int closestSite;
    // The grid is rounded up, so skip threads past the last pixel.
    if (offset < WIDTH * LENGTH)
    {
        minDis = 100000;
        closestSite = 0;
        CUDA::point p(pix[offset].centerPt);
        int ns = numOfSites;
        float currentDis;
        // Linear scan: one distance evaluation per site, for every pixel.
        for (int k = 0; k < ns; ++k)
        {
            currentDis = pTop_distance3d(dseeds[k], p);
            if (currentDis < minDis)
            {
                minDis = currentDis;
                closestSite = k;
            }
        }
        colorOfPixel[offset] = closestSite;
    }
}
In practice, though, processing 200*200 data takes 10 s, while 50*50 takes only 0.5 s. Why is that?
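As a rough sanity check on those numbers, the sketch below compares the ratio of pixel counts with the ratio of the measured times. It is only arithmetic on the figures quoted above, not a diagnosis: a GPU keeps a bounded number of threads resident at once, so beyond that limit additional threads run in batches and the total time can grow with the total amount of work. The helper names are hypothetical, and `numSites` stands in for numOfSites, whose value the post does not give.

```cpp
// Back-of-the-envelope sketch; helper names are hypothetical.

// Ratio of pixel counts between two image sizes.
constexpr double pixelRatio(int w1, int l1, int w2, int l2) {
    return (static_cast<double>(w1) * l1) / (static_cast<double>(w2) * l2);
}

// Ratio of two measured running times, in seconds.
constexpr double timeRatio(double t1, double t2) {
    return t1 / t2;
}

// Total distance evaluations performed by the kernel: each pixel
// scans every site once. numSites is an assumed input.
constexpr long long totalDistanceEvaluations(int width, int length, int numSites) {
    return 1LL * width * length * numSites;
}

// pixelRatio(200, 200, 50, 50) is 16.0 (40000 pixels vs 2500 pixels)
// timeRatio(10.0, 0.5)         is 20.0 (10 s vs 0.5 s)
```

The measured slowdown (about 20x) is in the same range as the growth in pixel count (16x), which is what one would expect if the running time tracks total work rather than staying constant. Whether numOfSites was the same in both runs is not stated, and that would also change the total work.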