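Problems encountered when porting an algorithm to CUDA in Visual Studio:
My GPU is a GTX 1650 Ti. I am porting a video moving-object detection algorithm to CUDA and ran into a problem when transferring a struct pointer to the device. There is a model struct that holds a pointer to an array of pixel structs, and each pixel struct in turn holds an unsigned char pointer. The code is as follows: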
typedef struct
{
    unsigned char* samples;
    unsigned int numberOfSamples;
    unsigned int sizeOfSample;
} pixel;
typedef struct
{
    pixel* pixels;
    unsigned int width;
    unsigned int height;
    unsigned int stride;
    // number of samples
    unsigned int numberOfSamples;
    // matching threshold
    unsigned int matchingThreshold;
    // number of matches
    unsigned int matchingNumber;
    unsigned int updateFactor;
} vibeModel;
model = (vibeModel*)calloc(1, sizeof(vibeModel));
// model->width and model->height must already be set before the next line
model->pixels = (pixel*)calloc(model->width * model->height, sizeof(pixel));
for (unsigned int i = 0; i < model->width * model->height; i++)
{
    // one separate 30-byte host buffer per pixel
    model->pixels[i].samples = (unsigned char*)calloc(30, sizeof(unsigned char));
}
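At first I used unified memory, calling cudaMallocManaged() directly at allocation time so that no cudaMemcpy() is needed. With this change the program does run, but only for images up to 200*200 pixels. Anything larger fails with "addKernel launch failed: an illegal memory access was encountered". I tested the grid-size setup and the kernel with a 1000*1000 image and they are fine. So is the memory that cudaMallocManaged() allocates taken from a different region than what cudaMalloc() uses? My card has 4 GB of video memory, which should be enough.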
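One way to look into the "4 GB should be enough" assumption is to ask the runtime how much device memory is actually free and how managed memory behaves on this GPU. The sketch below is only a diagnostic aid; reportManagedMemoryInfo is a hypothetical helper name, not something from the original project. To my understanding, when cudaDevAttrConcurrentManagedAccess is reported as 0 (commonly the case on Windows), managed allocations cannot be paged on demand, so everything allocated with cudaMallocManaged() has to fit in device memory at kernel launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical diagnostic helper: prints memory facts for one device.
// Returns false if any runtime call fails.
bool reportManagedMemoryInfo(int device)
{
    size_t freeBytes = 0, totalBytes = 0;
    int managed = 0, concurrent = 0;

    if (cudaSetDevice(device) != cudaSuccess) return false;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return false;
    if (cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, device) != cudaSuccess)
        return false;
    if (cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device) != cudaSuccess)
        return false;

    printf("free device memory : %.1f MiB of %.1f MiB\n",
           freeBytes / (1024.0 * 1024.0), totalBytes / (1024.0 * 1024.0));
    printf("managed memory supported         : %d\n", managed);
    printf("concurrent managed access (paging): %d\n", concurrent);
    return true;
}
```

Calling reportManagedMemoryInfo(0) once before and once after the allocation loop shows how much device memory the allocations really consume.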
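So the first question: with cudaMallocManaged(), 200*200 images are processed correctly, but larger ones such as 300*300 or 500*500 fail with "an illegal memory access was encountered". How can this be fixed? The allocation code is: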
cudaMallocManaged(&model->pixels, model->width * model->height * sizeof(pixel));
for (unsigned int i = 0; i < model->width * model->height; i++) {
    // one separate managed allocation per pixel; the return value is not checked
    cudaMallocManaged(&model->pixels[i].samples, model->numberOfSamples * sizeof(unsigned char));
}
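A possible explanation, stated here as an assumption rather than a verified diagnosis: the loop above issues width*height separate managed allocations of about 30 bytes each. Each allocation may be rounded up to a much larger granularity, so tens of thousands of them can exhaust memory long before 4 GB of payload is reached, and because the return value is never checked, a failed call leaves samples pointing nowhere valid. The sketch below uses only two managed allocations (one pixel array, one flat sample buffer) and checks every return code. The names allocManagedPixels, touchSamples and runManagedDemo are hypothetical.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

typedef struct
{
    unsigned char* samples;
    unsigned int numberOfSamples;
    unsigned int sizeOfSample;
} pixel;

// Hypothetical helper: two managed allocations in total instead of one per pixel.
bool allocManagedPixels(pixel** pixelsOut, unsigned char** samplesOut,
                        unsigned int width, unsigned int height,
                        unsigned int numberOfSamples)
{
    const size_t count = (size_t)width * height;
    pixel* pixels = nullptr;
    unsigned char* samples = nullptr;

    cudaError_t err = cudaMallocManaged(&pixels, count * sizeof(pixel));
    if (err == cudaSuccess)
        err = cudaMallocManaged(&samples, count * numberOfSamples);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        cudaFree(pixels);  // cudaFree(nullptr) is a no-op
        return false;
    }

    // No kernel is running yet, so the host may touch managed memory here.
    memset(pixels, 0, count * sizeof(pixel));
    memset(samples, 0, count * numberOfSamples);
    for (size_t i = 0; i < count; i++) {
        pixels[i].samples = samples + i * numberOfSamples;  // slice of the flat buffer
        pixels[i].numberOfSamples = numberOfSamples;
    }
    *pixelsOut = pixels;
    *samplesOut = samples;
    return true;
}

// Demo kernel: writes every sample of every pixel. The bounds guard matters
// whenever the pixel count is not a multiple of the block size.
__global__ void touchSamples(pixel* pixels, unsigned int count)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    for (unsigned int s = 0; s < pixels[i].numberOfSamples; s++)
        pixels[i].samples[s] = (unsigned char)(i + s);
}

// Hypothetical end-to-end check: allocate, launch, verify two values, free.
bool runManagedDemo(unsigned int width, unsigned int height, unsigned int numberOfSamples)
{
    pixel* pixels = nullptr;
    unsigned char* samples = nullptr;
    if (!allocManagedPixels(&pixels, &samples, width, height, numberOfSamples))
        return false;

    const unsigned int count = width * height;
    const unsigned int threads = 256;
    const unsigned int blocks = (count + threads - 1) / threads;
    touchSamples<<<blocks, threads>>>(pixels, count);

    // Launch errors and in-kernel faults are reported by these two calls.
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaDeviceSynchronize() == cudaSuccess;
    // After the synchronize the host may read managed memory again.
    ok = ok && samples[0] == 0
            && samples[(size_t)(count - 1) * numberOfSamples] == (unsigned char)(count - 1);

    cudaFree(samples);
    cudaFree(pixels);
    return ok;
}
```

With this layout the allocation count no longer grows with the image size, so runManagedDemo(1000, 1000, 30) exercises the same access pattern as the 1000*1000 case described above.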
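My guess for the first problem is that cudaMallocManaged() cannot allocate enough space, so I tried the ordinary route instead: cudaMalloc() followed by cudaMemcpy() to the device. But I cannot get this struct pointer to copy correctly. The material I found online on copying struct members mostly covers a single struct with several pointer members. Using that as a reference, I want to copy width*height pixel structs, each holding 30 unsigned chars. The code is as follows: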
model = (vibeModel*)calloc(1, sizeof(vibeModel));
model->pixels = (pixel*)calloc(model->width * model->height, sizeof(pixel));
for (unsigned int i = 0; i < model->width * model->height; i++)
{
    model->pixels[i].samples = (unsigned char*)calloc(30, sizeof(unsigned char));
}
pixel* pix;
cudaMalloc(&pix, enhanced.size().width * enhanced.size().height * sizeof(pixel));
for (unsigned int i = 0; i < enhanced.size().width * enhanced.size().height; i++)
{
    // overwrites the host samples pointer with a device address
    cudaMalloc((void**)&backgroundSubtract.model->pixels[i].samples, sizeof(unsigned char) * 30);
    // copies the struct, which now carries that device address, into pix[i]
    cudaMemcpy(&pix[i], &backgroundSubtract.model->pixels[i], sizeof(pixel), cudaMemcpyHostToDevice);
}
This fails with "addKernel launch failed: an illegal memory access was encountered". How should an array of structs, each of which contains a pointer, be copied to the device? Many thanks!
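A common pattern for this kind of nested structure is a deep copy in three steps: put all samples in one device buffer, build the pixel array on the host with samples already set to device addresses, then copy a vibeModel whose pixels field holds the device address of that array. The kernel must then receive the device copy of the model; dereferencing the host model pointer on the GPU produces exactly this kind of illegal memory access. The sketch below is a minimal illustration under stated assumptions: DeviceModel, copyModelToDevice, freeDeviceModel and sumSamples are hypothetical names, and every samples buffer is assumed to hold numberOfSamples bytes (the original hard-codes 30).

```cuda
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

typedef struct
{
    unsigned char* samples;
    unsigned int numberOfSamples;
    unsigned int sizeOfSample;
} pixel;
typedef struct
{
    pixel* pixels;
    unsigned int width, height, stride;
    unsigned int numberOfSamples, matchingThreshold, matchingNumber, updateFactor;
} vibeModel;

// Hypothetical bundle of the three device allocations, kept so they can be freed.
struct DeviceModel { vibeModel* model; pixel* pixels; unsigned char* samples; };

void freeDeviceModel(DeviceModel* d)
{
    cudaFree(d->model); cudaFree(d->pixels); cudaFree(d->samples);
    d->model = nullptr; d->pixels = nullptr; d->samples = nullptr;
}

// Deep-copies a host model to the device. Assumes each pixel's samples buffer
// holds h->numberOfSamples bytes.
bool copyModelToDevice(const vibeModel* h, DeviceModel* out)
{
    const size_t count = (size_t)h->width * h->height;
    const size_t n = h->numberOfSamples;
    out->model = nullptr; out->pixels = nullptr; out->samples = nullptr;
    if (count == 0 || n == 0) return false;

    bool ok = cudaMalloc(&out->samples, count * n) == cudaSuccess
           && cudaMalloc(&out->pixels, count * sizeof(pixel)) == cudaSuccess
           && cudaMalloc(&out->model, sizeof(vibeModel)) == cudaSuccess;
    if (!ok) { freeDeviceModel(out); return false; }

    // Gather the scattered host buffers into one block and build pixel structs
    // whose samples field already holds a device address.
    std::vector<unsigned char> flat(count * n);
    std::vector<pixel> staged(h->pixels, h->pixels + count);
    for (size_t i = 0; i < count; i++) {
        memcpy(&flat[i * n], h->pixels[i].samples, n);
        staged[i].samples = out->samples + i * n;  // only dereferenced on the device
    }
    vibeModel stagedModel = *h;        // host model itself stays untouched
    stagedModel.pixels = out->pixels;  // device address of the pixel array

    ok = cudaMemcpy(out->samples, flat.data(), count * n, cudaMemcpyHostToDevice) == cudaSuccess
      && cudaMemcpy(out->pixels, staged.data(), count * sizeof(pixel), cudaMemcpyHostToDevice) == cudaSuccess
      && cudaMemcpy(out->model, &stagedModel, sizeof(vibeModel), cudaMemcpyHostToDevice) == cudaSuccess;
    if (!ok) freeDeviceModel(out);
    return ok;
}

// Demo kernel: follows both pointer levels and sums each pixel's samples.
__global__ void sumSamples(const vibeModel* model, unsigned int* sums)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= model->width * model->height) return;
    unsigned int total = 0;
    for (unsigned int s = 0; s < model->numberOfSamples; s++)
        total += model->pixels[i].samples[s];
    sums[i] = total;
}
```

The host model keeps its own host pointers throughout, which avoids the overwrite in the loop above, and the whole transfer takes three cudaMalloc() and three cudaMemcpy() calls regardless of image size. A kernel launch would pass the model field of the DeviceModel, for example sumSamples<<<blocks, threads>>>(d.model, dSums).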