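I have recently been learning OpenCL and wrote a program that computes the score matrix of the Smith-Waterman algorithm. It runs slower on the FPGA than on the CPU. I am a beginner and cannot tell which part of my code is at fault.
Here is the code: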
__kernel void __attribute__ ((reqd_work_group_size(512,512,1)))
krnl_sw(
__global int* ref,
__global int* alt,
__global int* sw,
__global int* btrack,
const int overhangStrategy,
const int match,
const int mismatch,
const int open,
const int extend,
const int ncol,
const int nrow
) {
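    int col = get_global_id(0);
    int row = get_global_id(1);
    int k, up_score, left_score, up_left_score;
    /* Every work-item runs the whole loop, but only acts in the iteration
       whose anti-diagonal k contains its own cell (row, col). */
    for (k = 2; k < ncol + nrow - 1; k++)
    {
        /* Row 0 and column 0 are initialized beforehand, so skip them
           to avoid reading outside the matrix. */
        if (row >= 1 && col >= 1 && col + row == k)
        {
            up_score = sw[(row - 1)*ncol + col] + extend*(row - 1) + open;
            left_score = sw[row*ncol + col - 1] + extend*(row - 1) + open;
            /* diag_score is a helper that is not shown in this post. */
            up_left_score = sw[(row - 1)*ncol + col - 1] + diag_score(ref[col - 1], alt[row - 1], match, mismatch);
            sw[row*ncol + col] = max(up_score, left_score);
            sw[row*ncol + col] = max(up_left_score, sw[row*ncol + col]);
        }
    }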
return;
}
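Roughly, the computation initializes the first row and first column of the matrix, then walks the anti-diagonals one after another, computing the scores of the cells on each anti-diagonal in turn.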
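For reference, the traversal order described above can be sketched in plain C on the host. This is only a minimal sketch under assumptions, not the kernel itself: `fill_by_antidiagonals` and `max2` are hypothetical names, `diag_score` is a stand-in for the helper that the post does not show, and the gap cost is simplified to a single constant `gap` instead of the `open`/`extend` terms used in the kernel.

```c
/* Hypothetical stand-in for the diag_score helper, which the post does not show. */
static int diag_score(int a, int b, int match, int mismatch) {
    return a == b ? match : mismatch;
}

static int max2(int a, int b) {
    return a > b ? a : b;
}

/* Fill sw (nrow x ncol, row-major) one anti-diagonal at a time.
   Row 0 and column 0 must already be initialized by the caller.
   Assumption: a single constant gap cost replaces open/extend. */
static void fill_by_antidiagonals(const int *ref, const int *alt, int *sw,
                                  int nrow, int ncol,
                                  int match, int mismatch, int gap) {
    /* k = row + col; same range as the kernel loop (2 .. nrow + ncol - 2). */
    for (int k = 2; k < ncol + nrow - 1; k++) {
        /* Clamp row so that both row and col = k - row stay inside 1..n-1. */
        int row_lo = (k - (ncol - 1) > 1) ? k - (ncol - 1) : 1;
        int row_hi = (k - 1 < nrow - 1) ? k - 1 : nrow - 1;
        for (int row = row_lo; row <= row_hi; row++) {
            int col = k - row;
            int up   = sw[(row - 1) * ncol + col] + gap;
            int left = sw[row * ncol + col - 1] + gap;
            int diag = sw[(row - 1) * ncol + col - 1]
                     + diag_score(ref[col - 1], alt[row - 1], match, mismatch);
            /* Each cell only reads cells from anti-diagonals k-1 and k-2. */
            sw[row * ncol + col] = max2(diag, max2(up, left));
        }
    }
}
```

Cells on the same anti-diagonal do not depend on each other, so the inner loop is the part that could run in parallel, while the outer loop over `k` has to stay in order.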
Computing a 512*512 matrix 10 times takes less than 100 ms on the CPU, but tens of seconds on the FPGA, which is nearly 1000 times slower....