duangou1551 2017-02-24 17:54

Sequence alignment of multiple int slices in Golang

I am trying to figure out how I can align slightly imperfect binary slices using golang. The following four slices all align correctly with different offsets. However, not every bit is the same (marked below) so I can't just compare raw chunks.

func main() {

    // Match all four slices up (ignoring occasional errors)
    s1 := []int16{0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1}
    s2 := []int16{ /*                     */ 0, 1, 1, 0, 0, 0, 1, 1, 1, 1}
    //                                       ^              ^
    s3 := []int16{0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1}
    //               ^
    s4 := []int16{ /*            */ 0, 0, 0, 1, 1, 1, 0, 0}

    // length 0, capacity 4: make([][]int16, 3) would leave three nil slices in front
    slices := make([][]int16, 0, 4)
    slices = append(slices, s1, s2, s3, s4)


    offsets := forgivingSyncHere(slices)
}

Here is a playground link: https://play.golang.org/p/zqJ_4qLc8O


  • douyu0792 2017-02-24 20:55

    It depends on what your "cost" function is, where your goal is to minimize your "cost".

    A cost function could look something like this. The idea is that a "mismatch" is more costly than a position with nothing to match against, which we'll call an "overrun" (say a mismatch is twice as costly). For each pair a, b taken from s1, s2, s3, s4, count the positions where a[i] != b[i + offset] and double that count. Then add the absolute value of the pair's offset (with 4 arrays there are 6 pairings), which is the number of overruns at the beginning. Then add the overruns at the end.

    Sample cost function:

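    // offsets[i1][i2] is the offset of sn[i1] relative to sn[i2]
    func cost(sn [][]int16, offsets [][]int) int {
      // cost accumulator
      c := 0

      // the factor of how much more costly a mismatch is than an overrun
      const mismatchFactor = 2

      // define what you want, but here is an example of what I said above
      for i1 := 0; i1 < len(sn); i1++ {
        for i2 := i1 + 1; i2 < len(sn); i2++ {
          off := offsets[i1][i2]
          c += mismatchFactor * diff(sn[i1], sn[i2], off)
          // overruns at the beginning
          c += abs(off)
          // overruns at the end
          c += abs(len(sn[i1]) + off - len(sn[i2]))
        }
      }
      return c
    }

    // abs is an int absolute value (math.Abs only accepts float64)
    func abs(x int) int {
      if x < 0 {
        return -x
      }
      return x
    }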
    
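    // diff counts the mismatches between s1 and s2 over their overlap,
    // comparing s1[i] with s2[i+offset] (the same convention cost uses)
    func diff(s1 []int16, s2 []int16, offset int) int {
      // index, index, diff total
      i1, i2, d := 0, 0, 0
      if offset >= 0 {
        i2 += offset
      } else {
        i1 -= offset
      }
      // Go has no while keyword; a for loop with only a condition does the same
      for i1 < len(s1) && i2 < len(s2) {
        if s1[i1] != s2[i2] {
          d++
        }
        i1++
        i2++
      }
      return d
    }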
    

    Make your cost function however you want, this is just an example. However, assuming you have a cost function, a brute force algorithm is pretty easy to come up with. You can try to optimize the algorithm, though :). There are many ideas. This is very similar to string search algorithms, with edit distances. Wikipedia and Google have many results.
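As a minimal sketch of the brute-force idea, the program below makes a simplifying assumption: each slice is aligned against a single reference slice (here s1) instead of searching all six pairings jointly. The names mismatches, pairCost, and bestOffset are illustrative and not part of the original answer. The weights (a mismatch costs 2, an overrun costs 1) follow the example above and are only one possible choice.

```go
package main

import "fmt"

// mismatches counts positions in the overlap where a[i] != b[i+offset].
func mismatches(a, b []int16, offset int) int {
	d := 0
	for i := range a {
		j := i + offset
		if j >= 0 && j < len(b) && a[i] != b[j] {
			d++
		}
	}
	return d
}

// abs returns the absolute value of x.
func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// pairCost scores one offset: each mismatch costs 2 and each overrun
// (an element hanging off either end) costs 1. The weights are an assumption.
func pairCost(a, b []int16, offset int) int {
	return 2*mismatches(a, b, offset) + abs(offset) + abs(len(a)+offset-len(b))
}

// bestOffset tries every offset at which a and b share at least one element
// and returns the cheapest one; on ties the smallest offset wins.
// Both slices are assumed to be non-empty.
func bestOffset(a, b []int16) (best, bestCost int) {
	best = -len(a) + 1
	bestCost = pairCost(a, b, best)
	for off := best + 1; off < len(b); off++ {
		if c := pairCost(a, b, off); c < bestCost {
			best, bestCost = off, c
		}
	}
	return best, bestCost
}

func main() {
	s1 := []int16{0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1}
	s2 := []int16{0, 1, 1, 0, 0, 0, 1, 1, 1, 1}
	s3 := []int16{0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1}
	s4 := []int16{0, 0, 0, 1, 1, 1, 0, 0}

	// Align every other slice against s1, the reference.
	for i, s := range [][]int16{s2, s3, s4} {
		off, c := bestOffset(s, s1)
		fmt.Printf("s%d: offset %d relative to s1, cost %d\n", i+2, off, c)
	}
}
```

On the question's data and with these weights, the sketch reports offsets 3, 0, and 6 for s2, s3, and s4. The s2 result (offset 3, a single mismatch) illustrates that short, repetitive slices can have several plausible alignments, so the choice of weights and of a minimum overlap matters.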

    Disclaimer: all of this is untested :), but it should get you started

    This answer was selected by the asker as the best answer.