douchen4915 2016-03-03 07:24
浏览 39
已采纳

找出数组中的序列相似性

I have a task where I have three arrays A,B,C. All of the contain the same data. For the sake of simplicity lets assume the data is numbers 1 to 5. The data would be in different jumbled sequences. I want to find out among B & C which array has data most similar to A.

Eg: 
A = 1,2,3,4,5
B = 1,2,3,5,4
C = 4,1,2,3,5

In this case, it is easy to visually comprehend that B is more similar to A. But it gets more complicated for really jumbled sequences.

Eg: 
A = 1,2,3,4,5
B = 5,3,1,4,2
C = 4,1,2,3,5

In this case, I would assume C to be more closer to A. I am thinking that this assumption can be quantified as: How many elements have the same sequence in both arrays? In above example the subsequence of [1,2,3] is the same in both arrays. The second question would be what is the offset difference between the similar subsequence ? In this case it is 1, because the subsequence begins at index 0 for A and index 1 for C.

So the number of elements in a matching sequence and their offsets are what I am thinking to use. I plan on adding a weightage to these two entities (number of elements in matching sequence, and offset difference in their occurrence)

Does this make sense? I only need a rough approximation of similarity and the results do not need to be exact. Are there any formal mathematical or data-structure models that solve this problem?

BTW, the project where I need this implemented is in PHP. Does it have any inbuilt functions like the levenstein model for string difference?

Any suggestions are very welcome!

  • 写回答

1条回答 默认 最新

  • duanou2526 2016-03-03 08:08
    关注

    Well I suppose you can come up with your own algorithm (for instance generate all suffixes and then search for them and then define a scoring procedure) or you could use a well known algorithm like
    Smith-Waterman for local alignment or Needleman-Wunsch for global. The advantage of these algorithms is that they are well-understood and give you all the possible alignments (and you can choose the best for your case).

    NW in PHP

    SW in PHP

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 微信会员卡等级和折扣规则
  • ¥15 微信公众平台自制会员卡可以通过收款码收款码收款进行自动积分吗
  • ¥15 随身WiFi网络灯亮但是没有网络,如何解决?
  • ¥15 gdf格式的脑电数据如何处理matlab
  • ¥20 重新写的代码替换了之后运行hbuliderx就这样了
  • ¥100 监控抖音用户作品更新可以微信公众号提醒
  • ¥15 UE5 如何可以不渲染HDRIBackdrop背景
  • ¥70 2048小游戏毕设项目
  • ¥20 mysql架构,按照姓名分表
  • ¥15 MATLAB实现区间[a,b]上的Gauss-Legendre积分