douchen4915 2016-03-03 07:24
浏览 39
已采纳

找出数组中的序列相似性

I have a task where I have three arrays A,B,C. All of the contain the same data. For the sake of simplicity lets assume the data is numbers 1 to 5. The data would be in different jumbled sequences. I want to find out among B & C which array has data most similar to A.

Eg: 
A = 1,2,3,4,5
B = 1,2,3,5,4
C = 4,1,2,3,5

In this case, it is easy to visually comprehend that B is more similar to A. But it gets more complicated for really jumbled sequences.

Eg: 
A = 1,2,3,4,5
B = 5,3,1,4,2
C = 4,1,2,3,5

In this case, I would assume C to be more closer to A. I am thinking that this assumption can be quantified as: How many elements have the same sequence in both arrays? In above example the subsequence of [1,2,3] is the same in both arrays. The second question would be what is the offset difference between the similar subsequence ? In this case it is 1, because the subsequence begins at index 0 for A and index 1 for C.

So the number of elements in a matching sequence and their offsets are what I am thinking to use. I plan on adding a weightage to these two entities (number of elements in matching sequence, and offset difference in their occurrence)

Does this make sense? I only need a rough approximation of similarity and the results do not need to be exact. Are there any formal mathematical or data-structure models that solve this problem?

BTW, the project where I need this implemented is in PHP. Does it have any inbuilt functions like the levenstein model for string difference?

Any suggestions are very welcome!

  • 写回答

1条回答 默认 最新

  • duanou2526 2016-03-03 08:08
    关注

    Well I suppose you can come up with your own algorithm (for instance generate all suffixes and then search for them and then define a scoring procedure) or you could use a well known algorithm like
    Smith-Waterman for local alignment or Needleman-Wunsch for global. The advantage of these algorithms is that they are well-understood and give you all the possible alignments (and you can choose the best for your case).

    NW in PHP

    SW in PHP

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 BP神经网络控制倒立摆
  • ¥20 要这个数学建模编程的代码 并且能完整允许出来结果 完整的过程和数据的结果
  • ¥15 html5+css和javascript有人可以帮吗?图片要怎么插入代码里面啊
  • ¥30 Unity接入微信SDK 无法开启摄像头
  • ¥20 有偿 写代码 要用特定的软件anaconda 里的jvpyter 用python3写
  • ¥20 cad图纸,chx-3六轴码垛机器人
  • ¥15 移动摄像头专网需要解vlan
  • ¥20 access多表提取相同字段数据并合并
  • ¥20 基于MSP430f5529的MPU6050驱动,求出欧拉角
  • ¥20 Java-Oj-桌布的计算