dongxianrang9269 2015-02-13 09:11
浏览 62
已采纳

在PHP,Node和Golang中找到两个数组之间的差异

Here is a typical example of what I need to do

$testArr = array(2.05080E6,29400,420);

$stockArrays =  array(
                      array(2.05080E6,29400,0),
                      array(2.05080E6,9800,420),
                      array(1.715E6,24500,280),
                      array(2.05080E6,29400,140),
                      array(2.05080E6,4900,7));

I need to identify the stockArray that is the least different. A few clarifications

  • The numeric values of array elements at each position are guaranteed not to overlap. (i.e. arr[0] will always have the biggest values, arr1 will be at least an order of 10 magnitude smaller etc).
  • The absolute values of the differences do not count when determining least different. Only, the number of differing array indices matter.
  • Positional differences do have a weighting. Thus in my example stockArr1 is "more different" thought it too - like its stockArr[0] & stockArr[3] counterparts - differs in only one index position because that index position is bigger.
  • The number of stockArrays elements will typically be less than 10 but could potentially be much more (though never into 3 figures)
  • The stock arrays will always have the same number of elements. The test array will have the same or fewer elements. However, when fewer testArr would be padded out so that potentially matching elements are always in the same place as the stockArray. e.g.

    $testArray(29400,140)

would be transformed to

$testArray(0,29400,140);

prior to being subjected to difference testing.

  • Finally, a tie is possible. For instance my example above the matches would be stockArrays[0] and stockArrays[3].

In my example the result would be

$result = array(0=>array(0,0,1),3=>array(0,0,1));

indicating that the least different stock arrays are at indices 0 & 3 with the differences being at position 2.

In PHP I would handle all of this with array_diff as my starting point. For Node/JavaScript I would probably be tempted to the php.js array_diff port though I would be inclined to explore a bit given that in the worst cast scenario it is an O(n2) affair.

I am a newbie when it comes to Golang so I am not sure how I would implement this problem there. I have noted that Node does have an array_diff npm module.

One off-beat idea I have had is converting the array to a padded string (smaller array elements are 0 padded) and effectively do an XOR on the ordinal value of each character but have dismissed that as probably a rather nutty thing to do.

I am concerned with speed but not at all costs. In an ideal world the same solution (algorithm) would be used in each target language though in reality the differences between them might mean that is not possible/not a good idea.

Perhaps someone here might be able to point me to less pedestrian ways of accomplishing this - i.e. not just array_diff ports.

  • 写回答

1条回答 默认 最新

  • duancuan7057 2015-02-13 22:10
    关注

    Here's the equivalent of the array_diff solution: (assuming I didn't make a mistake)

    package main
    
    import "fmt"
    
    func FindLeastDifferent(needle []float64, haystack [][]float64) int {
        if len(haystack) == 0 {
            return -1
        }
        var currentIndex, currentDiff int
        for i, arr := range haystack {
            diff := 0
            for j := range needle {
                if arr[j] != needle[j] {
                    diff++
                }
            }
            if i == 0 || diff < currentDiff {
                currentDiff = diff
                currentIndex = i
            }
        }
    
        return currentIndex
    }
    
    func main() {
        idx := FindLeastDifferent(
            []float64{2.05080E6, 29400, 420},
            [][]float64{
                {2.05080E6, 29400, 0},
                {2.05080E6, 9800, 420},
                {1.715E6, 24500, 280},
                {2.05080E6, 29400, 140},
                {2.05080E6, 4900, 7},
                {2.05080E6, 29400, 420},
            },
        )
        fmt.Println(idx)
    }
    

    Like you said its O(n * m) where n is the number of elements in the needle array, and m is the number of arrays in the haystack.

    If you don't know the haystack ahead of time, then there's probably not much you can do to improve this. But if, instead, you're storing this list in a database, I think your intuition about string search has some potential. PostgreSQL for example supports string similarity indexes. (And here's an explanation of a similar idea for regular expressions: http://swtch.com/~rsc/regexp/regexp4.html)

    One other idea: if your arrays are really big you can calculate fuzzy hashes (http://ssdeep.sourceforge.net/) which would make your n smaller.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥20 基于MSP430f5529的MPU6050驱动,求出欧拉角
  • ¥20 Java-Oj-桌布的计算
  • ¥15 请问如何在openpcdet上对KITTI数据集的测试集进行结果评估?
  • ¥15 powerbuilder中的datawindow数据整合到新的DataWindow
  • ¥20 有人知道这种图怎么画吗?
  • ¥15 pyqt6如何引用qrc文件加载里面的的资源
  • ¥15 安卓JNI项目使用lua上的问题
  • ¥20 RL+GNN解决人员排班问题时梯度消失
  • ¥60 要数控稳压电源测试数据
  • ¥15 能帮我写下这个编程吗