dongtuo6562 2018-05-21 12:00

How to compare spliced multidimensional arrays (with a large amount of data)?

I have a huge array $properties with about 500,000 items:

  array(470000) {
    ["12345"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 19:46:25"
      ["fileName"]=>
      string(46) "monkey.jpg"
      ["path"]=>
      string(149) "Volumes/animals/monkey.jpg"
      ["size"]=>
      string(7) "2650752"
    }
    ["678790"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 14:39:43"
      ["fileName"]=>
      string(45) "elephant.jpg"
      ["path"]=>
      string(171) "Volumes/animals/elephant.jpg"
      ["size"]=>
      string(7) "2306688"
    }

... and so on.

To improve performance, I spliced it into parts:

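$splice_size = 10000;
$count_arr = (count($properties) / $splice_size) - 1;
$res = [];

// Move $splice_size items at a time from the front of $properties into $res
for ($i = 0; $i < $count_arr; $i++) {
    $res[] = array_splice($properties, 0, $splice_size);
}
// Whatever is left over becomes the last chunk
$res[] = array_splice($properties, 0, count($properties));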
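As a sketch of an alternative (not what the question uses), PHP's built-in array_chunk() produces the same chunks in one call. Passing true as its third argument ($preserve_keys) keeps the original record IDs as keys. array_splice() does not preserve numeric keys, and PHP stores an integer-like string key such as "12345" as the integer 12345, so spliced chunks would be renumbered 0, 1, 2, ... and could no longer be matched by ID. The sample data is made up, and the chunk size is 2 instead of 10000 so that more than one chunk appears.

```php
<?php
// Made-up sample data standing in for $properties (keys are record IDs).
$properties = [
    12345  => ['fileName' => 'monkey.jpg',   'size' => '2650752'],
    678790 => ['fileName' => 'elephant.jpg', 'size' => '2306688'],
    424242 => ['fileName' => 'zebra.jpg',    'size' => '1000000'],
];

$splice_size = 2; // 10000 in the question; 2 here so the output shows two chunks

// array_chunk() with $preserve_keys = true keeps the record IDs inside each chunk;
// the last chunk simply holds whatever is left over.
$res = array_chunk($properties, $splice_size, true);

// For comparison: the array returned by array_splice() renumbers integer keys from 0.
$copy = $properties;
$spliced = array_splice($copy, 0, $splice_size);

echo implode(',', array_keys($res[0])), "\n";  // 12345,678790
echo implode(',', array_keys($res[1])), "\n";  // 424242
echo implode(',', array_keys($spliced)), "\n"; // 0,1
```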

Now my array looks like this:

array(4) {
  [0]=>
  array(10000) {
    ["12345"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 19:46:25"
      ["fileName"]=>
      string(46) "monkey.jpg"
      ["path"]=>
      string(149) "Volumes/animals/monkey.jpg"
      ["size"]=>
      string(7) "2650752"
    }
    ["678790"]=>
    array(5) {
      ["dateTime"]=>
      string(19) "2016-10-12 14:39:43"
      ["fileName"]=>
      string(45) "elephant.jpg"
      ["path"]=>
      string(171) "Volumes/animals/elephant.jpg"
      ["size"]=>
      string(7) "2306688"
    }

   ... and so on.
   }
  [1]=>....
  and so on....
}

Now I want to compare two of these arrays:

function array_diff_assoc_recursive($array1, $array2)
{
    $difference = [];

    foreach ($array1 as $key => $value) {
        if (is_array($value)) {
            if (!isset($array2[$key]) || !is_array($array2[$key])) {
                // Key missing from $array2, or not an array there: the whole value differs
                $difference[$key] = $value;
            } else {
                // Both sides are arrays: recurse and keep only non-empty differences
                $new_diff = array_diff_assoc_recursive($value, $array2[$key]);
                if (!empty($new_diff)) {
                    $difference[$key] = $new_diff;
                }
            }
        } elseif (!isset($array2[$key]) || $array2[$key] != $value) {
            // Scalar value missing from $array2 or different there
            $difference[$key] = $value;
        }
    }

    // Empty array when nothing differs
    return $difference;
}

echo "<pre>";
print_r(array_diff_assoc_recursive($new, $res));
echo "</pre>";

But the system crashes because there is too much data. So my question is: there must be some performance advantage to splicing the array (i.e. splitting it into chunks) that I am still not getting. What is it?


1 answer

  • duanjiao3686 2018-05-21 12:33

    If I were you I would just do:

    $different = [];
    $missingFrom2 = [];
    
    foreach ($array1 as $key => $value) {
        if (!isset($array2[$key])) {
            $missingFrom2[] = $key;   // key exists only in $array1
        } elseif ($array2[$key] != $value) {
            $different[] = $key;      // key in both arrays, but the values differ
        }
    }
    // Keys that exist only in $array2
    $missingFrom1 = array_diff(array_keys($array2), array_keys($array1));
    

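    $different will contain every key whose value differs between the two arrays; $missingFrom2 lists the keys that exist only in $array1, and $missingFrom1 the keys that exist only in $array2.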
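To make the snippet runnable, the sketch below wraps the same idea in a hypothetical helper, compareByKey(), and runs it on two tiny made-up arrays so the result can be checked by hand. It uses elseif so that a key missing from $array2 is reported only once and is never read (reading a missing key raises an undefined-key notice or warning). The != test is PHP's loose comparison, which for two arrays is true when they do not hold the same key/value pairs.

```php
<?php
/**
 * Compare two arrays keyed by record ID in a single pass.
 * Returns three lists of keys: changed, only in $array1, only in $array2.
 * compareByKey() is a hypothetical helper name, not a PHP built-in.
 */
function compareByKey(array $array1, array $array2): array
{
    $different = [];
    $missingFrom2 = [];

    foreach ($array1 as $key => $value) {
        if (!isset($array2[$key])) {
            $missingFrom2[] = $key;        // only in $array1
        } elseif ($array2[$key] != $value) {
            $different[] = $key;           // in both, contents differ (loose comparison)
        }
    }
    // array_values() drops the positional keys that array_diff() preserves
    $missingFrom1 = array_values(array_diff(array_keys($array2), array_keys($array1)));

    return [$different, $missingFrom2, $missingFrom1];
}

// Made-up sample records
$old = [
    12345  => ['fileName' => 'monkey.jpg',   'size' => '2650752'],
    678790 => ['fileName' => 'elephant.jpg', 'size' => '2306688'],
];
$new = [
    12345  => ['fileName' => 'monkey.jpg',   'size' => '9999999'], // size changed
    555555 => ['fileName' => 'zebra.jpg',    'size' => '1000000'], // new record
];

[$different, $missingFrom2, $missingFrom1] = compareByKey($old, $new);
echo implode(',', $different), "\n";    // 12345
echo implode(',', $missingFrom2), "\n"; // 678790
echo implode(',', $missingFrom1), "\n"; // 555555
```

Because each isset() lookup is a hash lookup, this stays a single pass over $array1 plus one array_diff() over the key lists, without any chunking.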

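    What you're doing seems a bit over-engineered for no particular benefit. Splitting the array into chunks does not reduce the amount of data in memory, because all the chunks are still held at the same time, and comparing by key is already a single pass over the array.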

    Examples: http://sandbox.onlinephpfunctions.com/code/7ff02f562e0591e8afb45ea51799b847fbc4063b http://sandbox.onlinephpfunctions.com/code/402926605ba8a195d2dfc667be146654117cd078

    This answer was selected as the best answer by the asker.