ds000001 2015-11-02 19:04

Object comparison and array sorting in PHP

I have a problem with object comparison in PHP. What looks like straightforward code runs far too slowly for my liking, and as I am not that advanced in the language, I would like some feedback and suggestions on the following code:

class TestTokenGroup {
    private $tokens;
    ...

    public static function create($tokens) {
        $instance = new static();
        $instance->tokens = $tokens;
        ...
        return $instance;
    }

    public function getTokens() {
        return $this->tokens;
    }

    public static function compare($tokenGroup1, $tokenGroup2) {
        $i = 0;
        $minLength = min(array(count($tokenGroup1->getTokens()), count($tokenGroup2->getTokens())));
        $equalLengths = (count($tokenGroup1->getTokens()) == count($tokenGroup2->getTokens()));
        $comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
        while ($comparison == 0) {
            $i++;
            if (($i == $minLength) && ($equalLengths == true)) {
                return 0;
            }
            $comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
        }
        $result = $comparison;
        if ($result < 0)
            return -1;
        elseif ($result > 0)
            return 1;
        else
            return 0;
    }
    ...

}

In the code above $tokens is just a simple array of strings.

Sorting an array of around 40k TestTokenGroup objects with usort() and the compare() method above takes about 2 seconds.

Is there a sensible way to speed that up? Where is the bottleneck here?

EDIT: Added the getTokens() method I initially forgot to include.

dongwen7730 2015-11-02 19:17

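You know that objects are passed by handle (effectively "by reference"), while arrays are passed "by value"?

Every call to getTokens() hands back $this->tokens by value. PHP's copy-on-write keeps the array data from being duplicated as long as nobody modifies it, but each call is still a method invocation plus an array return, and compare() makes six of them before the loop and two more per iteration.

Try accessing $tokens directly via $tokenGroup1->tokens (this works even though the property is private, because compare() is a method of the same class). You could also use references (&), although returning a reference doesn't work in all PHP versions.

Alternatively, fetch each array only once:

$tokens1 = $tokenGroup1->getTokens();
$tokens2 = $tokenGroup2->getTokens();

Even if each token group is relatively small, it will save at least 40000 * ( 6 + $average_token_group_length * 2) getTokens() calls.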

UPDATE

I've benchmarked OP's code (removing the ... lines) using:

function gentokens() {
        $ret = [];
        for ( $i=0; $i< 3; $i++)
        {
                $str = "";
                for ( $x = rand(0,3); $x < 10; $x ++ )
                        $str .= chr( rand(0,25) + ord('a') );
                $ret[] = $str;
        }
        return $ret;
}


$start = microtime(true);

$array = [];    // this will hold the TestTokenGroup instances
$dummy = "";    // this will hold the tokens, space-separated and newline-separated
$dummy2= [];    // this will hold the space-concatenated strings

for ( $i=0; $i < 40000; $i++)
{
        $array[] = TestTokenGroup::create( $t = gentokens() );

        $dummy   .= implode(' ', $t ) . "\n";
        $dummy2[] = implode(' ', $t );
}

// write a test file to benchmark GNU sort:
file_put_contents("sort-data.txt", $dummy);

$inited = microtime(true);
printf("init: %f s\n", ($inited-$start));

usort( $array, [ 'TestTokenGroup', 'compare'] );

$sorted = microtime(true);
printf("sort: %f s\n", ($sorted-$inited));

usort( $dummy2, 'strcmp' );

$sorted2 = microtime(true);
printf("sort: %f s\n", ($sorted2-$sorted));

With the following results:

init: 0.359329 s    // for generating 40000 * 3 random strings and setup
sort: 1.012096 s    // for the TestTokenGroup::compare
sort: 0.120583 s    // for the 'strcmp' compare

And, running time sort sort-data.txt > /dev/null yields

.052 u  (user-time, in seconds).

Optimisation 1: remove the getTokens() calls

Replacing ->getTokens() with ->tokens yields (I'll only list the TestTokenGroup::compare results):

sort: 0.832794 s

Optimisation 2: remove redundant array() in min

Changing the $minLength line to:

$minLength = min(count($tokenGroup1->tokens), count($tokenGroup2->tokens));

gives

sort: 0.779134 s

Optimisation 3: Only call count once for each tokenGroup

    $count1 = count($tokenGroup1->tokens);
    $count2 = count($tokenGroup2->tokens);
    $minLength = min($count1, $count2);
    $equalLengths = ($count1 == $count2);

gives

sort: 0.679649 s
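
Put together, the three optimisations might look like the sketch below. TokenGroupSketch is a hypothetical stand-in for TestTokenGroup (the elided parts of the original class are not reproduced). Unlike the original loop, it stops at the shorter length and then lets the shorter group sort first; that tie-break is an assumption, not something the question specifies.

```php
<?php
// Sketch only: TokenGroupSketch is a hypothetical stand-in for TestTokenGroup.
class TokenGroupSketch {
    private $tokens;

    public static function create($tokens) {
        $instance = new static();
        $instance->tokens = $tokens;
        return $instance;
    }

    public function getTokens() {
        return $this->tokens;
    }

    public static function compare($a, $b) {
        // Optimisation 1: read the private property directly (allowed inside the same class).
        $t1 = $a->tokens;
        $t2 = $b->tokens;

        // Optimisations 2 and 3: count each array once, call min() without array().
        $count1 = count($t1);
        $count2 = count($t2);
        $minLength = min($count1, $count2);

        // Compare token by token, never reading past the shorter array.
        for ($i = 0; $i < $minLength; $i++) {
            $c = strcmp($t1[$i], $t2[$i]);
            if ($c !== 0) {
                return $c < 0 ? -1 : 1;
            }
        }

        // All shared positions are equal. Assumption: the shorter group sorts first.
        if ($count1 === $count2) {
            return 0;
        }
        return $count1 < $count2 ? -1 : 1;
    }
}

$groups = [
    TokenGroupSketch::create(['b', 'a']),
    TokenGroupSketch::create(['a', 'c']),
    TokenGroupSketch::create(['a']),
];
usort($groups, ['TokenGroupSketch', 'compare']);
```

No timing is claimed for this variant; it would need to be run through the same benchmark to see where it lands.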

Alternative approach

The fastest sort so far is usort( $dummy2, 'strcmp' ): 0.12s. That is still more than twice as slow as GNU sort, but the latter only does one thing, and does it well.

So, to sort the TokenGroups efficiently we need to construct a sort key consisting of a simple string. We can use \0 as a delimiter for the tokens, and we don't have to worry about them being equal length, because as soon as one character is different, the compare aborts.
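
A small check of why the delimiter matters, using made-up token lists. It assumes the tokens themselves never contain a "\0" byte; if they could, the joined key would no longer mark token boundaries unambiguously.

```php
<?php
// Hypothetical token lists, chosen so that naive concatenation collides.
$a = ['ab', 'c'];
$b = ['a', 'bc'];

// Without a delimiter both groups collapse to the same key.
$plainA = implode('', $a);   // "abc"
$plainB = implode('', $b);   // "abc"

// With "\0" the token boundary becomes part of the key, and the NUL byte
// compares lower than any other byte a token could contain.
$keyA = implode("\0", $a);   // "ab\0c"
$keyB = implode("\0", $b);   // "a\0bc"

// Token by token, "a" sorts before "ab", so $b belongs before $a.
$tokenwise = strcmp($b[0], $a[0]);
// strcmp() is binary-safe, so the joined keys give the same verdict.
$keywise = strcmp($keyB, $keyA);
```

Both $tokenwise and $keywise come out negative, so sorting by the joined key agrees with the token-by-token order here.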

Here's the implementation:

$arr2 = [];
foreach ( $array as $o )
  $arr2[ implode("\0", $o->getTokens() ) ] = $o;

$init2 = microtime(true);
printf("init2: %f s\n", ($init2-$sorted2));

uksort( $arr2, 'strcmp' );

$sorted3 = microtime(true);
printf("sort: %f s\n", ($sorted3-$init2));

and here the results:

init2: 0.125939 s
sort: 0.104717 s
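
One caveat with using the sort key as an array key: two token groups with identical tokens produce the same key, so the later object overwrites the earlier one. If duplicates can occur, a variant along these lines keeps every object. This is a sketch, not benchmarked; GroupSketch is a hypothetical stand-in for TestTokenGroup.

```php
<?php
// Hypothetical stand-in for TestTokenGroup, reduced to what the sketch needs.
class GroupSketch {
    private $tokens;

    public function __construct(array $tokens) {
        $this->tokens = $tokens;
    }

    public function getTokens() {
        return $this->tokens;
    }
}

$groups = [
    new GroupSketch(['b', 'a']),
    new GroupSketch(['a', 'c']),
    new GroupSketch(['a', 'c']),   // duplicate: would be lost if the key were the array key
    new GroupSketch(['a']),
];

// Build one sort key per object, indexed by the object's position in $groups.
$keys = [];
foreach ($groups as $i => $g) {
    $keys[$i] = implode("\0", $g->getTokens());
}

// asort() sorts by value and keeps the index association;
// SORT_STRING forces plain string comparison of the keys.
asort($keys, SORT_STRING);

// Walk the sorted keys and collect the objects in that order.
$sorted = [];
foreach ($keys as $i => $unused) {
    $sorted[] = $groups[$i];
}
```

The objects end up in $sorted in key order, duplicates included, while $groups itself is left untouched.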

This answer was selected as the best answer by the asker.
