ds000001 2015-11-02 19:04


Object comparison and array sorting in PHP

I have a problem with object comparison in PHP. What looks like straightforward code runs far too slowly for my liking, and since I am not very advanced in the language, I would like some feedback and suggestions on the following code:

class TestTokenGroup {
    private $tokens;
    ...

    public static function create($tokens) {
        $instance = new static();
        $instance->tokens = $tokens;
        ...
        return $instance;
    }

    public function getTokens() {
        return $this->tokens;
    }

    public static function compare($tokenGroup1, $tokenGroup2) {
        $i = 0;
        $minLength = min(array(count($tokenGroup1->getTokens()), count($tokenGroup2->getTokens())));
        $equalLengths = (count($tokenGroup1->getTokens()) == count($tokenGroup2->getTokens()));
        $comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
        while ($comparison == 0) {
            $i++;
            if (($i == $minLength) && ($equalLengths == true)) {
                return 0;
            }
            $comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
        }
        $result = $comparison;
        if ($result < 0)
            return -1;
        elseif ($result > 0)
            return 1;
        else
            return 0;
    }
    ...

}

In the code above, $tokens is a plain array of strings.

Sorting an array of about 40,000 TestTokenGroup objects with usort() and this method takes about 2 seconds.

Is there a sensible way to speed that up? Where is the bottleneck here?

EDIT: Added the getTokens() method I initially forgot to include.


1 answer

dongwen7730 2015-11-02 19:17


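Keep in mind that objects are passed around as handles (they behave much like references), while arrays have value semantics.

Your compare() calls getTokens() six times before the loop and twice more on every loop iteration. PHP arrays are copy-on-write, so returning $this->tokens does not by itself duplicate the array, but every call is still a full method call, and usort() invokes compare() on the order of n log n times (several hundred thousand comparisons for 40,000 elements).

Try accessing $tokens directly via $tokenGroup1->tokens: compare() is a method of TestTokenGroup, so it is allowed to read the private property. Returning a reference (public function &getTokens()) is also possible, but it does not remove the method-call overhead.

Alternatively, fetch each array only once at the top of compare():

$tokens1 = $tokenGroup1->getTokens();
$tokens2 = $tokenGroup2->getTokens();

Even if each token group is small, this cuts the getTokens() calls per comparison from 6 + 2 × (loop iterations) down to 2.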
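Reading ->tokens directly works because PHP's private visibility is per class, not per object: any method of the class, static or not, may read the private property of another instance of that class. A minimal sketch of the rule, using a hypothetical TokenBox class and sameTokenCount() method that exist only for illustration:

```php
<?php
// Hypothetical class illustrating PHP visibility rules:
// "private" restricts access to code inside the class, not to $this,
// so a static method can read the private property of any instance.
class TokenBox {
    private $tokens;

    public function __construct(array $tokens) {
        $this->tokens = $tokens;
    }

    // Reads $a->tokens and $b->tokens directly, with no getter call.
    public static function sameTokenCount(TokenBox $a, TokenBox $b): bool {
        return count($a->tokens) === count($b->tokens);
    }
}

var_dump(TokenBox::sameTokenCount(new TokenBox(['x', 'y']), new TokenBox(['p', 'q']))); // bool(true)
```

The same access from code outside the class (for example $box->tokens in a plain function) would raise an error, which is why the getter exists in the first place.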

UPDATE

I've benchmarked OP's code (removing the ... lines) using:

function gentokens() {
        $ret = [];
        for ( $i=0; $i< 3; $i++)
        {
                $str = "";
                for ( $x = rand(0,3); $x < 10; $x ++ )
                        $str .= chr( rand(0,25) + ord('a') );
                $ret[] = $str;
        }
        return $ret;
}


$start = microtime(true);

$array = [];    // this will hold the TestTokenGroup instances
$dummy = "";    // this will hold the tokens, space-separated and newline-separated
$dummy2= [];    // this will hold the space-concatenated strings

for ( $i=0; $i < 40000; $i++)
{
        $array[] = TestTokenGroup::create( $t = gentokens() );

        $dummy   .= implode(' ', $t ) . "\n";
        $dummy2[] = implode(' ', $t );
}

// write a test file to benchmark GNU sort:
file_put_contents("sort-data.txt", $dummy);

$inited = microtime(true);
printf("init: %f s\n", ($inited-$start));

usort( $array, [ 'TestTokenGroup', 'compare'] );

$sorted = microtime(true);
printf("sort: %f s\n", ($sorted-$inited));

usort( $dummy2, 'strcmp' );

$sorted2 = microtime(true);
printf("sort: %f s\n", ($sorted2-$sorted));

With the following results:

init: 0.359329 s    // for generating 40000 * 3 random strings and setup
sort: 1.012096 s    // for the TestTokenGroup::compare
sort: 0.120583 s    // for the 'strcmp' compare

And running time sort sort-data.txt > /dev/null (GNU sort on the same data) yields

0.052 u (user time, in seconds).

Optimisation 1: access ->tokens directly instead of calling getTokens()

Replacing ->getTokens() with ->tokens gives (from here on I only list the TestTokenGroup::compare timing):

sort: 0.832794 s

Optimisation 2: remove redundant array() in min

Changing the $minLength line to:

$minLength = min(count($tokenGroup1->tokens), count($tokenGroup2->tokens));

gives

sort: 0.779134 s

Optimisation 3: call count() only once per token group

    $count1 = count($tokenGroup1->tokens);
    $count2 = count($tokenGroup2->tokens);
    $minLength = min($count1, $count2);
    $equalLengths = ($count1 == $count2);

gives

sort: 0.679649 s
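Putting optimisations 1 to 3 together, compare() might look like the sketch below. It is not the code that was benchmarked: it also bounds the loop by the shorter group, so it never reads a missing offset. The original loop reads one element past the end of the shorter array when one group is a prefix of the other, and never terminates when both groups are empty. The sketch assumes PHP 7+ (for <=>) and that $tokens is a 0-indexed list of strings; the elided "..." parts of the class are omitted.

```php
<?php
class TestTokenGroup {
    private $tokens;

    public static function create($tokens) {
        $instance = new static();
        $instance->tokens = $tokens;
        return $instance;
    }

    public function getTokens() {
        return $this->tokens;
    }

    public static function compare($tokenGroup1, $tokenGroup2) {
        // Optimisation 1: read the private property directly.
        $tokens1 = $tokenGroup1->tokens;
        $tokens2 = $tokenGroup2->tokens;
        // Optimisations 2 and 3: count each group once, min() on two scalars.
        $count1 = count($tokens1);
        $count2 = count($tokens2);
        $minLength = min($count1, $count2);

        // Compare token by token, never past the end of the shorter group.
        for ($i = 0; $i < $minLength; $i++) {
            $comparison = strcmp($tokens1[$i], $tokens2[$i]);
            if ($comparison !== 0) {
                return $comparison < 0 ? -1 : 1;
            }
        }

        // All shared tokens are equal: the group with fewer tokens sorts first.
        return $count1 <=> $count2;
    }
}

$groups = [
    TestTokenGroup::create(['b', 'a']),
    TestTokenGroup::create(['a', 'c']),
    TestTokenGroup::create(['a']),
];
usort($groups, ['TestTokenGroup', 'compare']);
// Order is now ['a'], ['a', 'c'], ['b', 'a'].
```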

Alternative approach

The fastest sort so far is usort($stringarray, 'strcmp') at 0.12 s: still more than twice as slow as GNU sort, but the latter only does one thing, and does it well.

So, to sort the token groups efficiently, we need a sort key that is a single plain string. We can join the tokens with \0 as the delimiter. Provided no token contains \0 itself, we don't have to worry about groups having different numbers of tokens: \0 sorts below every other byte, and strcmp stops at the first differing byte, so the joined strings come out in the same order as the token arrays.
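The ordering argument can be checked directly. This sketch compares the sign of strcmp() on \0-joined keys with an element-wise token comparison for a few edge cases (a token that is a prefix of another, a group with fewer tokens, identical groups). The helper names sortKey, compareTokens and compareKeys are hypothetical, and the check assumes no token contains a \0 byte.

```php
<?php
// Build the string sort key: tokens joined by "\0".
// Assumes no token itself contains a "\0" byte.
function sortKey(array $tokens): string {
    return implode("\0", $tokens);
}

// Reference element-wise comparison: the first differing token decides,
// otherwise the group with fewer tokens sorts first.
function compareTokens(array $a, array $b): int {
    $n = min(count($a), count($b));
    for ($i = 0; $i < $n; $i++) {
        $c = strcmp($a[$i], $b[$i]);
        if ($c !== 0) {
            return $c < 0 ? -1 : 1;
        }
    }
    return count($a) <=> count($b);
}

// strcmp() on the joined keys, normalised to -1/0/1.
function compareKeys(array $a, array $b): int {
    return strcmp(sortKey($a), sortKey($b)) <=> 0;
}

$pairs = [
    [['ab', 'c'], ['abc', 'd']],   // "ab" is a prefix of "abc"
    [['a'], ['a', 'b']],           // fewer tokens
    [['x', 'y'], ['x', 'y']],      // identical groups
    [['b'], ['a', 'z']],           // first token decides
];
foreach ($pairs as [$a, $b]) {
    printf("%d %d\n", compareTokens($a, $b), compareKeys($a, $b));
}
// Prints:
// -1 -1
// -1 -1
// 0 0
// 1 1
```

This relies on PHP's strcmp() being binary safe, so the embedded \0 bytes are compared like any other byte.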

Here's the implementation:

$arr2 = [];
foreach ( $array as $o )
  $arr2[ implode("\0", $o->getTokens() ) ] = $o;

$init2 = microtime(true);
printf("init2: %f s\n", ($init2-$sorted2));

uksort( $arr2, 'strcmp' );

$sorted3 = microtime(true);
printf("sort: %f s\n", ($sorted3-$init2));

And here are the results:

init2: 0.125939 s
sort: 0.104717 s
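One caveat with using the joined string as an array key: token groups with identical tokens map to the same key, so all but one of them are silently dropped from $arr2, and purely numeric keys such as "123" are converted to integers by PHP. A variant that keeps every object stores the keys in a separate array indexed like $array, sorts it with asort() and PHP's built-in SORT_STRING comparison (no userland callback), then reorders the objects. This is an untimed sketch. The function name sortByJoinedKey is hypothetical, and the class is a reduced stand-in for the question's TestTokenGroup.

```php
<?php
// Reduced stand-in for the question's class (only what the sort needs).
class TestTokenGroup {
    private $tokens;

    public static function create($tokens) {
        $instance = new static();
        $instance->tokens = $tokens;
        return $instance;
    }

    public function getTokens() {
        return $this->tokens;
    }
}

// Sort token groups by a "\0"-joined key without using the key as an
// array index, so groups with identical tokens are all kept.
function sortByJoinedKey(array $groups): array {
    $keys = [];
    foreach ($groups as $index => $group) {
        $keys[$index] = implode("\0", $group->getTokens());
    }

    // Sort the key strings with the built-in string comparison,
    // preserving the original indices as array keys.
    asort($keys, SORT_STRING);

    $sorted = [];
    foreach (array_keys($keys) as $index) {
        $sorted[] = $groups[$index];
    }
    return $sorted;
}

$groups = [
    TestTokenGroup::create(['b', 'a']),
    TestTokenGroup::create(['a', 'c']),
    TestTokenGroup::create(['a', 'c']),   // duplicate: kept, not overwritten
    TestTokenGroup::create(['a']),
];
$sorted = sortByJoinedKey($groups);
// All 4 groups come back: ['a'], ['a', 'c'], ['a', 'c'], ['b', 'a'].
```

Whether this beats uksort($arr2, 'strcmp') on 40,000 objects has not been measured here; it trades the keyed array for one extra pass that rebuilds the sorted list.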

This answer was selected by the asker as the best answer.
