dqfkd82886 2017-03-05 01:17
浏览 95
已采纳

PHP中可扩展的Trie实现

Following this tutorial I met the Trie data structure. Since recently I've been programming in PHP I tried to solve the lecture's problem with that. I was able to achieve correct answers, but only for smaller inputs (Input #10 is a 2,82 MB file). Obviously, my algorithm is not scaling well. It also exceeds the default 128 MB memory limit of PHP.

My algorithm

There is a root node stored in Trie. Every node has a "children" member. I use the standard PHP array to store the children. A child key represents a character (currently I am creating a new node for every character, a-z lowercase, mapping to 0-25), a child value is a reference to another node.

The "weight" member that every nodes has is there because of the problem. I would like to optimize my code, (or even rewrite it from stratch using a different approach) so that it can pass the tests for big inputs.

I'm interested in a solution to make this data structure work in PHP with big inputs, if possible.

My code

TrieNode class stores the tree hierarchy.

class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children, 
    * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
    * where 0 stands for 'a', 1 for 'c'
    * and TrieNode objects are references to other TrieNodes.
    */
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        return intval(ord($char)) - intval(ord('a'));
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}

Trie class stores the root TrieNode. It can insert and query nodes.

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1, []);
    }

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if(!$currentNode->isChild($char)) {
                $n = new TrieNode($weight, null);
                $currentNode->addChild($char, $n);
            }
            $currentNode->weight = max($weight, $currentNode->weight);
            $currentNode = $currentNode->getChild($char);
        }
    }

    function getNode($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if ($currentNode->isLeaf() || !$currentNode->isChild($char)) {
                return null;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode;
    }

    function getWeight($string) {
        $node = $this->getNode($string);
        return is_null($node) ? -1 : $node->weight;
    }
}

Test code. Parses input and calls the Trie object.

//MAIN / TEST

/*
In case the problem page is down:

e.g.
INPUT
2 1
hackerearth 10
hackerrank 9
hacker

OUTPUT
10

where 2 is the number of inserts, 1 is the number of queries
"string number" is the string to insert and its "weight"
"hacker" is the string to query
10 is maximum the weight of the queried string (hacker -> 10)
*/

$trie = new Trie();
$handle = fopen('test.txt', 'r');
//$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);

Fail

The algorithm fails 6 tests out of 16

  • 写回答

2条回答 默认 最新

  • douan4359 2017-06-23 10:20
    关注

    Below is the code with following optimizations -

    Removed all unnecessary condition checks like

    1. No need to check is node is a leaf because if node does not have a child a specified char then it does not matter if it is a leaf or not.
    2. No need to check if {children} is initialized every time you add a child node. Removed this check initialized {children} to empty array in constructor itself.

    Removed function to {getAsciiValue} instead using a simple associative array as. Also changing a {char} to ascii value has been moved from TrieNode to Trie class so that we don't need to convert it multiple times

    After these optimization i came up following minimal solution, but this also can not pass the test #10. After reading about array in PHP I got to know that PHP does not implement array like other compiled languages, instead any array in PHP is just an ordered hash map and because of this array does not support constant time operations. https://stackoverflow.com/a/4904071/8203131

    Also using SplFixedArray but did not help because it is an object and has cost of instantiation. It could have helped if tried using a large array to store whole Trie. You can try implementing a solution to using SplFixedArray to store whole Trie and check if you can get it to pass test #10.

    <?php
    
    /*
     * Read input from stdin and provide input before running code
    
    fscanf(STDIN, "%s
    ", $name);
    echo "Hi, ".$name;
    
    */
    
    class TrieNode {
        // weight is needed for the given problem
        public $weight;
        /* TrieNode children, 
        * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
        * where 0 stands for 'a', 2 for 'c'
        * and TrieNode objects are references to other TrieNodes.
        */
        private $children;
    
        function __construct($weight) {
            $this->weight = $weight;
            $this->children = [];
        }
    
        function addChild($char, $node) {
            $this->children[$char] = $node;
        }
    
        function isChild($char) {
            return isset($this->children[$char]);
        }
    
        function getChild($char) {
            return $this->children[$char];
        }
    }
    
    class Trie {
        /* root TrieNode stores the first characters */
        private $root;
    
        function __construct() {
            $this->root = new TrieNode(-1);
        }
    
        static $asciiValues = array(
            "a" => 0,
            "b" => 1,
            "c" => 2,
            "d" => 3,
            "e" => 4,
            "f" => 5,
            "g" => 6,
            "h" => 7,
            "i" => 8,
            "j" => 9,
            "k" => 10,
            "l" => 11,
            "m" => 12,
            "n" => 13,
            "o" => 14,
            "p" => 15,
            "q" => 16,
            "r" => 17,
            "s" => 18,
            "t" => 19,
            "u" => 20,
            "v" => 21,
            "w" => 22,
            "x" => 23,
            "y" => 24,
            "z" => 25
        );
    
        function insert($string, $weight) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = self::$asciiValues[$string[$i]];
                $currentNode->weight = max($weight, $currentNode->weight);
                if($currentNode->isChild($char)) {
                    $childNode = $currentNode->getChild($char);
                } else {
                    $childNode = new TrieNode($weight);
                    $currentNode->addChild($char, $childNode);
                }
                $currentNode = $childNode;
            }
        }
    
        function getNodeWeight($string) {
            $currentNode = $this->root;
            $l = strlen($string);
            for ($i = 0; $i < $l; $i++) {
                $char = self::$asciiValues[$string[$i]];
                if (!$currentNode->isChild($char)) {
                    return -1;
                }
                $currentNode = $currentNode->getChild($char);
            }
            return $currentNode->weight;
        }
    }
    
    $trie = new Trie();
    //$handle = fopen('test.txt', 'r');
    $handle = STDIN; // <- this is for the online judge
    list($n, $q) = fscanf($handle, "%d %d");
    for ($i = 0; $i < $n; $i++) { // insert data
        list($s, $weight) = fscanf($handle, "%s %d");
        $trie->insert($s, $weight);
    }
    for ($i = 0; $i < $q; $i++) { // query data
        //$query = trim(strval(fgets($handle)));
        $query = trim(strval(fgets($handle)));
        echo $trie->getNodeWeight($query) . PHP_EOL;
    }
    fclose($handle);
    
    
    ?>
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥20 sub地址DHCP问题
  • ¥15 delta降尺度计算的一些细节,有偿
  • ¥15 Arduino红外遥控代码有问题
  • ¥15 数值计算离散正交多项式
  • ¥30 数值计算均差系数编程
  • ¥15 redis-full-check比较 两个集群的数据出错
  • ¥15 Matlab编程问题
  • ¥15 训练的多模态特征融合模型准确度很低怎么办
  • ¥15 kylin启动报错log4j类冲突
  • ¥15 超声波模块测距控制点灯,灯的闪烁很不稳定,经过调试发现测的距离偏大