doulu2011 2014-01-23 12:27
Accepted

Creating a nested array from unordered lists in files

I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (with varying depth), I want to read all the files with PHP and parse the menu (nested unordered lists) into an associative array.

root.html
<ul id="menu">
  <li class="active">Start</li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <li><a href="file2.html">Sub2</a></li>
  </ul>
</ul>

file1.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li class="active">Sub1</li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li><a href="file4.html">SubSub2</a></li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

file3.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li class="active">SubSub1</li>
      <ul>
        <li><a href="file7.html">SubSubSub1</a></li>
        <li><a href="file8.html">SubSubSub2</a></li>
        <li><a href="file9.html">SubSubSub3</a></li>
      </ul>
    </ul>
  </ul>
</ul>

file4.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li class="active">SubSub2</li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

I would like to loop through all files, extract the list with id="menu", and create an array like this (or similar) that keeps the hierarchy and file information:

Array 
  [file] => root.html
  [child] => Array 
    [Sub1] => Array 
      [file] => file1.html
      [child] => Array  
        [SubSub1] => Array 
          [file] => file3.html
          [child] => Array 
            [SubSubSub1] => Array 
              [file] => file7.html
            [SubSubSub2] => Array 
              [file] => file8.html                      
            [SubSubSub3] => Array
              [file] => file9.html
        [SubSub2] => Array
          [file] => file4.html
        [SubSub3] => Array 
          [file] => file5.html
        [SubSub4] => Array 
          [file] => file6.html
    [Sub2] => Array
      [file] => file2.html 

With the help of the PHP Simple HTML DOM Parser library I successfully read the file and extracted the menu:

$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
  ..
}

To parse only the active section of the menu (leaving out the links that go one or more levels up), I used

$ul->find("ul",-1)

which finds the last ul inside the outer ul. This works great for a single file.

But I'm having trouble looping through all the files/menus while keeping the parent/child information, because each menu has a different depth.

Thanks for all suggestions, tips and help!

2 answers

  • dongquechan4414 2014-01-23 12:41

    Edit: OK, this was not so easy after all :)

    By the way, this library is really an excellent tool. Kudos to the guys who wrote it.

    Here is one possible solution:

    require_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser; adjust the path as needed

    class menu_parse {

        static $missing = array(); // list of missing files

        static private $files = array(); // list of source files to process

        // initiate menu parsing
        static function start ($file)
        {
            // start with root file
            self::$files[$file] = 1;

            // parse all source files; menu() appends newly found files
            // to the list while it is walked with the internal array pointer
            for ($res=array(); current(self::$files); next(self::$files))
            {
                // get next file name
                $file = key(self::$files);

                // parse the file
                if (!file_exists ($file))
                {
                    self::$missing[$file] = 1;
                    continue;
                }
                $html = file_get_html ($file);
                if (!$html) continue; // file could not be loaded

                // get menu root (if any)
                $root = $html->find("ul[id=menu]",0);
                if ($root) self::menu ($root, $res);
            }

            // reorder missing files array
            self::$missing = array_keys (self::$missing);

            // that's all folks
            return $res;
        }

        // parse a menu at a given level
        static private function menu ($menu, &$res)
        {
            $name = null; // name of the most recent <li> at this level

            foreach ($menu->children as $elem)
            {
                switch ($elem->tag)
                {
                case "li" : // name and possibly source file of a menu

                    // grab menu name
                    $name = trim ($elem->plaintext);

                    // make sure the item exists before setting properties on it
                    // (assigning a property on null is an error in PHP 8)
                    if (!isset ($res[$name])) $res[$name] = new stdClass;

                    // see if we can find a link to the menu file
                    $link = $elem->children(0);
                    if ($link && $link->tag == 'a')
                    {
                        // found the link
                        $file = $link->href;
                        $res[$name]->file = $file;

                        // add the source file to the processing list
                        self::$files[$file] = 1;
                    }
                    break;

                case "ul" : // go down one level to grab items of the current menu

                    // a <ul> with no preceding <li> has no item to attach to
                    if ($name !== null) self::menu ($elem, $res[$name]->childs);
                    break;
                }
            }
        }
    }
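
    The loop in start() treats the file list as a work list: every link found in a menu adds a file to the list, and parsing goes on until no unprocessed file is left. The same idea in isolation might look like the sketch below. It swaps the internal array pointer for an explicit queue, and it uses a hypothetical $links map in place of real parsing; crawl() and $links are illustrative names, not part of the class above.

    ```php
    <?php
    // Hypothetical link map standing in for the parsed menus:
    // file name => files linked from its menu (no key = file is missing).
    $links = array(
        'root.html'  => array('file1.html', 'file2.html'),
        'file1.html' => array('root.html', 'file3.html', 'file4.html'),
        'file3.html' => array('root.html', 'file1.html'),
        'file4.html' => array('root.html', 'file1.html', 'file3.html'),
    );

    function crawl (array $links, $start)
    {
        $queue   = array($start);         // files still to process
        $seen    = array($start => true); // files that were queued once already
        $visited = array();               // files processed, in order
        $missing = array();               // files that could not be found

        while ($queue)
        {
            $file = array_shift ($queue);
            if (!isset ($links[$file]))
            {
                $missing[] = $file;
                continue;
            }
            $visited[] = $file;

            // queue every linked file that has not been seen yet
            foreach ($links[$file] as $next)
            {
                if (isset ($seen[$next])) continue;
                $seen[$next] = true;
                $queue[] = $next;
            }
        }
        return array($visited, $missing);
    }

    list ($visited, $missing) = crawl ($links, 'root.html');
    print_r ($visited);
    print_r ($missing);
    ```

    Keeping a "seen" set is what stops the back links (every page links to root.html) from causing an endless loop; the class gets the same effect by using file names as array keys.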
    

    Usage:

    // The result will be an array of menu items indexed by item name.
    //
    // Each item will be an object with up to 2 members
    // - file   -> source file of the item (when a link to it was found)
    // - childs -> array of sub-items (when the item has a submenu)
    //
    $res = menu_parse::start ("root.html");
    
    // menu_parse::$missing will contain the names of all missing files
    
    echo "Result : <pre>";
    print_r ($res);
    echo "</pre><br>missing files:<pre>";
    print_r (menu_parse::$missing);
    echo "</pre>";
    

    Output of your test case:

    Array
    (
      [Start] => stdClass Object
        (
          [childs] => Array
            (
              [Sub1] => stdClass Object
                (
                  [file] => file1.html
                  [childs] => Array
                    (
                      [SubSub1] => stdClass Object
                        (
                          [file] => file3.html
                          [childs] => Array
                            (
                              [SubSubSub1] => stdClass Object
                                (
                                  [file] => file7.html
                                )
                              [SubSubSub2] => stdClass Object
                                (
                                  [file] => file8.html
                                )
                              [SubSubSub3] => stdClass Object
                                (
                                  [file] => file9.html
                                )
                            )
                        )
                      [SubSub2] => stdClass Object
                        (
                          [file] => file4.html
                        )
                      [SubSub3] => stdClass Object
                        (
                          [file] => file5.html
                        )
                      [SubSub4] => stdClass Object
                        (
                          [file] => file6.html
                        )
                    )
                )
              [Sub2] => stdClass Object
                (
                  [file] => file2.html
                )
            )
          [file] => root.html
        )
    )
    
    missing files: Array
    (
        [0] => file2.html
        [1] => file5.html
        [2] => file6.html
        [3] => file7.html
        [4] => file8.html
        [5] => file9.html
    )
    

    Notes:

    • The code assumes all item names are unique inside a given menu.

    You could modify the code to have the (sub)menus as an array with numeric indexes and names as properties (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.

    Should such name duplication occur, the best solution would be to rename one of the items, IMHO.

    • The code also assumes there is only one root menu.

    It could be modified to handle more than one, but that does not make much sense IMHO (it would mean a root menu ID duplication, which would likely cause trouble to the JavaScript trying to process it in the first place).
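
    If you would rather have plain nested arrays with file / child keys, close to the layout in the question, the object tree can be converted afterwards. Here is a minimal sketch, assuming the result has the shape shown in the output above; to_array() is a hypothetical helper name, and the sample tree is built by hand so the snippet runs without the parser.

    ```php
    <?php
    // Convert a name => object tree into nested arrays
    // with 'file' and 'child' keys (hypothetical helper).
    function to_array (array $items)
    {
        $out = array();
        foreach ($items as $name => $item)
        {
            $node = array();
            if (isset ($item->file))    $node['file']  = $item->file;
            if (!empty ($item->childs)) $node['child'] = to_array ($item->childs);
            $out[$name] = $node;
        }
        return $out;
    }

    // Small hand-built sample in the same shape as the parser result
    $sub1 = new stdClass;
    $sub1->file = 'file1.html';
    $sub2 = new stdClass;
    $sub2->file = 'file2.html';
    $start = new stdClass;
    $start->childs = array('Sub1' => $sub1, 'Sub2' => $sub2);
    $start->file = 'root.html';

    $tree = to_array (array('Start' => $start));
    print_r ($tree);
    ```

    The root item stays under its own name ("Start") here rather than being merged into the top level as in the question; use $tree['Start'] if you want the exact top-level layout.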

    This answer was selected as the best answer by the asker.