doulu2011 2014-01-23 12:27

Create a nested array from unordered lists in files

I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (with varying depth), I want to read all the files with PHP and parse the menu (nested unordered lists) into an associative array.

root.html
<ul id="menu">
  <li class="active">Start</li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <li><a href="file2.html">Sub2</a></li>
  </ul>
</ul>

file1.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li class="active">Sub1</li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li><a href="file4.html">SubSub2</a></li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

file3.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li class="active">SubSub1</li>
      <ul>
        <li><a href="file7.html">SubSubSub1</a></li>
        <li><a href="file8.html">SubSubSub2</a></li>
        <li><a href="file9.html">SubSubSub3</a></li>
      </ul>
    </ul>
  </ul>
</ul>

file4.html
<ul id="menu">
  <li><a href="root.html">Start</a></li>
  <ul>
    <li><a href="file1.html">Sub1</a></li>
    <ul>
      <li><a href="file3.html">SubSub1</a></li>
      <li class="active">SubSub2</li>
      <li><a href="file5.html">SubSub3</a></li>
      <li><a href="file6.html">SubSub4</a></li>
    </ul>
  </ul>
</ul>

I would like to loop through all the files, extract the list with id="menu" from each, and build an array like this (or similar) that keeps the hierarchy and the file information:

Array 
  [file] => root.html
  [child] => Array 
    [Sub1] => Array 
      [file] => file1.html
      [child] => Array  
        [SubSub1] => Array 
          [file] => file3.html
          [child] => Array 
            [SubSubSub1] => Array 
              [file] => file7.html
            [SubSubSub2] => Array 
              [file] => file8.html                      
            [SubSubSub3] => Array
              [file] => file9.html
        [SubSub2] => Array
          [file] => file4.html
        [SubSub3] => Array 
          [file] => file5.html
        [SubSub4] => Array 
          [file] => file6.html
    [Sub2] => Array
      [file] => file2.html 

With the help of the PHP Simple HTML DOM Parser library I successfully read a file and extracted the menu:

$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
  ..
}

To parse only the active section of the menu (leaving out the links that go one or more levels up) I used

$ul->find("ul",-1)

which finds the last ul inside the outer ul. This works great for a single file.

But I'm having trouble looping through all the files/menus while keeping the parent/child information, because each menu has a different depth.

Thanks for all suggestions, tips and help!


  • dongquechan4414 2014-01-23 12:41

    Edit: OK, this was not so easy after all :)

    By the way, this library is really an excellent tool. Kudos to the guys who wrote it.

    Here is one possible solution:

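    // file_get_html() comes from the PHP Simple HTML DOM Parser;
    // the path below is an assumption, adjust it to where the library lives
    require_once 'simple_html_dom.php';
    
    class menu_parse {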
    
        static $missing = array(); // list of missing files
    
        static private $files = array(); // list of source files to process
    
        // initiate menu parsing
        static function start ($file)
        {
            // start with root file
            self::$files[$file] = 1;
    
            // parse all source files
            for ($res=array(); current(self::$files); next(self::$files))
            {
                // get next file name
                $file = key(self::$files);
    
                // parse the file
                if (!file_exists ($file))
                {
                    self::$missing[$file] = 1;
                    continue;
                }
                $html = file_get_html ($file);
                if (!$html) continue; // empty or oversized file: nothing to parse
    
                // get menu root (if any)
                $root = $html->find("ul[id=menu]",0);
                if ($root) self::menu ($root, $res);
            }
    
            // reorder missing files array
            self::$missing = array_keys (self::$missing);
    
            // that's all folks
            return $res;
        }
    
        // parse a menu at a given level
        static private function menu ($menu, &$res)
        {
            foreach ($menu->children as $elem)
            {
                switch ($elem->tag)
                {
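                case "li" : // name and possibly source file of a menu
    
                    // grab menu name (trimmed, so stray whitespace cannot alter the key)
                    $name = trim ($elem->plaintext);
    
                    // create the entry on first sight: PHP 8 no longer turns
                    // null into an object when a property is assigned
                    if (!isset ($res[$name])) $res[$name] = new stdClass;
    
                    // see if we can find a link to the menu file
                    $link = $elem->children(0);
                    if ($link && $link->tag == 'a')
                    {
                        // found the link
                        $file = $link->href;
                        $res[$name]->file = $file;
    
                        // add the source file to the processing list
                        self::$files[$file] = 1;
                    }
                    break;
    
                case "ul" : // go down one level to grab items of the current menu
                    if (!isset ($name)) break; // a list with no preceding item has no parent
                    if (!isset ($res[$name]->childs)) $res[$name]->childs = array();
                    self::menu ($elem, $res[$name]->childs);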
                }   
            }
        }
    }
    

    Usage:

    // The result will be an array of menus indexed by item names.
    //
    // Each menu will be an object with up to 2 members
    // - file   -> source file of the menu
    // - childs -> array of menu subitems
    //
    $res = menu_parse::start ("root.html");
    
    // menu_parse::$missing will contain the names of all the missing files
    
    echo "Result : <pre>";
    print_r ($res);
    echo "</pre><br>missing files:<pre>";
    print_r (menu_parse::$missing);
    echo "</pre>";
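
    Since the result is a tree of objects, print_r output gets hard to read as the menu grows. As a rough sketch only, the helper below walks the same structure recursively and renders one line per item. The name menu_outline is made up for this example and is not part of the class above; the sample data is built by hand so the block runs without the library.

```php
<?php
// Hypothetical helper (not part of menu_parse): renders a result shaped like
// the one returned by menu_parse::start() as an indented outline.
function menu_outline(array $menu, $depth = 0)
{
    $out = '';
    foreach ($menu as $name => $item) {
        // an item seen only as the active entry never had a link, so "file" may be absent
        $file = isset($item->file) ? $item->file : '(no file)';
        $out .= str_repeat('  ', $depth) . $name . ' -> ' . $file . "\n";

        // descend into the sub-menu, if the item has one
        if (!empty($item->childs)) {
            $out .= menu_outline($item->childs, $depth + 1);
        }
    }
    return $out;
}

// Hand-built sample with the same shape as the parser result
$sub1 = new stdClass;
$sub1->file = 'file1.html';
$start = new stdClass;
$start->file = 'root.html';
$start->childs = array('Sub1' => $sub1);

echo menu_outline(array('Start' => $start));
```

    With the real parser, passing $res instead of the hand-built sample should work the same way.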
    

    Output of your test case:

    Array
    (
      [Start] => stdClass Object
        (
          [childs] => Array
            (
              [Sub1] => stdClass Object
                (
                  [file] => file1.html
                  [childs] => Array
                    (
                      [SubSub1] => stdClass Object
                        (
                          [file] => file3.html
                          [childs] => Array
                            (
                              [SubSubSub1] => stdClass Object
                                (
                                  [file] => file7.html
                                )
                              [SubSubSub2] => stdClass Object
                                (
                                  [file] => file8.html
                                )
                              [SubSubSub3] => stdClass Object
                                (
                                  [file] => file9.html
                                )
                            )
                        )
                      [SubSub2] => stdClass Object
                        (
                          [file] => file4.html
                        )
                      [SubSub3] => stdClass Object
                        (
                          [file] => file5.html
                        )
                      [SubSub4] => stdClass Object
                        (
                          [file] => file6.html
                        )
                    )
                )
              [Sub2] => stdClass Object
                (
                  [file] => file2.html
                )
            )
          [file] => root.html
        )
    )
    
    missing files: Array
    (
        [0] => file2.html
        [1] => file5.html
        [2] => file6.html
        [3] => file7.html
        [4] => file8.html
        [5] => file9.html
    )
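
    The result uses objects with file/childs members, while the question asked for nested arrays with [file] and [child] keys. If that exact layout matters, a small recursive conversion along these lines might do. This is a sketch; menu_to_array is a hypothetical name and the sample tree is built by hand.

```php
<?php
// Hypothetical converter: turns the object tree (file/childs members) into
// nested arrays using the "file" and "child" keys from the question.
function menu_to_array(array $menu)
{
    $out = array();
    foreach ($menu as $name => $item) {
        $node = array();
        if (isset($item->file)) {
            $node['file'] = $item->file;
        }
        if (!empty($item->childs)) {
            // convert the sub-menu the same way
            $node['child'] = menu_to_array($item->childs);
        }
        $out[$name] = $node;
    }
    return $out;
}

// Hand-built sample with the same shape as the parser result
$leaf = new stdClass;
$leaf->file = 'file1.html';
$top = new stdClass;
$top->file = 'root.html';
$top->childs = array('Sub1' => $leaf);

$converted = menu_to_array(array('Start' => $top));
print_r($converted);
```

    The question's layout has no name for the root entry; taking the single top-level element, for instance with reset($converted), gives an array that starts directly with [file] and [child].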
    

    Notes:

    • The code assumes all item names are unique inside a given menu.

    You could modify the code to have the (sub)menus as an array with numeric indexes and names as properties (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.

    Should such name duplication occur, the best solution would be to rename one of the items, IMHO.

    • The code also assumes there is only one root menu.

    It could be modified to handle more than one, but that does not make much sense IMHO (it would mean a root menu ID duplication, which would likely cause trouble to the JavaScript trying to process it in the first place).
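
    For the numeric-index variant mentioned in the first note, one way to keep same-named items apart is to identify an item by its position inside its menu level instead of by its name. The sketch below shows only the merge step for one level. It assumes that every file lists the items of a level in the same order, which the sample files suggest but which is not guaranteed; merge_level and its array layout are invented for this illustration.

```php
<?php
// Hypothetical merge step for one position-indexed menu level.
// $level : items collected so far for this level (numeric indexes)
// $items : the same level as listed in one file, in document order;
//          each entry is array('name' => ..., 'file' => ... or null)
function merge_level(array &$level, array $items)
{
    foreach ($items as $pos => $item) {
        if (!isset($level[$pos])) {
            // first time this position is seen: create the entry
            $level[$pos] = array('name' => $item['name'], 'file' => null, 'childs' => array());
        }
        // the active item carries no link, so only store a file when one is known
        if ($item['file'] !== null) {
            $level[$pos]['file'] = $item['file'];
        }
    }
}

// Two files showing the same level, with two items both named "News"
$level = array();
merge_level($level, array(
    array('name' => 'News', 'file' => null),      // active item in the first file
    array('name' => 'News', 'file' => 'b.html'),
));
merge_level($level, array(
    array('name' => 'News', 'file' => 'a.html'),
    array('name' => 'News', 'file' => null),      // active item in the second file
));
print_r($level);
```

    Both "News" entries survive, each with its own file, at the cost of losing lookup by name.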
