douxin2002 2009-12-21 01:12

Getting only the following siblings of the same type using XPath and SimpleXML

I need to parse an HTML definition list like the following:

<dl>
    <dt>stuff</dt>
        <dd>junk</dd>
        <dd>things</dd>
        <dd>whatnot</dd>
    <dt>colors</dt>
        <dd>red</dd>
        <dd>green</dd>
        <dd>blue</dd>
</dl>

So that I can end up with an associative array like this:

[definition list] =>
    [stuff] =>
        [0] => junk
        [1] => things
        [2] => whatnot
    [colors] =>
        [0] => red
        [1] => green
        [2] => blue

I am using DOMDocument::loadHTML() to import the HTML string into an object, and then simplexml_import_dom() so that I can use the SimpleXML extension, specifically its xpath() method.

The problem I'm having is with the XPath syntax for querying all <dd> elements that are consecutive and not broken by a <dt>.

Since <dd> elements are not children of <dt> elements, I can't simply query for all dts, loop through them, and query each one for its dds.

So I'm thinking I have to do a query for the first dd sibling of each dt and then all dd siblings of that first dd.

But I'm not clear from the XPath tutorials whether this is possible. Can XPath express "consecutive matching siblings"? Or am I forced to loop through each child of the original dl and handle each dt and dd as it shows up?

1 answer

  • douxiong0668 2009-12-21 03:37

    There are certainly ways to find consecutive matching siblings in XPath, but they are relatively complicated. Since you have to process every child anyway, you might as well just loop over them as you mentioned. That is simpler and more efficient than looping over each <dt/> and then looking for its siblings.

    $dl = simplexml_load_string(
        '<dl>
            <dt>stuff</dt>
                <dd>junk</dd>
                <dd>things</dd>
                <dd>whatnot</dd>
            <dt>colors</dt>
                <dd>red</dd>
                <dd>green</dd>
                <dd>blue</dd>
        </dl>'
    );
    
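    $list = array();
    $k = null; // current <dt> text; stays null until the first <dt> is seen
    foreach ($dl->children() as $child)
    {
        // getName() returns the element name without a round trip through DOM
        switch ($child->getName())
        {
            case 'dt':
                // Start a new group keyed by the term's text
                $k = (string) $child;
                break;

            case 'dd':
                // Skip a stray <dd> that appears before any <dt>
                if ($k !== null)
                {
                    $list[$k][] = (string) $child;
                }
                break;
        }
    }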
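
For comparison, here is a rough sketch of the XPath route mentioned above, fed through DOMDocument::loadHTML() and simplexml_import_dom() as in the question. It relies on one observation: a <dd> belongs to the n-th <dt> exactly when it has n <dt> siblings before it. The function name definitionListViaXPath is made up for this illustration, and the sketch assumes that each <dt> has unique text and that the document contains at least one <dl>.

```php
<?php
// Hypothetical helper, not part of the original answer: groups <dd> values
// under their <dt> by running one relative XPath query per <dt>.
function definitionListViaXPath(SimpleXMLElement $dl): array
{
    $list = array();
    $position = 0;
    foreach ($dl->xpath('dt') as $dt) {
        $position++;
        // Select the following <dd> siblings that have exactly $position
        // <dt> siblings before them, i.e. the ones owned by this <dt>.
        $dds = $dt->xpath(
            "following-sibling::dd[count(preceding-sibling::dt) = $position]"
        );
        $list[(string) $dt] = array_map(
            function ($dd) {
                return (string) $dd;
            },
            $dds
        );
    }
    return $list;
}

$html = '<dl>
    <dt>stuff</dt>
        <dd>junk</dd>
        <dd>things</dd>
        <dd>whatnot</dd>
    <dt>colors</dt>
        <dd>red</dd>
        <dd>green</dd>
        <dd>blue</dd>
</dl>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// loadHTML() wraps the fragment in <html><body>, so locate the <dl> first.
$root = simplexml_import_dom($doc);
$dls = $root->xpath('//dl');
$result = definitionListViaXPath($dls[0]);

print_r($result);
```

This runs one extra XPath query per <dt>, so the plain loop over children() remains the simpler choice. The XPath version is mainly useful when you only need the <dd> elements for a single, known <dt>.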
    