duanbi8529 2016-05-12 01:55

I need to strip the div and td tags and extract the content between them so that I can insert it into a database. Due to some constraints, I have to use regex rather than XPath or DOMDocument to extract the content. Any help is appreciated. Thanks!

 <tr class = "student_information" >
            <div class="admin"><td>141234U</td></div>
            <div class="name"><td>Tan Ping Ping</td></div>
            <div class="hp"><td>82222222</td></div>
            <div class="email"><td>141234U@mymail.nyp.edu.sg</td></div>
        </tr>
                    <tr class = "student_information" >
            <div class="admin"><td>132458Q</td></div>
            <div class="name"><td>Tan Rui</td></div>
            <div class="hp"><td>86339557</td></div>
            <div class="email"><td>132458Q@hotmail.com</td></div>

 Expected output:

 141234U
 Tan Ping Ping
 82222222
 141234U@mymail.nyp.edu.sg

 132458Q
 Tan Rui
 86339557
 132458Q@hotmail.com

1 answer

  • douzao2590 2016-05-12 02:03

    "Due to some constraints, I have to use regex rather than XPath or DOMDocument to extract the content."

    Given that constraint, you can use this regex: (?<=>)([\w .@]+)(?=<), for example:

    $str = <<< EOF
     <tr class = "student_information" >
                <div class="admin"><td>141234U</td></div>
                <div class="name"><td>Tan Ping Ping</td></div>
                <div class="hp"><td>82222222</td></div>
                <div class="email"><td>141234U@mymail.nyp.edu.sg</td></div>
            </tr>
                        <tr class = "student_information" >
                <div class="admin"><td>132458Q</td></div>
                <div class="name"><td>Tan Rui</td></div>
                <div class="hp"><td>86339557</td></div>
                <div class="email"><td>132458Q@hotmail.com</td></div>
    EOF;
    
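    // Capture every run of word characters, spaces, dots and @ signs that sits directly between ">" and "<".
    preg_match_all('/(?<=>)([\w .@]+)(?=<)/', $str, $result, PREG_PATTERN_ORDER);
    // $result[1] holds the text captured by group 1, one entry per match.
    foreach ($result[1] as $match) {
        echo $match . "\n";
    }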
    

    Output:

    141234U
    Tan Ping Ping
    82222222
    141234U@mymail.nyp.edu.sg
    132458Q
    Tan Rui
    86339557
    132458Q@hotmail.com
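
Since the goal is to insert the values into a database, the flat list of matches usually has to be regrouped into one record per student. The following is a minimal sketch, not part of the original answer. It assumes every row contributes exactly four values in the order admin, name, hp, email; the `$fields` and `$students` names are hypothetical, and the input is inlined as a single-quoted string only to keep the example self-contained.

```php
<?php
// Sample input mirroring the markup from the question.
$str = '<tr class = "student_information" >
    <div class="admin"><td>141234U</td></div>
    <div class="name"><td>Tan Ping Ping</td></div>
    <div class="hp"><td>82222222</td></div>
    <div class="email"><td>141234U@mymail.nyp.edu.sg</td></div>
</tr>
<tr class = "student_information" >
    <div class="admin"><td>132458Q</td></div>
    <div class="name"><td>Tan Rui</td></div>
    <div class="hp"><td>86339557</td></div>
    <div class="email"><td>132458Q@hotmail.com</td></div>';

preg_match_all('/(?<=>)([\w .@]+)(?=<)/', $str, $result, PREG_PATTERN_ORDER);

// Assumption: each student yields exactly these four values, in this order.
$fields = ['admin', 'name', 'hp', 'email'];

$students = [];
foreach (array_chunk($result[1], count($fields)) as $chunk) {
    // Skip an incomplete trailing group instead of building a broken record.
    if (count($chunk) === count($fields)) {
        $students[] = array_combine($fields, $chunk);
    }
}

print_r($students);
```

Each element of `$students` is then an associative array that can be bound to a prepared INSERT statement. If a row might be missing a value, the fixed chunk size would misalign the records, so this grouping only holds while the markup is as regular as in the question.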
    

    Regex Explanation:

    (?<=>)([\w .@]+)(?=<)
    
    Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=>)»
       Match the character “>” literally «>»
    Match the regex below and capture its match into backreference number 1 «([\w .@]+)»
       Match a single character present in the list below «[\w .@]+»
          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
          A “word character” (letter, digit, or underscore) «\w»
          A single character from the list “ .@” (space, dot, at sign) « .@»
    Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=<)»
       Match the character “<” literally «<»
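
One consequence of the explanation above: the character class admits only word characters, spaces, dots and @ signs, so a value containing anything else (a hyphen or an apostrophe, for instance) is skipped entirely rather than matched in part. The sketch below illustrates this with a hypothetical name and compares it against a different, broader pattern anchored on the td tags. That second pattern is a swapped-in alternative for illustration, not the regex from the answer.

```php
<?php
// Hypothetical row: the hyphen and apostrophe fall outside [\w .@].
$row = '<div class="name"><td>Mary-Ann O\'Neil</td></div>';

// Pattern from the answer: finds nothing, because the lookahead never
// sees "<" right after a run of allowed characters.
preg_match_all('/(?<=>)([\w .@]+)(?=<)/', $row, $narrow);

// Alternative (an assumption, not the author's regex): capture anything
// except "<" between <td> and </td>, trimming surrounding whitespace.
preg_match_all('/<td>\s*([^<]+?)\s*<\/td>/', $row, $wide);

var_dump($narrow[1]);
var_dump($wide[1]);
```

Whether the broader pattern is appropriate depends on the data: it accepts any text up to the next tag, so it is only a reasonable choice when the cells are known to hold plain text with no nested markup.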
    