dongmi0760 2011-05-04 00:39

How to parse text out of a PHP page

Hi, I have a web page that returns part of a wiki page, and I want to parse certain elements out of it. The output of the page is as follows:

Array
(
    [parse] => Array
        (
            [text] => Array
                (
                    [*] => <div class="dablink">This article is about sports known as football.  For the ball used in these sports, see <a href="/wiki/Football_(ball)" title="Football (ball)">Football (ball)</a>.</div> 
<div class="thumb tright"> 
<div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a> 
<div class="thumbcaption"> 
<div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div> 
Some of the many different games known as football. From top left to bottom right: <a href="/wiki/Association_football" title="Association football">Association football</a> or soccer, <a href="/wiki/Australian_rules_football" title="Australian rules football">Australian rules football</a>, <a href="/wiki/International_rules_football" title="International rules football">International rules football</a>, <a href="/wiki/Rugby_Union" title="Rugby Union" class="mw-redirect">Rugby Union</a>, <a href="/wiki/Rugby_League" title="Rugby League" class="mw-redirect">Rugby League</a>, and <a href="/wiki/American_Football" title="American Football" class="mw-redirect">American Football</a>.</div> 
</div> 
</div> 
<p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport" title="Team sport">team sports</a>, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)" title="Kick (football)">kicking</a> a ball with the foot to score a <a href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football" title="Association football">association football</a>, more commonly known as just "football" or "soccer". Unqualified, the word <i><a href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever form of football is the most popular in the regional context in which the word appears, including <a href="/wiki/American_football" title="American football">American football</a>, <a href="/wiki/Australian_rules_football" title="Australian rules football">Australian rules football</a>, <a href="/wiki/Canadian_football" title="Canadian football">Canadian football</a>, <a href="/wiki/Gaelic_football" title="Gaelic football">Gaelic football</a>, <a href="/wiki/Rugby_league" title="Rugby league">rugby league</a>, <a href="/wiki/Rugby_union" title="Rugby union">rugby union</a> and other related games. These variations are known as "codes".</p> 
<div class="toclimit-3"></div> 

The page's information is in a PHP array. How would I parse out the information located in the paragraph tags of the array? I've tried Simple HTML DOM Parser and this code, with no luck:

if (preg_match_all("~<p>([\s\S]*?)</p>~i", $arr['parse']['text']['*'], $matches)) {
    print_r($matches[1]);  // all matches found within <p> ... </p>
}

Any help would be really appreciated, I'm completely stumped!

DIM3NSION


1 answer

  • dre75230 2011-05-04 01:26

    What was the issue with using DOM?

    The only problem I can see is the lack of a root element.

    Try something like this:

    $doc = new DOMDocument;
    $doc->loadXML('<root>' . $arr['parse']['text']['*'] . '</root>');
    $paragraphs = $doc->getElementsByTagName('p');
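
A minimal sketch of how the paragraphs could then be read out, assuming PHP's bundled DOM extension is available. The `$html` string is a shortened, hypothetical stand-in for `$arr['parse']['text']['*']`, and `loadHTML()` is swapped in for `loadXML()` here because it tolerates markup that is not well-formed XML:

```php
<?php
// Hypothetical, shortened stand-in for $arr['parse']['text']['*'].
$html = '<div class="dablink">About football.</div>'
      . '<p>The game of <b>football</b> is any of several '
      . '<a href="/wiki/Team_sport">team sports</a>.</p>'
      . '<div class="toclimit-3"></div>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument;
// loadHTML() accepts HTML fragments; no explicit root element is required.
$doc->loadHTML($html);
libxml_clear_errors();

$texts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    // textContent drops nested tags (<b>, <a>) and keeps only their text.
    $texts[] = trim($p->textContent);
}

print_r($texts);
```

If the links or other markup inside each paragraph are needed, `$doc->saveHTML($p)` could be used in place of `textContent`.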
    

    You could try the same with SimpleXML.
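
A rough sketch of the SimpleXML variant, assuming the markup is well-formed XML once wrapped in a root element (this would fail on undeclared entities such as `&nbsp;`). The `$html` string is again a hypothetical stand-in for `$arr['parse']['text']['*']`:

```php
<?php
// Hypothetical, shortened stand-in for $arr['parse']['text']['*'].
$html = '<div class="dablink">About football.</div>'
      . '<p>The game of <b>football</b> is a team sport.</p>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Wrap the fragment in a single root element so it parses as XML.
$xml = simplexml_load_string('<root>' . $html . '</root>');
if ($xml === false) {
    // The markup was not well-formed XML; a lenient HTML parser may be needed.
    exit("Could not parse the markup as XML\n");
}

$texts = [];
foreach ($xml->xpath('//p') as $p) {
    // Casting a SimpleXMLElement to string skips text inside child tags,
    // so convert to a DOM node and read textContent instead.
    $texts[] = trim(dom_import_simplexml($p)->textContent);
}

print_r($texts);
```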
