How can I extract only certain tags from an HTML document using PHP?

I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:

$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";

What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that?

I've tried the following, using XPath, but it doesn't output anything, most probably because the document itself isn't XML (I just copy-pasted the example given in the SimpleXML documentation).

$xml = new SimpleXMLElement($string);

$result = $xml->xpath('/p');
while (list( , $node) = each($result)) {
    echo '/p: ', $node, "\n";
}

Hopefully someone with (a lot) more experience in PHP will be able to help me out :D

douxu5233 2012-03-27 21:56
Check out Simple HTML DOM. It can fetch external pages and parse them fairly accurately.

http://simplehtmldom.sourceforge.net/

It can be used like this:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
This answer was selected by the asker as the best answer.
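
For the specific case in the question (collecting every "p" element into an array), PHP's bundled DOM extension may be enough without a third-party library. The following is only a minimal sketch under stated assumptions: `$html` is a hypothetical sample standing in for the raw, unescaped crawler output (passing it through htmlspecialchars() first turns the tags into plain text, so there is nothing left to match), and the query uses `//p` rather than `/p` so that paragraphs at any depth are found.

```php
<?php
// Hypothetical sample input standing in for the raw crawler output
// (the unescaped HTML, before any htmlspecialchars() call).
$html = '<html><body><h1>Title</h1><p>First paragraph.</p>'
      . '<div><p>Second <b>paragraph</b>.</p></div></body></html>';

// Real-world HTML is rarely well-formed, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// '//p' matches <p> elements at any depth;
// '/p' would only match a <p> that is the document root.
$xpath = new DOMXPath($doc);

$paragraphs = [];
foreach ($xpath->query('//p') as $node) {
    // textContent is the node's text with nested markup stripped.
    $paragraphs[] = $node->textContent;
}

print_r($paragraphs);
```

If the inner markup should be kept rather than stripped, `$doc->saveHTML($node)` can be stored in the array in place of `$node->textContent`.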
