Extracting DOM elements from HTML using PHP Simple HTML DOM Parser

I'm trying to extract the links to the articles, including their text, from this site using PHP Simple HTML DOM Parser.

I want to extract all h2 tags for the articles on the main page, and I'm trying to do it this way:

    $html = file_get_html('http://www.winbeta.org');
    $articles = $html->getElementsByTagName('article');
    $a = null;

    foreach ($articles->find('h2') as $header) {
        $a[] = $header;
    }

    print_r($a);

According to the manual, it should first get all the content inside the article tags, then extract the h2 from each article and save it in an array. But instead it gives me:

douzhi1924 2016-01-05 20:32
There are several problems:

getElementsByTagName apparently returns a single node, not an array, so it does not work when the page has more than one article tag. Use find instead, which does return an array;

Once you make that switch, you cannot call find on the result of find, because that result is a plain array rather than a node. Either call find on each matched article tag individually or, better, pass a combined selector to find;

Main issue: you must retrieve the text content of a node explicitly with ->plaintext, otherwise you get the object representation of the node, with all its attributes and internals;

Some of the text contains HTML entities such as &rsquo; or &#8217; (both encode the ’ character). These can be decoded with html_entity_decode.

So this code should work:

    $a = array();
    foreach ($html->find('article h2') as $h2) { // any h2 within article
        $a[] = html_entity_decode($h2->plaintext);
    }
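The entity decoding step can be tried on its own, without the parser. The following is a minimal sketch; the heading string is made up for illustration, and passing the flags and encoding explicitly is a choice made here, since the defaults of html_entity_decode differ between PHP versions.

```php
<?php
// Hypothetical heading text, as it might appear in scraped markup.
$raw = 'Microsoft&#8217;s new build &amp; what&#039;s next';

// ENT_QUOTES also decodes single-quote entities such as &#039;.
// ENT_HTML5 selects the HTML5 entity table; 'UTF-8' is the output encoding.
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

If the decoded text is later echoed back into an HTML page, it would need to be escaped again (for example with htmlspecialchars) at output time.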

Using array_map, you could also do it like this:

    $a = array_map(function ($h2) {
        return html_entity_decode($h2->plaintext);
    }, $html->find('article h2'));

If you need to retrieve other tags within articles as well, to store their texts in different arrays, then you could do as follows:

    $a = array();
    $b = array();
    foreach ($html->find('article') as $article) {
        foreach ($article->find('h2') as $h2) {
            $a[] = html_entity_decode($h2->plaintext);
        }
        foreach ($article->find('h3') as $h3) {
            $b[] = html_entity_decode($h3->plaintext);
        }
    }
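For comparison, the same per-article extraction can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a different API, not the one used in the answer above. The inline markup is a hypothetical stand-in for the fetched page, and the sketch also collects each heading's link, which the question asked for.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$markup = <<<MARKUP
<html><body>
<article><h2><a href="/a">First &amp; foremost</a></h2><h3>Sub one</h3></article>
<article><h2><a href="/b">Second story</a></h2></article>
<div><h2>Not inside an article</h2></div>
</body></html>
MARKUP;

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$titles = [];
$links = [];

// '//article//h2' matches any h2 nested inside an article element.
foreach ($xpath->query('//article//h2') as $h2) {
    // textContent is already entity-decoded by the parser.
    $titles[] = trim($h2->textContent);

    // First link inside this heading, if there is one.
    $a = $xpath->query('.//a', $h2)->item(0);
    $links[] = $a ? $a->getAttribute('href') : null;
}

print_r($titles);
print_r($links);
```

Because the DOM extension decodes entities while parsing, no separate html_entity_decode call is needed in this variant.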