PHP Simple HTML DOM: scraping an external URL

I'm building a personal project, but I'm a bit stuck using the Simple HTML DOM class.

What I'd like to do is scrape a website and retrieve every element that matches a certain class, together with its inner HTML.

My code so far is:

    <?php
    error_reporting(E_ALL);
    include_once("simple_html_dom.php");
    // Fetch and parse the page (file_get_html loads the URL itself; cURL is not used)
    $url = 'http://www.peopleperhour.com/freelance-seo-jobs';

    $html = file_get_html($url);

    // Get all data inside the <div class="item-list">
    foreach ($html->find('div[class=item-list]') as $div) {
        // Get all divs inside "item-list"
        foreach ($div->find('div') as $d) {
            // Get the outer HTML (note: each pass overwrites the previous value)
            $data = $d->outertext;
        }
    }
    print_r($data);
    echo "END";
    ?>

All I get with this is a blank page with "END"; nothing else is output at all.

dre26973 2013-12-09 16:15
I think you may want something like this:

    $url = 'http://www.peopleperhour.com/freelance-seo-jobs';
    $html = file_get_html($url);
    foreach ($html->find('div.item-list div.item') as $div) {
        echo $div . '<br />';
    }

This outputs the markup of each matching item (if you add the proper style sheet, it will be displayed nicely).

(This answer was accepted by the asker.)
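
The difference between the two selectors can be illustrated without the library or a network request. In CSS terms, `div[class=item-list]` compares the whole `class` attribute, while `div.item-list` matches a single class token, so the first form finds nothing when the element carries additional classes. The following is only a sketch using PHP's built-in `DOMDocument` and `DOMXPath` rather than Simple HTML DOM; the sample markup and the `hasClass` helper are assumptions made for illustration, not the real page or a library function. It also collects every match into an array instead of overwriting one variable.

```php
<?php
// Sample markup is an assumption standing in for the real page.
$sampleHtml = <<<HTML
<div class="item-list clearfix">
  <div class="item">Job A</div>
  <div class="item featured">Job B</div>
  <div class="pager">1 2 3</div>
</div>
HTML;

/**
 * Hypothetical helper: builds an XPath test for "element has this class
 * token", the equivalent of the CSS selector ".name".
 */
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Exact attribute comparison, like div[class=item-list].
$exact = $xpath->query("//div[@class='item-list']");

// Class-token match, like div.item-list div.item.
$items = $xpath->query(
    '//div[' . hasClass('item-list') . ']//div[' . hasClass('item') . ']'
);

// Append each match so earlier results are not overwritten.
$data = [];
foreach ($items as $node) {
    // saveHTML($node) serializes the node including its own tag (outer HTML).
    $data[] = $doc->saveHTML($node);
}

echo "exact matches: ", $exact->length, PHP_EOL;
echo "items found: ", count($data), PHP_EOL;
print_r($data);
```

With this sample markup the exact comparison matches no element, while the token match collects the two `item` divs and skips the `pager` div. Whether the real page's `item-list` element had extra classes in 2013 is not known from this thread, so treat this as one possible explanation rather than a diagnosis.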
