Symfony DomCrawler returns an empty object

I'm trying to scrape the rating score from review sites using Laravel 4 and the Symfony DomCrawler. Take this page as an example: http://estorereview.com.au/s/5951/A-Supplements. I want to extract the 4.8 from "4.8 of 5 Stars".

This is partial code of my attempt:

<?php

use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\CssSelector\CssSelector;

function getRatingEstoreReview($url){
  $html = getHtmlCurl($url);
  $crawler = new Crawler($html);
  $crawler = $crawler->filter('span[itemprop="ratingValue"]'); 
  var_dump($crawler);
  die("test");
  return normalize($crawler,5);
}

The var_dump returns the following:

object(Symfony\Component\DomCrawler\Crawler)[280]
  protected 'uri' => null
  private 'defaultNamespacePrefix' => string 'default' (length=7)
  private 'namespaces' => 
    array (size=0)
      empty

I tried this with other sites as well, but I always get an empty object. Accessing the value with $crawler->first() doesn't work either.

What am I doing wrong? Thank you.

Edit: Even if I filter for "div", the Crawler remains empty. The PHP Simple HTML DOM Parser works fine on the same page.

dongmaopan5738 2014-09-16 12:09
The full CSS path for that element is body > div:nth-child(3) > div > div > div.left-container.floatl > div.top > div.top-inner > div.store-rating-container.floatl > div.star-col.floatl.overall-rating-stars > div.rating-text.floatl > div > strong > span. Have you tried using that as your filter term instead?

You can also use filterXPath() instead, in which case you're looking for /html/body/div[3]/div/div/div[4]/div[1]/div[2]/div[2]/div[1]/div[2]/div/strong/span.

Edit: it doesn't look like it applies to this specific page, but just wanted to mention a "gotcha" for web crawling. Remember that for some web pages, the contents will have been manipulated (post-load) by JavaScript. In that case, the elements you're looking for may not be seen by DomCrawler at all.

Update:

Here are the results I see. I'm using Goutte rather than getHtmlCurl().

Code:

use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

$client = new Client();
$crawler = $client->request('GET', 'http://estorereview.com.au/s/5951/A-Supplements');
var_dump($crawler->filter('span[itemprop="ratingValue"]'));
echo $crawler->filter('span[itemprop="ratingValue"]')->text();
die("<br />test completed");

Output:

object(Symfony\Component\DomCrawler\Crawler)[177]
  protected 'uri' => string 'http://estorereview.com.au/s/5951/A-Supplements' (length=47)
  private 'defaultNamespacePrefix' => string 'default' (length=7)
  private 'namespaces' => 
    array (size=0)
      empty
4.8
test completed

So, that works.
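
Note that the dump in the output above looks just as "empty" as the one in the question, even though text() returns 4.8. So var_dump() does not seem to be a reliable way to tell whether the filter matched anything. A minimal sketch of checking the match count instead is shown here. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer, and the inline $html string is a made-up stand-in for the fetched page, not the site's real markup:

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and
// symfony/css-selector available (filter() needs the CssSelector component).
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for the downloaded page.
$html = '<html><body><div><strong>'
      . '<span itemprop="ratingValue">4.8</span>'
      . '</strong> of 5 Stars</div></body></html>';

$crawler = new Crawler($html);
$matches = $crawler->filter('span[itemprop="ratingValue"]');

// Crawler is countable, so count() tells us how many nodes matched
// without relying on what var_dump() happens to print.
if (count($matches) === 0) {
    echo "no match\n";
} else {
    echo trim($matches->text()), "\n";
}

// The same lookup through filterXPath(), keyed on the attribute
// rather than on the full path from the document root.
$viaXPath = $crawler->filterXPath('//span[@itemprop="ratingValue"]');
echo count($viaXPath), "\n";
```

If count() reports 0 for the real page, the next thing to check would be whether the HTML string handed to the Crawler actually contains the element, for example by echoing what getHtmlCurl() returned.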