doujia1988 2014-12-02 12:47

How do I scrape data from a web page?

I need to show some news from a web page, so I need to extract data from the site. The following code does not extract it correctly:

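include 'simple_html_dom.php'; // Simple HTML DOM library (assumed path); provides file_get_html()

$html = file_get_html("http://listverse.com/2014/12/01/10-times-us-foreign-policy-was-wildly-inconsistent/");
foreach ($html->find('article h2') as $element) {
    echo "<h2>" . $element->plaintext . "</h2>" . "<br>";

    // This inner loop searches the whole document again for every heading
    foreach ($html->find('article h2 p') as $element1) {
        echo "<pre>";
        print_r($element1->plaintext);
    }
}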

The headings come out correctly, but the paragraphs are repeated for every heading.

1 answer

  • doupu2722 2014-12-02 12:51

    The paragraphs follow the headings; they aren't descendants of them (and HTML doesn't allow paragraphs to descend from headings anyway).

    Having got the headings, you need to look at their siblings (e.g. looping over them until you get one that isn't a paragraph or is another heading).
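
The sibling walk described above might look roughly like the sketch below. It is not the asker's library: it uses PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`) so that it runs on its own. The function name `paragraphsByHeading` and the sample markup are made up for illustration. It also makes one choice the answer leaves open: it stops only at the next `h2` and skips other non-paragraph elements instead of stopping at them.

```php
<?php
// Sketch: collect the paragraphs that follow each <h2> by walking its siblings.
// paragraphsByHeading() is a hypothetical helper, not part of any library.

function paragraphsByHeading(string $html): array
{
    $doc = new DOMDocument();
    // libxml warns about HTML5 tags such as <article>; collect the warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $sections = [];

    foreach ($xpath->query('//article//h2') as $heading) {
        $paragraphs = [];
        // Walk forward through the heading's siblings until the next heading.
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            if ($node->nodeName === 'h2') {
                break; // the next section starts here
            }
            if ($node->nodeName === 'p') {
                $paragraphs[] = trim($node->textContent);
            }
        }
        $sections[] = [
            'heading'    => trim($heading->textContent),
            'paragraphs' => $paragraphs,
        ];
    }

    return $sections;
}

// Usage with made-up markup: each paragraph is printed once, under its own heading.
$sample = '<article><h2>First</h2><p>a</p><p>b</p><h2>Second</h2><p>c</p></article>';
foreach (paragraphsByHeading($sample) as $section) {
    echo $section['heading'], "\n";
    foreach ($section['paragraphs'] as $text) {
        echo "  ", $text, "\n";
    }
}
```

With Simple HTML DOM, the same loop should be possible using each node's `next_sibling()` method and `tag` property in place of `nextSibling` and `nodeName`.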
