Scraping information from the Web

How can I get the information (http://linkWeb.com, Titles, and http://link.pdf) from this HTML page?

<div class="title-download">
    <div id="01divTitle" class="title">
        <h3>
            <a id="01Title" onmousedown="" href="http://linkWeb.com">Titles</a>
            <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3>
    </div>
    <div id="01downloadDiv" class="download">
        <a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target=""><img id="ctl01_icon" class="small-icon";" /></a>
    </div>
</div>

I've been trying, but it only returns the title. I haven't used simple_html_dom before. Please help me, thank you :)

<?php

include 'simple_html_dom.php';
set_time_limit(0);

$url  ='http://libra.msra.cn/Search?query=data%20mining&s=0';
$html = file_get_html($url) or die ('invalid url');
foreach($html->find('div[class=title-download]') as $webLink){
    echo $webLink->plaintext.'<br>';
    echo $webLink->href.'<br>';
}

foreach($html->find('div[class=download]') as $Link2){
    echo $Link2->href.'<br>';
}

?>

du8980919 2012-07-22 01:20
Scrape the titles and URLs with this code:

foreach($html->find('span[class=citation]') as $link){
    $link = $link->prev_sibling();
    echo $link->plaintext.'<br>';
    echo $link->href.'<br>';
}

To scrape the URL in the download class, use the answer given by @zigomir :)

foreach($html->find('.download a') as $link){
    echo $link->title.'<br>';
}
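
For comparison, here is a minimal sketch that pulls all three values (title, page URL, PDF URL) out of each title-download block in one pass. It is not the approach from the answers above: it swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath so that it runs without any include. The helper name extractItems() and the $sample string are assumptions made for illustration; the markup is the snippet from the question.

```php
<?php
// Sample markup copied from the question (one result item).
// The stray `;"` after class="small-icon" in the original is dropped here.
$sample = <<<'HTML'
<div class="title-download">
    <div id="01divTitle" class="title">
        <h3>
            <a id="01Title" onmousedown="" href="http://linkWeb.com">Titles</a>
            <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3>
    </div>
    <div id="01downloadDiv" class="download">
        <a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target=""><img id="ctl01_icon" class="small-icon" /></a>
    </div>
</div>
HTML;

// extractItems() is a hypothetical helper name, not part of any library.
function extractItems(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];

    // One iteration per result block, so the three values stay paired.
    foreach ($xpath->query('//div[@class="title-download"]') as $block) {
        // The title link is the direct <a> child of <h3>; the citation
        // link sits inside the <span> and is therefore not matched.
        $titleLink = $xpath->query('.//div[@class="title"]//h3/a', $block)->item(0);
        // The PDF URL is stored in the title attribute, not in href.
        $downloadLink = $xpath->query('.//div[@class="download"]/a', $block)->item(0);

        $items[] = [
            'title' => $titleLink ? trim($titleLink->textContent) : null,
            'url'   => $titleLink ? $titleLink->getAttribute('href') : null,
            'pdf'   => $downloadLink ? $downloadLink->getAttribute('title') : null,
        ];
    }
    return $items;
}

foreach (extractItems($sample) as $item) {
    echo $item['title'], ' | ', $item['url'], ' | ', $item['pdf'], PHP_EOL;
}
```

The code in the question prints only the title because href is read from the div elements, which have no such attribute; the values live on the nested a elements, and the PDF link is in title rather than href. Against the live page, the string passed to extractItems() would presumably come from something like file_get_contents($url), assuming the page still uses the same markup.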
This answer was selected by the asker as the best answer.
