drcigvoy48900 2012-06-25 10:40
Answer accepted

Using simplehtmldom to get specific URLs from a web page

I am trying to build a simple PHP crawler.

For this purpose, I am getting the contents of web pages using http://simplehtmldom.sourceforge.net/

After getting the page data, I process the page as below:

include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) 
echo $e->href . '<br>';

This works perfectly and prints all links on that page.

I only want to get certain URLs, such as those beginning with

/view.php?view=open&id=

I have written a function for this purpose:

function starts_text_with($s, $prefix){
    return strpos($s, $prefix) === 0;
}
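One way to check the helper on its own is to feed it a few sample href values. The values below are hypothetical; they illustrate that a strict prefix test matches only hrefs written exactly as /view.php?view=open&id=..., so an absolute URL, a relative link without the leading slash, or an attribute value that still contains the HTML entity &amp; (if the parser returns it undecoded) would all be rejected. On PHP 8.0 and later, the built-in str_starts_with() performs the same check.

```php
<?php
// Same helper as in the question: true only when $s begins with $prefix.
function starts_text_with($s, $prefix) {
    return strpos($s, $prefix) === 0;
}

$prefix = "/view.php?view=open&id=";

// Hypothetical href values, as they might appear in a page.
$samples = [
    "/view.php?view=open&id=42",                      // exact prefix: match
    "view.php?view=open&id=42",                       // no leading slash: no match
    "http://www.mypage.com/view.php?view=open&id=42", // absolute URL: no match
    "/view.php?view=open&amp;id=42",                  // entity not decoded: no match
];

foreach ($samples as $href) {
    echo $href, " => ", starts_text_with($href, $prefix) ? "match" : "no match", "\n";
}
```

If the real hrefs on the page look like any of the last three samples, the loop in the question would print nothing even though the links are there.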

and I use it like this:

include('simplehtmldom/simple_html_dom.php');
$html = file_get_html('http://www.mypage.com');
foreach($html->find('a') as $e) {
    // Print only links whose href begins with the wanted prefix.
    if (starts_text_with($e->href, "/view.php?view=open&id=")) {
        echo $e->href . '<br>';
    }
}
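The following is a minimal, self-contained sketch of the same filtering loop. It swaps in PHP's built-in DOM extension (DOMDocument) in place of simplehtmldom, and an inline HTML string in place of fetching http://www.mypage.com, so it runs without the library or network access; the function name find_view_links and the sample markup are hypothetical. It tests for the substring rather than the prefix, so relative and absolute links both match, and DOMDocument's getAttribute() returns entity-decoded values, so &amp; in the markup compares equal to &.

```php
<?php
// Sketch: collect matching links with PHP's built-in DOM extension
// (a swapped-in alternative to simplehtmldom, used so the example runs
// without the library or a network request).
function find_view_links(string $html, string $needle): array {
    $doc = new DOMDocument();
    // Suppress warnings about imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns the decoded value, so "&amp;" becomes "&".
        $href = $a->getAttribute('href');
        // Substring test: matches relative and absolute forms alike.
        if (strpos($href, $needle) !== false) {
            $found[] = $href;
        }
    }
    return $found;
}

// Hypothetical page content standing in for http://www.mypage.com.
$html = '<html><body>'
      . '<a href="/view.php?view=open&amp;id=1">one</a>'
      . '<a href="http://www.mypage.com/view.php?view=open&amp;id=2">two</a>'
      . '<a href="/about.php">about</a>'
      . '</body></html>';

foreach (find_view_links($html, "view.php?view=open&id=") as $href) {
    echo $href . '<br>';
}
```

With the sample markup, this prints the two view.php links and skips /about.php. It assumes the dom extension is enabled, as it is in most PHP builds.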

But nothing is returned.

I hope you understand what I need.

I need to print only the URLs that match that criterion.

Thanks


1 answer

  • dongxin2734 2012-06-25 10:52
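    include('simplehtmldom/simple_html_dom.php');
    $html = file_get_html('http://www.mypage.com');
    foreach($html->find('a') as $e) {
        // preg_match(pattern, subject): the pattern needs delimiters,
        // and "." and "?" must be escaped to match literally.
        if (preg_match('~view\.php\?view=open&id=~', $e->href))
             echo $e->href . '<br>';
    }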
    

    Try this.

    Refer to the PHP manual entry for preg_match.

    This answer was selected by the asker as the best answer.
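preg_match() takes the pattern first and the subject string second, and the pattern needs delimiters. Because . and ? are regex metacharacters, the sketch below builds the pattern from the literal text with preg_quote() instead of escaping it by hand; the ~ delimiter is an arbitrary choice and the sample hrefs are hypothetical.

```php
<?php
// Literal text we want to find anywhere in the href.
$needle = "view.php?view=open&id=";

// preg_quote() escapes regex metacharacters such as "." and "?",
// plus the delimiter passed as the second argument ("~").
$pattern = '~' . preg_quote($needle, '~') . '~';

// Hypothetical href values.
$hrefs = [
    "/view.php?view=open&id=7",                       // match
    "http://www.mypage.com/view.php?view=open&id=8",  // match
    "/viewXphp?view=open&id=9",  // no match: the escaped "." is literal
    "/index.php",                                     // no match
];

foreach ($hrefs as $href) {
    // Pattern first, subject second; preg_match() returns 1 on a match.
    if (preg_match($pattern, $href) === 1) {
        echo $href . '<br>';
    }
}
```

For a plain literal search like this one, strpos($href, $needle) !== false gives the same result without a regex.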
