dru5089 2019-06-03 14:19 Acceptance rate: 0%
Views: 99

PHP: get certain information from a website, but from all pages

I want to extract the href attribute of links, specifically those that use mailto:. I want to do this not just for one link but for all links that belong to the main web page.

I tried this:

<?php

$url = "https://www.omurcanozcan.com";

$html = file_get_contents( $url);

libxml_use_internal_errors( true);
$doc = new DOMDocument;
$doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( "//a[@href='mailto:']")->item(0);


echo $node->textContent; // This will print **GET THIS TEXT**

 ?>

For instance, given this markup:

<a href='mailto:omurcan@omurcanozcan.com'>omurcan@omurcanozcan.com</a>

I want to echo

<p>omurcan@omurcanozcan.com</p>

1 answer

  • duanlu1950 2019-06-03 14:31

    The main problem is that in your XPath, you are checking for

    //a[@href='mailto:']
    

    This looks for an href attribute whose value is exactly mailto: and nothing else. What you want is an href that starts with mailto:, which you can match using starts-with():

    $node = $xpath->query( "//a[starts-with(@href,'mailto:')]")->item(0);
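
    Note that ->item(0) returns only the first match. Since the question asks for every mailto: link, here is a minimal sketch that loops over all matches instead. The function name extractMailtoAddresses and the sample HTML string are assumptions made for illustration, not part of the original code:

```php
<?php
// Hypothetical helper: collect the address from every mailto: link in an HTML string.
function extractMailtoAddresses(string $html): array
{
    // loadHTML() rejects an empty string in PHP 8, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $addresses = [];

    // Iterate over every <a> whose href starts with "mailto:", not just the first.
    foreach ($xpath->query("//a[starts-with(@href,'mailto:')]") as $link) {
        // Strip the "mailto:" prefix and any "?subject=..." part after the address.
        $address = substr($link->getAttribute('href'), strlen('mailto:'));
        $address = rawurldecode(explode('?', $address, 2)[0]);
        if ($address !== '') {
            $addresses[] = $address;
        }
    }

    // Remove duplicates and reindex from 0.
    return array_values(array_unique($addresses));
}

// Usage with a sample string; in the question this would be the fetched $html.
$sample = "<p><a href='mailto:omurcan@omurcanozcan.com'>omurcan@omurcanozcan.com</a></p>";
foreach (extractMailtoAddresses($sample) as $address) {
    echo '<p>' . htmlspecialchars($address) . '</p>' . PHP_EOL;
}
```

    For the sample string this echoes <p>omurcan@omurcanozcan.com</p>. The address is read from the href rather than from the link text, because the visible text of a mailto: link is not always the address itself.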
    

    The second thing is that I don't think your page is fully loaded when you get the content. A common test I do is to save the HTML once I've loaded it, so I can inspect it first:

    $url = "https://www.omurcanozcan.com";
    
    $html = file_get_contents( $url);
    file_put_contents("a.html", $html);
    

    If you then look in a.html, you can see the HTML your script is actually working with. In that content I cannot see any mailto: links.
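
    The same check can be done in code instead of by eye. This is a rough sketch; the function name htmlMentionsMailto is hypothetical, and the sample strings stand in for the $html returned by file_get_contents():

```php
<?php
// Hypothetical helper: report whether raw HTML mentions "mailto:" anywhere.
// stripos() is case-insensitive, so "MAILTO:" is matched as well.
function htmlMentionsMailto(string $html): bool
{
    return stripos($html, 'mailto:') !== false;
}

// Sample strings used in place of a real download.
var_dump(htmlMentionsMailto('<a href="mailto:someone@example.com">mail</a>'));
var_dump(htmlMentionsMailto('<a href="/contact">contact</a>'));
```

    If this returns false for the downloaded page, no XPath expression will find the links. One possible reason, which I have not verified for this site, is that the links are added by JavaScript after the page loads, and file_get_contents() does not run scripts.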
