drelgkxl93433 2014-04-16 17:12

Regex doesn't find all values in Google search results?

First of all, I should stress that I'm trying to learn here, not to be malicious or spam anyone.

I'm trying to learn regex by extracting email addresses from Google search results with the following code. However, sometimes it finds only some of the email addresses, and other times none at all.

If I try it with a Wikipedia URL, I don't have a problem.

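<?php
// Fetch the raw HTML of a Google results page for the query.
$url = "https://www.google.com/search?q=hello@hotmail.com";
// $url = "http://en.wikipedia.org/wiki/Email_address"; this works fine
$string = file_get_contents($url);

// Case-insensitive (/i) pattern for email-like strings.
$pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
$matches = array();
preg_match_all($pattern, $string, $matches);

// $matches[0] holds every full-pattern match (the pattern has no capture groups).
foreach ($matches[0] as $email)
{
    echo htmlspecialchars($email) . "<br>";
}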

1 answer

  • dqmgjp5930 2014-04-16 17:22

    You're missing uppercase:

    '/[A-Za-z\d._%+-]+@[A-Za-z\d.-]+\.[A-Za-z]{2,4}\b/i'
    

    I added it everywhere in case you want to match HELLO@GMAIL.COM; you can always lowercase the result afterwards.

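    EDIT: The /i modifier already makes the pattern case-insensitive, so the uppercase ranges aren't actually needed. I think I was trying to solve this for a different email address that wasn't being matched.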
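To confirm that the /i modifier already covers uppercase letters, this minimal PHP sketch runs the question's pattern with and without /i, plus the explicit A-Za-z version, against an all-uppercase address (the address is just a sample):

```php
<?php
// Question's pattern: lowercase classes plus the /i modifier.
$withI = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
// Same pattern without /i: [a-z] now matches lowercase letters only.
$withoutI = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/';
// Explicit uppercase ranges, as suggested in the answer.
$explicit = '/[A-Za-z\d._%+-]+@[A-Za-z\d.-]+\.[A-Za-z]{2,4}\b/i';

$address = 'HELLO@GMAIL.COM';

var_dump(preg_match($withI, $address));    // /i lets [a-z] match A-Z too
var_dump(preg_match($withoutI, $address)); // no lowercase letters, so no match
var_dump(preg_match($explicit, $address)); // same result as $withI
```

Because $withI and $explicit behave the same, the uppercase ranges are redundant once /i is present.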

    EDIT 2: Search the raw HTML. The addresses that don't match are partly wrapped in emphasis tags, like example<em>@example.com</em>, so the tag splits the address and the pattern can't match it.
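If the emphasis tags are the only thing splitting the addresses, one way to act on EDIT 2 is to strip the markup with PHP's built-in strip_tags() before running the same pattern. This is a minimal sketch: the $html string is a made-up sample in the shape EDIT 2 describes, not real Google output, and real pages may need further cleanup (for example, decoding HTML entities).

```php
<?php
// Made-up sample markup (not real Google output): the first address is
// split by an <em> tag, the second is wrapped whole in <b>.
$html = 'Contact example<em>@example.com</em> or <b>hello@hotmail.com</b>';

// Same pattern as in the question.
$pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';

// On the raw HTML, "example" is separated from "@" by "<em>",
// so only hello@hotmail.com is found.
preg_match_all($pattern, $html, $raw);

// strip_tags() removes the markup, turning "example<em>@example.com</em>"
// into "example@example.com" before the regex runs.
preg_match_all($pattern, strip_tags($html), $clean);

// $clean[0] holds the full matches; drop duplicates and reindex.
$emails = array_values(array_unique($clean[0]));

foreach ($emails as $email) {
    echo htmlspecialchars($email) . "<br>\n";
}
```

Deduplicating with array_unique() is optional; it only helps if the same address appears more than once on the page.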

    This answer was selected by the asker as the best answer.