dongzhiman2162 2015-07-03 06:24
Accepted

Regular expression for extracting metadata

I have retrieved an HTML page using cURL, and now I want to extract the content of one specific meta tag from it, e.g. <meta name="ids" content="123nsdfsdfAS">.

Here is what I have so far:

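function file_get_contents_curl($url)
{
    // Browser-like user agent, since some servers reject requests without one.
    $agent = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);          // leave response headers out of the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow HTTP redirects

    // curl_exec() returns the body as a string, or false on failure.
    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}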

$html = file_get_contents_curl("http://example.com");

So I want to extract the content of a specific meta tag, e.g. <meta name="ids" content="123nsdfsdfAS">, from $html using preg_match_all, preg_match, or any related function together with a regular expression. I wrote a regex myself, but it does not work well, so I have not included it here.

3 answers

  • douliu1092 2015-07-03 06:28

    Well, here it's fairly easy:

    /<meta[^>]+>/
    

    will match any meta tag.

    /<meta name="ids"[^>]+>/
    

    will match only the meta tag with the name ids.

    If you only want the value of the content attribute, capture it with a group:

    /<meta name="ids" content="([^"]+)">/
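
As a minimal sketch of how that last pattern could be used with preg_match: the sample $html string and the extract_ids() helper below are hypothetical, introduced only for illustration. The $loose pattern is also an assumption that goes beyond the answer; it tolerates different letter case, extra attributes between name and content, and a "/>" closing, because the strict pattern only matches when the tag is written exactly as shown.

```php
<?php
// Stand-in for the page fetched with cURL (hypothetical sample markup).
$html = '<html><head><meta charset="utf-8">'
      . '<meta name="ids" content="123nsdfsdfAS"></head><body></body></html>';

// Pattern from the answer: name must be followed directly by content.
$strict = '/<meta name="ids" content="([^"]+)">/';

// Looser variant (an assumption, not part of the answer): case-insensitive,
// allows other attributes between name and content, and a "/>" closing.
$loose = '/<meta\s+name="ids"[^>]*?\scontent="([^"]*)"[^>]*>/i';

// Hypothetical helper: returns the captured content value, or null if absent.
function extract_ids(string $pattern, string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1]; // first capture group = the content attribute value
    }
    return null;
}

echo extract_ids($strict, $html), PHP_EOL;
echo extract_ids($loose, $html), PHP_EOL;
```

Both patterns still assume that name appears before content and that the attributes use double quotes. If the markup may vary more than that, an HTML parser such as PHP's DOMDocument is likely a more robust choice than a regex.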
    
    This answer was selected as the best answer by the asker.