Regular expression to match a meta tag

Hi, I want to extract the og:image URL from a page's source. How can I get the content of the og:image meta tag?

This is the meta tag:

<meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" />

How can I identify the meta tag using a regular expression?

This is my current function, which grabs image URLs from img tags. What modifications does it need to work with og:image meta tags?

function feeds_imagegrabber_scrape_images($content, $base_url, array $options = array(), &$error_log = array()) {

// Merge the default options.
$options += array(
  'expression' => '//img',
  'getsize' => TRUE,
  'max_imagesize' => 512000,
  'timeout' => 10,
  'max_redirects' => 3,
  'feeling_lucky' => 0,
);

$doc = new DOMDocument();
if (@$doc->loadXML($content) === FALSE && @$doc->loadHTML($content) === FALSE) {
  $error_log['code'] = -5;
  $error_log['error'] = "unable to parse the xml//html content";
  return FALSE;
}

$xpath = new DOMXPath($doc);
$hrefs = @$xpath->evaluate($options['expression']);

if ($options['getsize']) {
  timer_start(__FUNCTION__);
}

$images = array();
$imagesize = 0;
for ($i = 0; $i < $hrefs->length; $i++) {
  $url = $hrefs->item($i)->getAttribute('src');
  if (!isset($url) || empty($url) || $url == '') {
    continue;
  }
  if(function_exists('encode_url')) {
    $url = encode_url($url);
  }
  $url = url_to_absolute($base_url, $url);

  if ($url == FALSE) {
    continue;
  }

  if ($options['getsize']) {
    if (($imagesize = feeds_imagegrabber_validate_download_size($url, $options['max_imagesize'], ($options['timeout'] - timer_read(__FUNCTION__) / 1000))) != -1)   {
      $images[$url] = $imagesize;
      if ($options['feeling_lucky']) {
        break;
      }
    }
    if (($options['timeout'] - timer_read(__FUNCTION__) / 1000) <= 0) {
      $error_log['code'] = FIG_HTTP_REQUEST_TIMEOUT;
      $error_log['error'] = "timeout occurred while scraping the content";
      break;
    }
  }
  else {
    $images[$url] = $imagesize;
    if ($options['feeling_lucky']) {
      break;
    }
  }
}
return $images;
}

dongxun3424 2014-01-14 19:34
Make use of the DOMDocument class:

<?php
$html = '<meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" />';
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('meta') as $tag) {
  if ($tag->getAttribute('property') === 'og:image') {
    echo $tag->getAttribute('content');
  }
}

Output:

http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg
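
Since the function in the question already builds a DOMXPath object, the same lookup can probably be done with an XPath expression instead of looping over every meta tag. The following is only a minimal sketch: `extract_og_images()` is a hypothetical helper name, not part of the original module, and it assumes the page is HTML that `loadHTML()` can parse.

```php
<?php
// Hypothetical helper (not from the original module): returns the content
// value of every og:image meta tag found in an HTML string.
function extract_og_images($html) {
  if ($html === '') {
    return array();
  }

  $doc = new DOMDocument();
  // Real-world markup is rarely valid, so parser warnings are suppressed.
  if (@$doc->loadHTML($html) === FALSE) {
    return array();
  }

  $xpath = new DOMXPath($doc);
  // Select the content attribute of each matching meta element directly.
  $nodes = $xpath->query('//meta[@property="og:image"]/@content');

  $urls = array();
  foreach ($nodes as $attr) {
    $url = trim($attr->nodeValue);
    if ($url !== '') {
      $urls[] = $url;
    }
  }
  return $urls;
}

$html = '<html><head><meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" /></head><body></body></html>';
print_r(extract_og_images($html));
```

Applied to the function in the question, this would mean passing `'expression' => '//meta[@property="og:image"]'` in `$options` and reading `getAttribute('content')` instead of `getAttribute('src')` inside the loop.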
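
If a regular expression is really required, something like the sketch below may be enough for markup shaped exactly like the tag in the question. `match_og_image()` is a hypothetical name, and the pattern assumes `property` comes immediately after `<meta` and is followed directly by `content`; it will miss tags with the attributes in another order, so the DOM approach is the more robust choice.

```php
<?php
// Hypothetical helper: returns the og:image URL matched by a regular
// expression, or NULL when the tag is not found in the expected layout.
function match_og_image($html) {
  // \1 and \2 require each closing quote to match its opening quote.
  $pattern = '/<meta\s+property=(["\'])og:image\1\s+content=(["\'])(.*?)\2[^>]*>/i';
  if (preg_match($pattern, $html, $matches)) {
    return $matches[3];
  }
  return NULL;
}

$html = '<meta property="og:image" content="http://www.moneycontrol.com/news_image_files/2013/s/Syrian_diesel_trucks_190.jpg" />';
echo match_og_image($html);
```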