preg_match help: matching mutually exclusive conditions

I've got two options for this plugin.

(1) nofollow all external links in content

and/or

(2) nofollow links to this target folder (enter the absolute URL of the target folder)

In option 2, the links could be internal OR external.

Both options can be set, neither option may be set, or a single option may be set.

if(get_option('my_nofollow') || get_option('my_nofollow_folder')){add_filter('wp_insert_post_data', 'save_my_nofollow' );}

So I'm adding a filter, pointing at the function below, whenever either of these options is set. My question is: how do I alter the function so that if (2) is set but not (1), I only add nofollow to links matching the target folder URL?

function save_my_nofollow($content) {
$my_folder =  get_option('my_nofollow_folder');
$matches = array();
    preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
    for ( $i = 0; $i <= sizeof($matches[0]); $i++){
        if ( isset($matches[0][$i]) && (preg_match('~' . $my_folder . '~', $matches[0][$i]) 
               || !preg_match( '~'.get_bloginfo('url').'~',$matches[0][$i]))){
        $result = trim($matches[0][$i],">");
        $result .= ' rel="nofollow">';
        $content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
        }
    }
    return $content;
}
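The rule described above (option 1 covers external links, option 2 covers links into the target folder, and either can be on without the other) can be sketched as a small standalone predicate. This is only an illustration: the function name `should_nofollow` and its parameters are hypothetical, it assumes the option values are passed in rather than read with `get_option()`, and it matches URLs by prefix instead of the substring `preg_match` used in the code above.

```php
<?php
/**
 * Decide whether a link should get rel="nofollow".
 * Hypothetical helper for illustration; not part of WordPress or the plugin.
 *
 * @param string $href             The link's href value.
 * @param string $siteUrl          The blog URL, e.g. the result of get_bloginfo('url').
 * @param bool   $nofollowExternal Option (1): nofollow all external links.
 * @param string $folderUrl        Option (2): absolute URL of the target folder, or '' if unset.
 */
function should_nofollow($href, $siteUrl, $nofollowExternal, $folderUrl)
{
    if (!$href) {
        return false; // anchors without an href are left alone
    }

    // Option (2): the link points into the target folder (internal or external).
    if ($folderUrl && stripos($href, $folderUrl) === 0) {
        return true;
    }

    // Option (1): an absolute http(s) URL that does not start with the site URL.
    // Relative links are treated as internal in this sketch.
    if ($nofollowExternal
        && preg_match('~^https?://~i', $href)
        && stripos($href, $siteUrl) !== 0) {
        return true;
    }

    return false;
}
```

Because the two checks are independent, setting only option (2) leaves ordinary external links untouched, while setting both covers external links and folder links together.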

UPDATED code with best answer:

if(get_option('rseo_nofollow') 
    || get_option('rseo_nofollow_folder')){
    add_filter('wp_insert_post_data', 'save_rseo_nofollow' );
    }

function save_rseo_nofollow($content) {
    $folder =  get_option('rseo_nofollow_folder');
    $externalNoFollow = get_option('rseo_nofollow_external');
    $folderNoFollow = get_option('rseo_nofollow_folder');
    $extRegex = '~'.preg_quote(get_bloginfo('url'), '~') . '~i';
    $intRegex = '~'.preg_quote($folder, '~') . '~i';

    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    $dom->loadXml('<root>' . $content['post_content'] . '</root>');

    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if ($href && $externalNoFollow && !preg_match($extRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        } elseif ($href && $folderNoFollow && preg_match($intRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        }
    }
//  print $dom->saveXml();die;
    //Since we want to strip the root element, we must do so:
    $newContent = '';
    $root = $dom->getElementsByTagName('root')->item(0);
    foreach ($root->childNodes as $child) {
        $newContent .= $dom->saveXml($child);
    }
    $content['post_content'] = $newContent;
return $content;
}

Input

This is the <a href="http://cnn.com">test</a>. This is the test.

Output

This is the <a rel="nofollow" href="&quot;http://cnn.com&quot;">test</a>. This is the test.


1 answer

douju9847 2011-02-11 17:45


Don't parse HTML with regex. It's not a good idea... Instead, use the DOM functions. Note that you may need to wrap the content in an outer root element (which is what the wrapper tags in the code below are for).

$externalNoFollow = get_option('my_nofollow_external');
$folderNoFollow = get_option('my_nofollow_folder');
$extRegex = '~'.preg_quote(get_bloginfo('url'), '~') . '~i';
$intRegex = '~'.preg_quote($folderNoFollow, '~') . '~i'; // was $folder, which is undefined in this snippet

$dom = new DomDocument();
libxml_use_internal_errors(true);
if (!$dom->loadHtml('<html><body>' . $content['post_content'] . '</body></html>')) {
    /** Error out, since the loading failed. 
        Make sure `$content['post_content']` is valid html
    **/
    die('Invalid HTML detected');
}

$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    if ($href && $externalNoFollow && !preg_match($extRegex, $href)) {
        $link->setAttribute('rel', 'nofollow');
    } elseif ($href && $folderNoFollow && preg_match($intRegex, $href)) {
        $link->setAttribute('rel', 'nofollow');
    }
}
//Since we want to strip the root element, we must do so:
$newContent = '';
$root = $dom->getElementsByTagName('body')->item(0);
foreach ($root->childNodes as $child) {
    $newContent .= $dom->saveXml($child);
}

$content['post_content'] = $newContent;
return $content;

Note: you should add actual error handling in case of invalid HTML...
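As a rough idea of what that error handling could look like, here is a minimal sketch that falls back to the original markup instead of calling `die()`. The function name `add_nofollow` and the `$matcher` callback are hypothetical, introduced only for this example; the WordPress option lookups are deliberately left out so the snippet runs on its own. It assumes the content is UTF-8 and that PHP 5.3.6 or later is available, since it passes a node to `saveHTML()`.

```php
<?php
/**
 * Add rel="nofollow" to every <a> whose href satisfies $matcher.
 * Returns the input unchanged if the fragment cannot be parsed.
 * Hypothetical helper for illustration only.
 */
function add_nofollow($html, callable $matcher)
{
    // Collect libxml warnings quietly and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment so there is a single root, and declare UTF-8 for the parser.
    $wrapped = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
             . '</head><body>' . $html . '</body></html>';
    $loaded = $dom->loadHTML($wrapped);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return $html; // parsing failed: keep the post content as it was
    }

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if ($href !== '' && $matcher($href)) {
            $link->setAttribute('rel', 'nofollow');
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Serialize only the children of <body>, dropping the wrapper again.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called as `add_nofollow($content['post_content'], $callback)`, the callback decides per link whether it qualifies, so the external and folder conditions can live in one place outside the DOM code.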

This answer was accepted by the asker as the best answer.
