dongping6974 2011-02-11 17:30

preg_match help: matching mutually exclusive conditions

I've got two options for this plugin.

(1) nofollow all external links in content

and/or

(2) nofollow links to this target folder (enter the absolute URL of the target folder)

In option 2, the links could be internal OR external.

Both options can be set, neither option may be set, or a single option may be set.

if(get_option('my_nofollow') || get_option('my_nofollow_folder')){add_filter('wp_insert_post_data', 'save_my_nofollow' );}

So I'm attaching the function below as a filter whenever either of these options is set. My question is: how do I alter the function so that if (2) is set but not (1), I only add nofollow to links matching the target folder URL?

function save_my_nofollow($content) {
$my_folder =  get_option('my_nofollow_folder');
$matches = array();
    preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
    for ( $i = 0; $i <= sizeof($matches[0]); $i++){
        if ( isset($matches[0][$i]) && (preg_match('~' . $my_folder . '~', $matches[0][$i]) 
               || !preg_match( '~'.get_bloginfo('url').'~',$matches[0][$i]))){
        $result = trim($matches[0][$i],">");
        $result .= ' rel="nofollow">';
        $content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
        }
    }
    return $content;
}
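
The condition being asked about can be isolated from the HTML handling. The following is only a sketch of that decision under stated assumptions, not the plugin's actual code: the function name `my_should_nofollow`, its parameters, and the `example.com` URLs in the usage note are hypothetical, and it swaps the `preg_match` test for a plain case-insensitive prefix comparison on the href. Each option is checked only when it is set, so setting (2) alone affects only links into the target folder.

```php
<?php
// Hypothetical helper (not from the original post): decides whether a single
// href should receive rel="nofollow" given the two plugin options.
function my_should_nofollow($href, $siteUrl, $externalOn, $folderUrl)
{
    // get_option() returns false for an unset option, so normalise to strings.
    $href      = (string) $href;
    $siteUrl   = (string) $siteUrl;
    $folderUrl = (string) $folderUrl;

    if ($href === '') {
        return false;
    }

    // Option (2): the link points into the target folder, internal or external.
    if ($folderUrl !== '' && stripos($href, $folderUrl) === 0) {
        return true;
    }

    // Option (1): the link is an absolute http(s) URL outside this site.
    // Assumption: relative hrefs such as "/about" count as internal.
    $isAbsolute = preg_match('~^https?://~i', $href) === 1;
    if ($externalOn && $isAbsolute && $siteUrl !== '' && stripos($href, $siteUrl) !== 0) {
        return true;
    }

    return false;
}
```

With only option (2) set, for instance, `my_should_nofollow('http://cnn.com', 'http://example.com', false, 'http://example.com/out/')` returns `false`, while an href under `http://example.com/out/` returns `true`.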

UPDATED code with best answer:

if (get_option('rseo_nofollow') || get_option('rseo_nofollow_folder')) {
    add_filter('wp_insert_post_data', 'save_rseo_nofollow');
}

function save_rseo_nofollow($content) {
    $folder =  get_option('rseo_nofollow_folder');
    $externalNoFollow = get_option('rseo_nofollow_external');
    $folderNoFollow = get_option('rseo_nofollow_folder');
    $extRegex = '~'.preg_quote(get_bloginfo('url'), '~') . '~i';
    $intRegex = '~'.preg_quote($folder, '~') . '~i';

    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    $dom->loadXml('<root>' . $content['post_content'] . '</root>');

    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if ($href && $externalNoFollow && !preg_match($extRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        } elseif ($href && $folderNoFollow && preg_match($intRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        }
    }
//  print $dom->saveXml();die;
    //Since we want to strip the root element, we must do so:
    $newContent = '';
    $root = $dom->getElementsByTagName('root')->item(0);
    foreach ($root->childNodes as $child) {
        $newContent .= $dom->saveXml($child);
    }
    $content['post_content'] = $newContent;
    return $content;
}

Input

This is the <a href="http://cnn.com">test</a>. This is the test.

Output

This is the <a rel="nofollow" href="&quot;http://cnn.com&quot;">test</a>. This is the test.

1 answer

  • douju9847 2011-02-11 17:45

    Don't parse HTML with regex; it's not a good idea. Instead, use the DOM functions. Note that you may need to wrap the content in an outer root element, which is why the code below wraps it before loading.

    $externalNoFollow = get_option('my_nofollow_external');
    $folderNoFollow = get_option('my_nofollow_folder');
    $extRegex = '~'.preg_quote(get_bloginfo('url'), '~') . '~i';
    // The folder option holds the absolute URL of the target folder.
    $intRegex = '~'.preg_quote($folderNoFollow, '~') . '~i';
    
    $dom = new DomDocument();
    libxml_use_internal_errors(true);
    if (!$dom->loadHtml('<html><body>' . $content['post_content'] . '</body></html>')) {
        // Loading failed, so `$content['post_content']` could not be parsed.
        // Leave the post untouched rather than aborting the save.
        return $content;
    }
    
    $links = $dom->getElementsByTagName('a');
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        if ($href && $externalNoFollow && !preg_match($extRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        } elseif ($href && $folderNoFollow && preg_match($intRegex, $href)) {
            $link->setAttribute('rel', 'nofollow');
        }
    }
    //Since we want to strip the root element, we must do so:
    $newContent = '';
    $root = $dom->getElementsByTagName('body')->item(0);
    foreach ($root->childNodes as $child) {
        $newContent .= $dom->saveXml($child);
    }
    
    $content['post_content'] = $newContent;
    return $content;
    

    Note: you should add actual error handling in case of invalid HTML.
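
The DOM round trip described in this answer can be tried outside WordPress. The following is a minimal sketch under stated assumptions rather than a drop-in replacement: the helper name `my_add_nofollow` and its callable parameter are hypothetical, it serialises with `saveHTML($node)` (available since PHP 5.3.6) in place of `saveXml`, and it returns the input unchanged if loading fails.

```php
<?php
// Hypothetical helper (not from the original answer): sets rel="nofollow" on
// every <a> in an HTML fragment whose href is accepted by $shouldNofollow.
function my_add_nofollow($html, callable $shouldNofollow)
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so that it has a single known container element.
    // Caveat: without an encoding hint, non-ASCII content may need extra care.
    $loaded = $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return $html; // Leave the content untouched if it cannot be parsed.
    }

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        if ($href !== '' && $shouldNofollow($href)) {
            // Note: this overwrites any rel value already on the link.
            $link->setAttribute('rel', 'nofollow');
        }
    }

    // Serialise only the children of <body>, dropping the wrapper again.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called with a fragment such as `This is the <a href="http://cnn.com">test</a>.` and a callable that returns true for hrefs outside the site, the link should come back with `rel="nofollow"` added and its `href` value unchanged.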

    This answer was selected by the asker as the best answer.
