dongyin2390 2017-07-02 16:40

PHP regex: replace keywords in text, but not keywords inside anchor tags

I am trying to implement automatic hyperlink function for keywords.

The issue I am having is that a keyword can be part of another keyword. For example: potato, sweet potato. The function has to know not to hyperlink "potato" inside "sweet potato".

I am using a regex, and it actually works in other environments, but not on my localhost and not on the live server.

Working example:

$keywords_external = array(
  "Sweet potato" => "food/sweet-potato", 
  "Potato salads" => "food/potato-salads",
  "Potato" => "food/potato", 
);

$data = array(
  'post_content' => 'Sweet potato some text then potato then more text and then potato salads'
);

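foreach($keywords_external as $key => $href) 
{
  // Branch 1 matches a whole existing <a ...>...</a> (returned unchanged);
  // branch 2 captures the keyword in group 1. preg_quote() escapes regex metacharacters in the key.
  $regex = '/<a\b(?=\s)(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?>.*?<\/a>|('.preg_quote($key, '/').')/ims';

  $data['post_content'] = preg_replace_callback(
    $regex,
    function ($matches) use ($href) {
        // group 1 is only present when the keyword branch matched
        if (array_key_exists(1, $matches)) {
           return '<a href="https://example.com/'. $href .'">'. $matches[1] .'</a>';
        }
        return $matches[0];
    },
    $data['post_content']
  );
}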

echo $data['post_content'];

It works on http://phpfiddle.org.

The same example does not work on the live server.

Any ideas on how to achieve the same thing differently, or why it might not work on the live server?

Thanks.

PHP versions:
Localhost: PHP Version 7.0.12
Live: PHP Version 7.0.15-1+deb.sury.org~xenial+1
Live (working): PHP Version 5.4.45
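One way to narrow down the localhost/live difference is to check whether preg_replace_callback() is failing outright: it returns null on error, which would silently empty post_content. A plausible but unconfirmed cause is that PHP 7.0 enables the PCRE JIT by default (the pcre.jit ini setting) while PHP 5.4 does not use it, so a heavily backtracking pattern can hit a limit on one version only. The sketch below wraps the question's regex; the function name link_keyword is hypothetical.

```php
<?php
// Hypothetical helper: applies the question's regex for one keyword and fails
// loudly instead of letting preg_replace_callback() return null unnoticed.
function link_keyword(string $content, string $key, string $href): string
{
    // Branch 1 matches an existing <a ...>...</a>; branch 2 captures the keyword.
    $regex = '/<a\b(?=\s)(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"\s]*)*"\s?>.*?<\/a>|('
           . preg_quote($key, '/') . ')/ims';

    $result = preg_replace_callback($regex, function ($m) use ($href) {
        // Group 1 is only present in $m when the keyword branch matched.
        return isset($m[1]) ? '<a href="' . $href . '">' . $m[1] . '</a>' : $m[0];
    }, $content);

    if ($result === null) {
        // Compare the code with PREG_BACKTRACK_LIMIT_ERROR, or with
        // PREG_JIT_STACKLIMIT_ERROR (constant available since PHP 7.0).
        throw new RuntimeException(
            'preg_replace_callback() failed, preg_last_error() = ' . preg_last_error()
        );
    }
    return $result;
}
```

If the exception reports PREG_JIT_STACKLIMIT_ERROR, calling ini_set('pcre.jit', '0') before the loop is a quick way to test that hypothesis; PREG_BACKTRACK_LIMIT_ERROR would point at pcre.backtrack_limit instead.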


1 answer

  • doudou130216 2017-07-02 17:48

    [Edit]: since you are dealing with multibyte characters, the code needed a small adjustment (it now uses the mb_* functions and the u modifier).

    I don't know what your exact problem is, but this is how I would do it (in a single pass):

    // keys may use any case: they are converted to lowercase below
    $keywords_external_path = array(
        "business analyst là gì" => "business-analyst/", 
        "tài liệu cho business analyst" => "tai-lieu-cho-business-analyst/", 
        "product manager là gì" => "product-manager-la-gi/", 
        "công việc của product manager" => "product-manager-phai-biet-dat-cau-hoi/", 
        "QA là gì" => "qc-la-gi-qa-la-gi/", 
        "QC là gì" => "qc-la-gi-qa-la-gi/", 
        "Kiểm thử phần mềm là gì" => "kiem-thu-phan-mem-ai-lam-chang-duoc/", 
        "Automation QA là gì" => "3-loi-khuyen-giup-ban-nang-cap-su-nghiep-qa/", 
        "Tester là gì" => "tester-thanh-cong/", 
        "kỹ năng của Tester giỏi" => "tester-thanh-cong/", 
        "công việc QA" => "qa-gioi/", 
        "Technical Architect là gì" => "how-to-become-ta/", 
    );
    
    //change the keys to lowercase (support multibyte characters)
    mb_internal_encoding("UTF-8");
    $keywords_external_path = array_combine(
        array_map('mb_strtolower', array_keys($keywords_external_path)), 
        $keywords_external_path
    );
    
    $data = array(
      'post_content' => '"business analyst là gì" => "business-analyst/", 
                "tài liệu cho business analyst" => "tai-lieu-cho-business-analyst/", 
                "product manager là gì" => "product-manager-la-gi/", 
                "công việc của product manager" => "product-manager-phai-biet-dat-cau-hoi/", 
                "QA là gì" => "qc-la-gi-qa-la-gi/", 
                "QC là gì" => "qc-la-gi-qa-la-gi/", 
                "Kiểm thử phần mềm là gì" => "kiem-thu-phan-mem-ai-lam-chang-duoc/", 
                "Automation QA là gì" => "3-loi-khuyen-giup-ban-nang-cap-su-nghiep-qa/", 
                "Tester là gì" => "tester-thanh-cong/", 
                "kỹ năng của Tester giỏi" => "tester-thanh-cong/", 
                "công việc QA" => "qa-gioi/", 
                "Technical Architect là gì" => "how-to-become-ta/"'
    );
    
    $base = 'http://yourdomain.com/'; // optional prefix for the generated URLs
    
    $keywords_external = array_map(function ($i) {
        return preg_quote($i, '~');
    }, array_keys($keywords_external_path));
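    // reverse sort so that a keyword is tried before any of its prefixes
    // (e.g. "potato salads" before "potato")
    rsort($keywords_external);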
    
    // lookahead on the set of first letters, to quickly discard positions that
    // cannot start a keyword (useful if you have many keywords);
    // the same can be done with the second letter
    $keywords_first_letter = implode('',
        array_unique(
            array_reduce($keywords_external, function ($c, $i) {
                $c[] = mb_substr($i, 0, 1); return $c;
            }, [])
        )
    );
    
    $pattern = '~'
             . '(?=['. preg_quote($keywords_first_letter, '~') . '])'
             . '(?=\b\w|(?<!\S)\W)'
             . '(?:' . implode('|', $keywords_external) . ')'
             . '(?<=\w\b|\W(?!\S))~iu';
    
    $result = preg_replace_callback($pattern, function ($m) use ($keywords_external_path, $base) {
        return '<a href="' . $base . $keywords_external_path[mb_strtolower($m[0])] . '">'
             . $m[0] . '</a>'; 
    }, $data['post_content']);
    
    echo $result;
    

    demo with several php versions
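The code above links plain text only. If post_content may already contain <a> tags (the case in the question title), one way to keep the single-pass, longest-first idea and still leave existing links alone is an extra alternation branch using PCRE's (*SKIP)(*FAIL) verbs. This is a sketch under stated assumptions, not the answerer's code: link_keywords is a hypothetical name, the mbstring extension is available, a simple <a\b[^>]*>.*?</a> pattern is assumed good enough to recognise existing links (no nested or malformed anchors), and the first-letter lookahead is left out for brevity.

```php
<?php
// Sketch: the answer's single-pass, longest-first alternation plus a
// (*SKIP)(*FAIL) branch that steps over existing <a>...</a> elements.
mb_internal_encoding('UTF-8');

function link_keywords(string $html, array $paths, string $base): string
{
    // Lowercase the keys so the matched text can be looked up case-insensitively.
    $paths = array_combine(array_map('mb_strtolower', array_keys($paths)), $paths);

    $alternatives = array_map(function ($k) {
        return preg_quote($k, '~');
    }, array_keys($paths));
    // Reverse sort: a keyword sorts after any of its prefixes,
    // so "potato salads" is tried before "potato".
    rsort($alternatives);

    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'   // existing link: skip past it
             . '|(?=\b\w|(?<!\S)\W)'                 // keyword start boundary (from the answer)
             . '(?:' . implode('|', $alternatives) . ')'
             . '(?<=\w\b|\W(?!\S))~isu';             // keyword end boundary (from the answer)

    return preg_replace_callback($pattern, function ($m) use ($paths, $base) {
        return '<a href="' . $base . $paths[mb_strtolower($m[0])] . '">' . $m[0] . '</a>';
    }, $html);
}

$paths = [
    'Sweet potato'  => 'food/sweet-potato',
    'Potato salads' => 'food/potato-salads',
    'Potato'        => 'food/potato',
];
echo link_keywords('Sweet potato, then potato, then potato salads', $paths, '/'), "\n";
// <a href="/food/sweet-potato">Sweet potato</a>, then <a href="/food/potato">potato</a>, then <a href="/food/potato-salads">potato salads</a>
```

When the first branch matches an existing link, (*FAIL) forces a backtrack into (*SKIP), which makes the whole attempt fail and restarts the search right after the closing </a>, so the keyword branch never sees text inside that link.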

    This answer was accepted by the asker as the best answer.