PHP: filtering text for banned words

We run a C2C website and discourage selling branded products on it. We have built a database of brand words such as Nike and D&G, and an algorithm that scans product information for these words and disables any product that contains one of them.

Our current algorithm removes all whitespace and special characters from the provided text and then matches it against each word from the database. The following cases must be caught, and the algorithm catches them effectively:

i am nike world
i have n ikee shoes
i have nikeeshoes
i sell i-phone casings
i sell iphone-casings
you can have iphone

The problem is that it also catches the following:

rapiD Garment factory (for D&G)
rosNIK Electronics (for Nike)
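The false positives follow directly from removing separators before matching: once the spaces are gone, word boundaries are no longer visible. A minimal PHP reproduction of the approach described above (the helper name naiveBannedKeywords is hypothetical, not from the original code) shows both cases:

```php
<?php
// Hypothetical helper mirroring the approach described in the question:
// strip every non-word character from both the text and the keyword,
// then run a case-insensitive substring search.
function naiveBannedKeywords(string $text, array $keywords): array
{
    $haystack = preg_replace('/\W/', '', $text);
    $found = [];
    foreach ($keywords as $keyword) {
        $needle = preg_replace('/\W/', '', $keyword); // "d&g" -> "dg"
        if ($needle !== '' && stripos($haystack, $needle) !== false) {
            $found[] = strtolower($keyword);
        }
    }
    return $found;
}

$ban = ["nike", "d&g", "iphone"];
// "rapiD Garment factory" -> "rapiDGarmentfactory", which contains "DG".
echo json_encode(naiveBannedKeywords("rapiD Garment factory", $ban)), PHP_EOL;
// "rosNIK Electronics" -> "rosNIKElectronics", which contains "NIKE".
echo json_encode(naiveBannedKeywords("rosNIK Electronics", $ban)), PHP_EOL;
```

Both calls report a match (d&g and nike respectively), even though neither text refers to the brand.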

What can be done to prevent such false matches while still catching the true cases as reliably as now?

EDIT

Here's the code for those of you who understand code better:

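// $orignal_txt holds the product text to check.
// Drop HTML tags, then HTML entities such as &amp; or &nbsp;.
$orignal_txt = preg_replace('/&.*?;/', '', strip_tags($orignal_txt));
// Remove every non-word character (spaces, '-', '&', ...).
$orignal_txt_nospace = preg_replace('/\W/', '', $orignal_txt);

$ipr_banned_keywords = array();
$qry_kws = array("nike", "iphone", "d&g"); // in production these come from the database
foreach ($qry_kws as $rs_kw)
{
    // Keyword with its own special characters removed, e.g. "d&g" -> "dg".
    $no_space_db_kw = preg_replace('/\W/', '', $rs_kw);
    if (stristr($orignal_txt_nospace, $rs_kw))
    {
        $ipr_banned_keywords[] = strtolower($rs_kw);
    }
    else if (stristr($orignal_txt_nospace, $no_space_db_kw))
    {
        $ipr_banned_keywords[] = strtolower($rs_kw);
    }
}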
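One possible direction, sketched here as an assumption rather than anything proposed in the original thread: keep tolerating separators inside a brand name, but require the brand to start at a word boundary. The helper names buildBrandPattern and findBannedKeywords are hypothetical, and the pattern assumes ASCII brand names.

```php
<?php
// Hypothetical alternative (not from the original thread): a keyword may be
// split by separators ("n ikee", "i-phone", "N I K E") but must begin at the
// start of a word, so "rapiD Garment" and "rosNIK" no longer match.
// Assumes ASCII brand names.
function buildBrandPattern(string $keyword): ?string
{
    // Keep only letters and digits: "d&g" -> "dg", "i-phone" -> "iphone".
    $clean = preg_replace('/[^a-z0-9]/i', '', $keyword);
    if ($clean === '') {
        return null; // nothing matchable in this keyword
    }
    // \b anchors the first letter; any run of non-alphanumeric characters
    // is allowed between consecutive letters.
    return '/\b' . implode('[^a-z0-9]*', str_split($clean)) . '/i';
}

function findBannedKeywords(string $text, array $keywords): array
{
    $found = [];
    foreach ($keywords as $keyword) {
        $pattern = buildBrandPattern($keyword);
        if ($pattern !== null && preg_match($pattern, $text) === 1) {
            $found[] = strtolower($keyword);
        }
    }
    return $found;
}

$ban = ["nike", "d&g", "iphone"];
foreach (["i have n ikee shoes", "rapiD Garment factory", "rosNIK Electronics"] as $text) {
    echo $text, ' => ', json_encode(findBannedKeywords($text, $ban)), PHP_EOL;
}
```

With these keywords the sketch flags "i have n ikee shoes" but neither "rapiD Garment factory" nor "rosNIK Electronics". The trade-off is that a brand glued onto the end of another word (for example "mynike") is no longer caught, and a phrase such as "I phoned" would be flagged for iphone, so the rule would need checking against real listings.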


dongxie548548 2012-12-21 12:03


Just playing around... (not intended for production use)

$data = array(
        "i am nike world",
        "i have n ikee shoes",
        "i have nikeeshoes",
        "i sell i-phone casings",
        "i sell iphone-casings",
        "you can have iphone",
        "rapiD Garment factor",
        "rosNIK Electronics",
        "Buy you self N I K E",
        "B*U*Y I*P*H*O*N*E BABY",
        "My Phone Is not available");


$ban = array("nike","d&g","iphone");

Example 1:

$filter = new BrandFilterIterator($data);
$filter->parseBan($ban);
foreach ( $filter as $word ) {
    echo $word, PHP_EOL;
}

Output 1

rapiD Garment factor
rosNIK Electronics
My Phone Is not available

Example 2

$filter = new BrandFilterIterator($data,true); //reverse filter
$filter->parseBan($ban);
foreach ( $filter as $word ) {
    echo $word, " " , json_encode($word->getBan()) ,  PHP_EOL;
}

Output 2

i am nike world ["nike"]
i have n ikee shoes ["nike"]
i have nikeeshoes ["nike"]
i sell i-phone casings ["iphone"]
i sell iphone-casings ["iphone"]
you can have iphone ["iphone"]
Buy you self N I K E ["nike"]
B*U*Y I*P*H*O*N*E BABY ["iphone"]

Classes Used

class BrandFilterIterator extends FilterIterator {
    private $words = array();
    private $reverse = false;

    function __construct(array $words, $reverse = false) {
        $this->reverse = $reverse;
        foreach ( $words as $word ) {
            $this->words[] = new Word($word);
        }
        parent::__construct(new ArrayIterator($this->words));
    }

    function parseBan(array $ban) {
        foreach ( $ban as $item ) {
            foreach ( $this->words as $word ) {
                $word->checkMetrix($item);
            }
        }
    }

    public function accept(): bool {
        // In reverse mode, yield only the rejected (banned) entries.
        $accepted = $this->getInnerIterator()->current()->accept();
        return $this->reverse ? !$accepted : $accepted;
    }
}


class Word {
    private $ban = array();
    private $word;
    private $parts;
    private $accept = true;

    function __construct($word) {
        $this->word = $word;
        $this->parts = explode(" ", $word);
    }

    function __toString() {
        return $this->word;
    }

    function getTrim() {
        return preg_replace('/\W/', '', $this->word);
    }

    function accept() {
        return $this->accept;
    }

    function getBan() {
        // array_values() re-indexes so json_encode() always produces a JSON list.
        return array_values(array_unique($this->ban));
    }

    function reject($ban = null) {
        $ban === null or $this->ban[] = $ban;
        $this->accept = false;
        return $this->accept;
    }

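    function checkMetrix($ban) {
        $ban = strtolower($ban);
        foreach ( $this->parts as $part ) {
            $part = strtolower($part);
            if ($part === '') {
                continue; // consecutive spaces produce empty parts; avoid division by zero
            }
            // Required similarity (%) shrinks as the token gets longer than the banned word.
            $t = ceil(strlen($ban) / strlen($part) * 100);
            similar_text($part, $ban, $p);
            // Same-length tokens only need 75% similarity (e.g. "ikee" vs "nike").
            if (ceil($p) >= $t || ($t == 100 && $p >= 75)) {
                $this->reject($ban);
            }
        }
        // Detect bad use of spaces: more than ~25% of the text is non-word characters.
        if ($this->word !== '' && ceil(strlen($this->getTrim()) / strlen($this->word) * 100) < 75) {
            if (stripos($this->getTrim(), $ban) !== false) {
                $this->reject($ban);
            }
        }
        return $this->accept;
    }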
}
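The two rejection rules in checkMetrix() are easier to reason about in isolation. The sketch below copies them into standalone helpers (tokenMatches and spacingMatches are hypothetical names, not part of the answer) so the thresholds can be checked on single inputs: a token as long as the banned word needs only 75% similarity, which is why "ikee" is rejected, while "rosnik" reaches 60% against a required 67% and passes.

```php
<?php
// The two rules from Word::checkMetrix(), pulled out into hypothetical
// standalone helpers so the thresholds can be checked directly.

// Rule 1: compare one space-separated token with a banned word.
function tokenMatches(string $part, string $ban): bool
{
    $part = strtolower($part);
    $ban = strtolower($ban);
    if ($part === '') {
        return false;
    }
    // Required similarity: 100% for equal lengths, lower for longer tokens.
    $t = ceil(strlen($ban) / strlen($part) * 100);
    similar_text($part, $ban, $p);
    return ceil($p) >= $t || ($t == 100 && $p >= 75);
}

// Rule 2: if stripping non-word characters leaves under 75% of the text,
// treat the spacing as an evasion attempt and search the stripped text.
function spacingMatches(string $text, string $ban): bool
{
    if ($text === '') {
        return false;
    }
    $trim = preg_replace('/\W/', '', $text);
    if (ceil(strlen($trim) / strlen($text) * 100) >= 75) {
        return false;
    }
    return stripos($trim, $ban) !== false;
}

foreach ([["ikee", "nike"], ["rosnik", "nike"], ["i-phone", "iphone"]] as [$part, $ban]) {
    similar_text($part, $ban, $p);
    printf("%-8s vs %-6s similarity %5.1f%% -> %s\n",
        $part, $ban, $p, tokenMatches($part, $ban) ? 'reject' : 'accept');
}
```

Note that "i have n ikee shoes" is caught only by the token rule: stripping its spaces keeps about 79% of the characters, above the 75% cut-off for the spacing rule.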

This answer was accepted by the asker as the best answer.

PHP: prevent certain characters from being displayed on the current page  php
2018-01-16 09:02
2 answers · accepted: Global filtering http://blog.41ms.com/post/41.html

PHP 403 Forbidden redirect  php
2017-05-08 12:01
1 answer · accepted: I've isolated the problem. I was trying to pass a .... within a string that i was saving and passi

How do I create curved text with PHP Imagick?  php
2015-10-10 09:26
2 answers · accepted: If you just want it curved, there is a demo here with the code below. $draw = new \ImagickDraw();

PHP: print the prime numbers from 1 to 100  php
2021-05-16 11:53
1 answer · accepted: <?php header("content-type:text/html;charset=utf-8"); function getPrime($num){ $s=""; for (

Encrypt text in PHP and decrypt it in Python  php python
2019-01-14 06:49
1 answer · accepted: The main issue here is that you're using different key-size. PHP's openssl_encrypt determines the

PHP: fuzzy-search a TXT file and extract one line  php
2018-06-10 03:11
4 answers · accepted: header("Content-type:text/html;charset=utf-8"); function getStr($v){ $file_path =

How to extract words from text in PHP  php
2012-06-06 03:01
2 answers · accepted: Take a look at the PHP manual explode http://php.net/manual/en/function.explode.php explode by s

PHP: read the matching region name from a TXT file  php
2018-05-21 18:55
4 answers · accepted

PHP: count members' API calls under concurrency  php
2020-04-21 12:14
1 answer · accepted: https://www.cnblogs.com/spectrelb/p/7233688.html

Unable to locate package php7.3-gd  php
2019-08-07 07:46
1 answer · accepted: It looks like you do not have the appropriate repo added. Try : sudo add-apt-repository ppa:ondre