PHP: performant search of a text for given usernames

I am currently dealing with a performance issue that I cannot find a way to fix. I want to search a text for usernames that are mentioned with an @ sign in front. The list of usernames is available as a PHP array.

The problem is that usernames may contain spaces or other special characters; there is no restriction on them, so I can't find a regex that deals with that. Currently I am using a function which takes the whole line after the @ and checks, character by character, which usernames could match this mention, until just one username is left that fully matches the mention. But for a long text with 5 mentions it takes several seconds (!!!) to finish. For more than 20 mentions the script runs endlessly.

I have some ideas, but I don't know whether they would work.

1. Go through the username list (could be 1,000 names or more) and search for every @Username without regex, using plain string search. I would expect this to be far less efficient.
2. When the username is written, check with JavaScript whether it contains a space or punctuation mark and, if so, surround it with quotation marks, like @"User Name". I don't like that idea; it looks dirty to the user.
3. Don't start with one character, but with maybe 4, and go back if there is no match. This is the same principle as in sorting algorithms: divide and conquer. It could be difficult to implement and may lead to nothing.
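
The first idea (plain string search, no regex) could be sketched roughly as follows. This is only an illustration under several assumptions: find_mentions is a made-up name, the names are compared byte for byte without the utf8_clean_string normalisation used in the function below, and an @ only counts when it is at the start of the text or follows whitespace. Whether it is fast enough for a real user list would have to be measured.

```php
<?php
// Hypothetical helper (not part of the original code): find mentions by
// plain string comparison, trying the longest usernames first.
function find_mentions(string $text, array $usernames): array
{
    // Longest first, so a longer name wins over a shorter one with the same prefix.
    usort($usernames, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    $found  = [];
    $offset = 0;
    while (($at = strpos($text, '@', $offset)) !== false) {
        $offset = $at + 1;

        // Assumption: a mention starts the text or follows whitespace.
        if ($at > 0 && !ctype_space($text[$at - 1])) {
            continue;
        }

        foreach ($usernames as $name) {
            // Compare only the bytes directly after the @ sign.
            if (substr($text, $at + 1, strlen($name)) === $name) {
                $found[] = [
                    'name'   => $name,
                    'start'  => $at + 1,
                    'length' => strlen($name),
                ];
                $offset = $at + 1 + strlen($name);
                break;
            }
        }
    }

    return $found;
}
```

Each @ costs at most one pass over the username list, so the work grows with the number of mentions times the number of usernames, not with the length of the line after the @.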

How do Facebook, Twitter and other sites do this? Do they parse the text directly while typing and save the mentioned usernames directly in the stored text of the message?

This is my current function:

$regular_expression_match = '#(?:^|\\s)@(.+?)(?:\n|$)#';
$matches = false;
$offset = 0;

while (preg_match($regular_expression_match, $post_text, $matches, PREG_OFFSET_CAPTURE, $offset))
{
    $line = $matches[1][0];
    $search_string = substr($line, 0, 1);
    $filtered_usernames = array_keys($user_list);
    $matched_username = false;

    // Loop, make the search string one by one char longer and see if we have still usernames matching
    while (count($filtered_usernames) > 1)
    {
        $filtered_usernames = array_filter($filtered_usernames, function ($username_clean) use ($search_string, &$matched_username) {
            $search_string = utf8_clean_string($search_string);

            if (strlen($username_clean) == strlen($search_string))
            {
                if ($username_clean == $search_string)
                {
                    $matched_username = $username_clean;
                }
                return false;
            }

            return (substr($username_clean, 0, strlen($search_string)) == $search_string);
        });

        if ($search_string == $line)
        {
            // We have reached the end of the line, so stop
            break;
        }
        $search_string = substr($line, 0, strlen($search_string) + 1);
    }

    //  If there is still one in filter, we check if it is matching
    $first_username = reset($filtered_usernames);
    if (count($filtered_usernames) == 1 && utf8_clean_string(substr($line, 0, strlen($first_username))) == $first_username)
    {
        $matched_username = $first_username;
    }

    // We can assume that $matched_username is the longest matching username we have found, because the search string grows with each iteration.
    // So we use it as the only match, even if shorter usernames may match too. That is nothing we can solve here;
    // it has to be handled by the user. An autocomplete popup shows the other, longer matching name while the user is still typing,
    // and if they go on to enter the full name, I think it is okay to pick the longer name.
    if ($matched_username)
    {
        $startpos = $matches[1][1];

        // We need to find the end position, because the username is cleaned and the real string might be longer
        $full_username = substr($post_text, $startpos, strlen($matched_username));
        while (utf8_clean_string($full_username) != $matched_username)
        {
            $full_username = substr($post_text, $startpos, strlen($full_username) + 1);
        }

        $length = strlen($full_username);
        $user_data = $user_list[$matched_username];

        $mentioned[] = array_merge($user_data, array(
            'type'          => self::MENTION_AT,
            'start'         => $startpos,
            'length'        => $length,
        ));
    }

    $offset = $matches[0][1] + strlen($search_string);
}

Which way would you go? The problem is that the text will be displayed often, and parsing it every time would consume a lot of time, but I don't want to heavily modify the text the user entered.

I can't work out what the best way is, or even why my function is so time-consuming.

A sample text would be:

Okay, @Firstname Lastname, I mention you! Listen @[TEAM] John, you are a team member. @Test is a normal name, but @Thât♥ should be tracked too. And see @Wolfs garden! I just mean the Wolf.

The usernames in that text would be:

Firstname Lastname
[TEAM] John
Test
Thât♥
Wolf

So yes, there is clearly no way for me to know where a name ends. The only certain boundary is the newline.


1 answer

dongshengheng1013 2015-01-30 13:42
I think the main problem is that you can't distinguish usernames from the surrounding text, and it's a bad idea to look up maybe thousands of usernames in a text. It can also lead to further problems, for example when John is part of [TEAM] John or JohnFoo.

The usernames need to be separated from the other text. Assuming that you're using UTF-8, you could put the usernames between an invisible zero-width space (\xE2\x80\x8B) and a zero-width non-joiner (\xE2\x80\x8C).

The usernames can then be extracted quickly and with little effort and, if needed, still be verified in the database.

$txt = " Okay, @\xE2\x80\x8BFirstname Lastname\xE2\x80\x8C, I mention you! Listen @\xE2\x80\x8B[TEAM] John\xE2\x80\x8C, you are a team member. @\xE2\x80\x8BTest\xE2\x80\x8C is a normal name, but @\xE2\x80\x8BThât♥\xE2\x80\x8C should be tracked too. And see @\xE2\x80\x8BWolfs\xE2\x80\x8C garden! I just mean the Wolf.";

// extract usernames: \K drops the "@" and opening marker from the match,
// the lookahead stops at the closing marker without consuming it
if (preg_match_all('~@\xE2\x80\x8B\K.*?(?=\xE2\x80\x8C)~s', $txt, $out)) {
    print_r($out[0]);
}

Array
(
    [0] => Firstname Lastname
    [1] => [TEAM] John
    [2] => Test
    [3] => Thât♥
    [4] => Wolfs
)

echo $txt;

Okay, @Firstname Lastname, I mention you! Listen @[TEAM] John‌, you are a team member. @Test‌ is a normal name, but @Thât♥‌ should be tracked too. And see @Wolfs‌ garden! I just mean the Wolf.

You could use any characters you like for the separation, as long as they are unlikely to occur elsewhere in the text.
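
A minimal sketch of how the markers might be added when a post is saved and how the extracted names could then be checked against the user list. The names wrap_mention, extract_mentions, MENTION_OPEN and MENTION_CLOSE are made up for illustration, and it is assumed that $user_list is keyed by the username exactly as it appears in the text (the question's list is keyed by a cleaned form, so a real version would have to normalise first).

```php
<?php
// Assumed delimiters, as suggested above: zero-width space and zero-width non-joiner.
const MENTION_OPEN  = "\xE2\x80\x8B";
const MENTION_CLOSE = "\xE2\x80\x8C";

// Hypothetical helper: wrap a name picked from the autocomplete popup.
function wrap_mention(string $username): string
{
    // Remove stray delimiters so a name cannot break out of its own wrapper.
    $clean = str_replace([MENTION_OPEN, MENTION_CLOSE], '', $username);

    return '@' . MENTION_OPEN . $clean . MENTION_CLOSE;
}

// Hypothetical helper: extract wrapped names and keep only known users.
function extract_mentions(string $text, array $user_list): array
{
    $pattern = '~@' . preg_quote(MENTION_OPEN, '~') . '\K.*?(?='
             . preg_quote(MENTION_CLOSE, '~') . ')~s';

    if (!preg_match_all($pattern, $text, $out)) {
        return [];
    }

    // Assumption: $user_list is keyed by username, so isset() is a hash lookup.
    return array_values(array_filter($out[0], function ($name) use ($user_list) {
        return isset($user_list[$name]);
    }));
}
```

With this, the text is parsed once per display with a single preg_match_all call, and the cost of the check no longer depends on how many usernames exist.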

Regex FAQ, Test at eval.in (link will expire soon)