douchao1879 2015-03-23 01:56

Fastest way to search for whole words in a 20 MB flat file database (PHP)

I have a 20 MB flat file database with about 500k lines. Only the characters [a-z0-9-] are allowed, each line has 7 words on average, and there are no empty or duplicate lines:

Flat file database:

put-returns-between-paragraphs
for-linebreak-add-2-spaces-at-end
indent-code-by-4-spaces-indent-code-by-4-spaces

I'm searching for whole words only and extracting the first 10k results from this database.

So far this code works fine if the 10k matches are found within, say, the first 20k lines of the database. If the word is rare, however, the script has to scan all 500k lines, which is about 10 times slower.

Settings:

$cats = file("cats.txt", FILE_IGNORE_NEW_LINES);
$search = "end";
$limit = 10000;

Search:

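$cats_found = [];
// Build the pattern once; preg_quote() escapes characters such as "-" in the search term.
$pattern = "/\b" . preg_quote($search, "/") . "\b/";
foreach ($cats as $cat) {
    if (preg_match($pattern, $cat)) {
        $cats_found[] = $cat;
        // Stop once exactly $limit matches have been collected.
        if (count($cats_found) >= $limit) break;
    }
}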
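
As a small illustration of what "whole word" means for this data, here is a sketch that assumes words are separated by hyphens, as in the sample lines above. The helper name `is_whole_word_match` is hypothetical and not part of the original script. Because `-` is not a word character, `\b` treats each hyphen as a word boundary.

```php
<?php
// Hypothetical helper (not from the original script): returns true when
// $word occurs in $line as a complete hyphen-delimited word.
function is_whole_word_match(string $word, string $line): bool
{
    // preg_quote() escapes regex metacharacters in the search term.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    return preg_match($pattern, $line) === 1;
}

// Sample line taken from the flat file shown above.
$line = 'for-linebreak-add-2-spaces-at-end';

var_dump(is_whole_word_match('end', $line));  // bool(true): "end" is a full word
var_dump(is_whole_word_match('line', $line)); // bool(false): only part of "linebreak"
```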

My PHP skills and knowledge are limited, and I don't know how to use SQL, so this is the best I can do. I would appreciate some advice:

  • Is this the right way to do it? Are foreach and preg_match the problem?
  • Should I split the large file into smaller files? If so, what size?
  • Finally, would SQL be faster, and by how much? (An option for the future.)

Thanks for reading, and sorry for my bad English; it is my third language.

2 answers

  • duan0417 2015-03-23 03:19

    If most of the lines don't contain the searched word, you could execute preg_match() less often, like so:

    foreach ($lines as $line) {
        // fast prefilter...
        if (strpos($line, $word) === false) {
            continue;
        }
        // ... then proper search if the line passed the prefilter
        if (preg_match("/\b{$word}\b/", $line)) {
            // found
        }
    }
    

    That said, whether this is actually faster needs to be benchmarked on real data.

    Accepted by the asker as the best answer.
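
The benchmarking that the answer above calls for could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `search_regex_only` and `search_with_prefilter` are made up, and the synthetic array stands in for the real cats.txt, with the searched word placed on every 1000th line to mimic a rare word. Timings will differ on real data and hardware, so no numbers are claimed here.

```php
<?php
// Sketch: time the plain preg_match() loop against the strpos() prefilter.
// Function names and test data are assumptions made for illustration.

function search_regex_only(array $lines, string $word, int $limit): array
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    $found = [];
    foreach ($lines as $line) {
        if (preg_match($pattern, $line)) {
            $found[] = $line;
            if (count($found) >= $limit) {
                break;
            }
        }
    }
    return $found;
}

function search_with_prefilter(array $lines, string $word, int $limit): array
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    $found = [];
    foreach ($lines as $line) {
        // Cheap substring check first; lines without the word skip the regex.
        if (strpos($line, $word) === false) {
            continue;
        }
        if (preg_match($pattern, $line)) {
            $found[] = $line;
            if (count($found) >= $limit) {
                break;
            }
        }
    }
    return $found;
}

// Synthetic stand-in for cats.txt: the word "end" appears on every 1000th line.
$lines = [];
for ($i = 0; $i < 100000; $i++) {
    $lines[] = ($i % 1000 === 0)
        ? "put-returns-at-end-$i"
        : "indent-code-by-4-spaces-$i";
}

foreach (['search_regex_only', 'search_with_prefilter'] as $fn) {
    $start = microtime(true);
    $result = $fn($lines, 'end', 10000);
    $elapsed = microtime(true) - $start;
    printf("%-22s %d matches in %.4f s\n", $fn, count($result), $elapsed);
}
```

Running the same comparison against the real file, with both a common and a rare search word, would show whether the prefilter pays off in practice.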