dongliufa6380 2014-08-19 19:32

Searching large amounts of text in PHP

I have a Symfony2-based project that lets a user create a number of rules for assigning items to different categories. It is a backend project, so only a few users will access it.

Rules are based on phrases; items are text objects. I am trying to figure out the best way to search those text objects and apply the rules to them as quickly and smoothly as possible.

For example, say a user creates 5 rules (phrases: basketball, football, baseball, swimming, running), and every text object matching any of these phrases should be assigned to the SPORTS category. I figured I could use ElasticSearch to quickly return the IDs of those objects and then save the assignment with a simple MySQL INSERT or UPDATE query.

I am worried about performance. If there were, for example, 1 million text objects in the index and, say, 50k objects matching those rules, would it be an acceptable approach to run the search queries in parts (for example, limiting each ElasticSearch query to 50k results while iterating over the whole index) and then update/insert the data in MySQL?

So, running queries like these (pseudocode):

$ids = $elasticSearch->setPhrases('basketball OR baseball OR football')->find()->limit(1, 50000);
$ids = $elasticSearch->setPhrases('basketball OR baseball OR football')->find()->limit(50000, 100000);

etc.

Is ElasticSearch a good choice for that kind of processing? Or should I stick to MySQL and run queries using, for example, regexp (in chunks, of course)?

Maybe there are existing solutions I could check? Unfortunately, I am limited to PHP and Symfony2, but if there are any better solutions worth checking, I might be able to suggest them to the client.

I hope someone can help me; any help is more than welcome.

2 answers

  • douzhi4991 2014-08-19 20:41

    Assuming your items are stored in text columns, you can create a FULLTEXT index on your table:

    CREATE TABLE texts(
        id int not null auto_increment primary key,
        text_field1 text,
        text_field2 text,
        text_field3 text
    )Engine = MyISAM;  -- InnoDB supports fulltext indexes since v5.6
    
    CREATE FULLTEXT INDEX itexts on texts(text_field1,text_field2,text_field3);
    

    Then you can search using fulltext expressions (finding at least one term):

    SELECT * FROM texts 
    WHERE MATCH (text_field1,text_field2,text_field3) 
    AGAINST ('basketball baseball football');
    

    Or, to require all terms, use boolean mode (the + operator is only interpreted there):

    SELECT * FROM texts 
    WHERE MATCH (text_field1,text_field2,text_field3) 
    AGAINST ('+basketball +baseball +football' IN BOOLEAN MODE);
    

    Now, in your project, you can translate your rules into a fulltext search expression and execute it using a regular Symfony query.
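
    As a rough illustration of that translation step, here is a minimal sketch. The helper name buildFulltextExpression and its $requireAll flag are made up for this example (they are not part of Symfony or MySQL), and the table and column names are taken from the CREATE TABLE above. It strips boolean-mode operator characters from user input, quotes multi-word phrases, and leaves the actual query to a bound parameter.

    ```php
    <?php
    // Hypothetical helper: turns the phrases of a rule set into a MySQL
    // boolean-mode fulltext expression. Name and signature are assumptions.

    /**
     * @param string[] $phrases    phrases entered by the user
     * @param bool     $requireAll true: every phrase must match ("+" prefix);
     *                             false: matching any phrase is enough
     * @return string expression for AGAINST (... IN BOOLEAN MODE)
     */
    function buildFulltextExpression(array $phrases, $requireAll = false)
    {
        $terms = array();
        foreach ($phrases as $phrase) {
            // Replace boolean-mode operator characters with spaces so that
            // user input cannot act as an operator inside the expression.
            $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $phrase);
            $clean = trim(preg_replace('/\s+/', ' ', $clean));
            if ($clean === '') {
                continue; // nothing searchable left in this phrase
            }
            // Wrap multi-word phrases in double quotes so they match as a phrase.
            $term = strpos($clean, ' ') !== false ? '"' . $clean . '"' : $clean;
            $terms[] = ($requireAll ? '+' : '') . $term;
        }

        return implode(' ', $terms);
    }

    $expression = buildFulltextExpression(array('basketball', 'baseball', 'football'));

    // The expression is passed as a bound parameter, never concatenated into SQL.
    $sql = 'SELECT id FROM texts
            WHERE MATCH (text_field1, text_field2, text_field3)
            AGAINST (:expression IN BOOLEAN MODE)';
    // Assuming a Doctrine DBAL 2.x connection in $connection, the query could
    // be run as: $connection->fetchAll($sql, array('expression' => $expression));
    ```

    Selecting only the id column keeps the result small, which matters when tens of thousands of rows match and the IDs are then written back in batches.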

    More info about FULLTEXT search: http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
