I have a project, it's based on Symfony2, that allows a user to create a number of rules for assigning items to different categories. It is a backend project, so there will be only a few users accessing it.
Rules are based on phrases, items are text objects. I am trying to figure out a best way to search and apply rules to those text objects as quickly and smoothly as possible.
For example, if a user creates 5 rules (phrases: basketball
, football
, baseball
, swimming
, running
), and all text objects matching any of these phrases should be assigned to SPORTS category, I figured I could use ElasticSearch to quickly return IDs of those objects and then using a simple INSERT or UPDATE mysql query save the assignment.
I am worried about the performance, if there would be, for example, 1 milion text objects in index and say, there are 50k objects matching those rules, running search queries in parts, for example limiting the scope to 50k for each ElasticSearch query (to iterate over the whole index), then updating/inserting data to MySQL would be an acceptable approach?
So, running query (pseudo):
$ids = elasticSearch->setPhrases('basketball OR baseball OR football')->find()->limit(1, 50000);
$ids = elasticSearch->setPhrases('basketball OR baseball OR football')->find()->limit(50000, 100000);
etc.
Is ElasticSearch a good choice for that kind of processing? Or should I stick to MySQL and run queries using regexp for example (in chunks of course)?
Maybe there are existing solutions I could check? Unfortunately, I am limited to PHP and Symfony2, but if there are any better solutions worth checking, I might be able to suggest it to the client.
Hope someone can help me, any help is more then welcome.