I'm trying to build the following faceted search solution:
- Read 4000 product prices from different files. Each price has 30 properties
- Get faceted search counts from those 4000 rows
- Read many different selections (by filtering) from those 4000 rows
At peak times, our application must be able to process 2000 of these actions per hour. The challenge is to read rows and counts as quickly as possible. Writing and reading must also work from multiple PHP processes.
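For context, "faceted search counts" here just means one `GROUP BY` count per filterable property. A minimal sketch of that idea, using SQLite in Python and a hypothetical two-property schema (`brand`, `color`) in place of the real 30 properties:

```python
import sqlite3

# Hypothetical schema: the real rows have 30 properties; two are shown
# here just to illustrate how facet counts are produced.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, brand TEXT, color TEXT, price REAL)")
rows = [
    (1, "acme", "red", 9.99),
    (2, "acme", "blue", 12.50),
    (3, "globex", "red", 7.25),
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)

def facet_counts(conn, column):
    """One GROUP BY per facet column yields the count shown next to each filter."""
    return dict(conn.execute(
        f"SELECT {column}, COUNT(*) FROM prices GROUP BY {column}"))

print(facet_counts(conn, "brand"))  # {'acme': 2, 'globex': 1}
print(facet_counts(conn, "color"))  # {'blue': 1, 'red': 2}
```

With 30 properties that is 30 such queries (or one query per facet the UI actually displays), which is why the per-`SELECT` latency numbers below matter.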
So far, my solution has been to insert those 4000 product prices into a database and use queries to get the counts and row selections:
- MySQL Memory: inserting rows takes about 250 ms, reading and counting is very quick (about 2 ms for every SELECT query)
- MySQL InnoDB: inserting rows takes about 600 ms, reading and counting is a bit slower (about 5 ms for every SELECT query)
- Elasticsearch: inserting rows takes about 250 ms, but must be done with a bulk insert (one HTTP request). Waiting for the index to become searchable takes about another 250 ms (500 ms total). Reading and counting works differently (with aggregations), but can be very quick.
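For comparison with the `GROUP BY` approach, the Elasticsearch equivalent of a facet count is a `terms` aggregation. A sketch of the request body (field names are hypothetical), shown as a Python dict; `"size": 0` suppresses the hits so only the bucket counts come back:

```python
import json

# Hypothetical facet query: count documents per brand, restricted to the
# currently selected filter (color = red), without returning any hits.
facet_query = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"color": "red"}}]}},
    "aggs": {
        "brands": {"terms": {"field": "brand", "size": 50}},
    },
}
print(json.dumps(facet_query, indent=2))
```

One such request can carry several aggregations at once, so all 30 facets can be counted in a single round trip, which offsets some of the HTTP overhead.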
This comparison has not been entirely fair, because for MySQL I've been using prepared statements instead of a bulk insert. (The bulk insert is itself a disadvantage, and was only used with Elasticsearch to avoid 4000 slow HTTP requests.)
The performance of MySQL MEMORY tables seems great and acceptable, even with 4 million records in storage. But I'm a bit worried about the scalability and stability of this solution.
Other explored options are:
- Apache Ignite (couldn't get it to work with PHP 7)
- MemSQL (too expensive)
- VoltDB (client supports only PHP 5.3, not PHP 7)
- MongoDB (performance similar to Elasticsearch)
I've also considered solutions without a database (doing all the calculations and loops in PHP), but this is far from ideal because reading and inserting the 4000 product prices must happen across multiple PHP processes (for speed).
What other (database) solutions would be a better fit here? What matters is the mix of writing and reading data as quickly as possible.