I have a project to store and process a user's expenses. There will be only two operations on the database: INSERT and SELECT. The database may receive millions of entries per day (depending on the number of users, or whether a user is a company, etc.).
Common queries:
- Show the expenses from date x to date y. (Most common.)
- Applying filters on query (1).
- Show expenses on a particular item till date. (Queries across the whole table.)
- Show all expenses till date. (Rarely.)
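To make the queries concrete, here is how I currently picture the schema and the three query shapes, sketched in Python with sqlite3 standing in for MySQL (the table and column names are my own placeholders, not decided yet):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema -- column names are assumptions.
conn.execute("""CREATE TABLE expenses (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    expense_date TEXT    NOT NULL,   -- ISO date, e.g. '2024-03-15'
    item         TEXT    NOT NULL,
    amount       REAL    NOT NULL
)""")
conn.executemany(
    "INSERT INTO expenses (user_id, expense_date, item, amount) VALUES (?, ?, ?, ?)",
    [
        (1, "2024-03-01", "coffee", 3.50),
        (1, "2024-03-02", "lunch", 12.00),
        (1, "2024-04-01", "coffee", 3.75),
    ],
)

# Query 1: expenses from date x to date y (the common case, with filters added on top).
march = conn.execute(
    "SELECT item, amount FROM expenses WHERE user_id = ? AND expense_date BETWEEN ? AND ?",
    (1, "2024-03-01", "2024-03-31"),
).fetchall()

# Query 2: expenses on a particular item till date.
coffee_total = conn.execute(
    "SELECT SUM(amount) FROM expenses WHERE user_id = ? AND item = ?",
    (1, "coffee"),
).fetchone()[0]

# Query 3: all expenses till date (rare).
total = conn.execute(
    "SELECT SUM(amount) FROM expenses WHERE user_id = ?", (1,)
).fetchone()[0]
```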
1: I'm confused about which database to use for this: SQL, NoSQL, or a combination of both (how would that work in my case?). I need a comparison based on query speed over large amounts of data.
2: Since the table may gain millions of records per day, millions of rows returned by a query need to be transferred from the server to the client. (In my case there is an extra hop: the database server is remote from the web server, so the result must be transferred from the database server to the web server and then to the client.) How can I make this faster?
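For what it's worth, here is the kind of chunked fetch I'm experimenting with on the client side, sketched in Python with sqlite3 standing in for MySQL (one query, rows pulled in batches so the client never holds all of them at once; names are made up). From what I've read, with MySQL drivers this only truly streams when an unbuffered/server-side cursor is used, otherwise the driver buffers the whole result client-side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO expenses (amount) VALUES (?)", [(i * 0.5,) for i in range(10_000)]
)

def stream_expenses(conn, chunk_size=1000):
    # One SELECT, one cursor: rows are pulled in chunks as we iterate,
    # instead of materializing all 10,000 in memory at once.
    cur = conn.execute("SELECT id, amount FROM expenses")
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk

total_rows = sum(len(chunk) for chunk in stream_expenses(conn))
```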
4: If I choose MySQL, which will be better: (a) dumping all the data into one big SQL table, or (b) creating a table for each day (with the date as the table name), each holding a smaller amount of data? (I thought (b) would be faster for a date-range query, since I would know exactly which tables to select from instead of searching one big table for the specific dates.)
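On (a) vs (b): from what I've read, an index on the date column should let a single big table answer a date-range query without scanning every row, which may remove the motivation for per-day tables. A quick sketch checking the query plan, using SQLite as a stand-in for MySQL (schema and index names are my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses ("
    "id INTEGER PRIMARY KEY, expense_date TEXT, item TEXT, amount REAL)"
)
# Index the date column so range queries can seek instead of scan.
conn.execute("CREATE INDEX idx_expense_date ON expenses (expense_date)")

# Ask the planner how it would run a date-range query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM expenses WHERE expense_date BETWEEN ? AND ?",
    ("2024-01-01", "2024-01-31"),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
# plan_text should report a SEARCH using idx_expense_date, not a full scan.
```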
3: For now I'm trying MySQL. (The test data is already there; I'm using a Python script to parse it and load it into MySQL. I can edit the script to target any type of database.) I tried the query from (4) above. My web server/client (for testing, the web server is the client) nearly hung while around 13 million result rows were being transferred from the database server. So I added a loop in my PHP code to fetch 1000 rows at a time, like this:
i = 0;
(loop until the query returns no more rows) {
    SELECT * FROM <Table> LIMIT i, 1000;
    i += 1000;
}
It's still slow, but at least the system no longer hangs during the transfer. But how does LIMIT work here? Does it fetch the first 1000 records (while i=0), then continue from where it left off for the second 1000 (while i=1000), and so on? Or does it fetch the first 1000 records (while i=0), then start from row 0 again, skip the first 1000, and return the second 1000 (while i=1000), and so on, which would obviously be much slower? (I searched online for the mechanics of LIMIT, but everything I found discusses LIMIT with ORDER BY, not how to fetch data page by page with LIMIT or what its performance impact is.)
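While searching, I did come across "keyset" (a.k.a. seek) pagination as an alternative to LIMIT with a growing offset: instead of skipping i rows each round, the loop remembers the last id it saw and asks for rows after it, so every page is an index seek. A sketch in Python with sqlite3 standing in for MySQL (schema and names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO expenses (id, amount) VALUES (?, ?)",
    [(i, i * 1.0) for i in range(1, 5001)],
)

def pages_by_keyset(conn, page_size=1000):
    last_id = 0
    while True:
        # Seek past the last id seen instead of OFFSET-skipping rows;
        # the primary-key index jumps straight to the next page.
        rows = conn.execute(
            "SELECT id, amount FROM expenses WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        last_id = rows[-1][0]
        yield rows

n_pages = sum(1 for _ in pages_by_keyset(conn))
```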
P.S. I'm not a database pro, just a beginner, so I'm asking for experts' suggestions before starting the project.