Hello! and please don't be scared by my wall of text...
My engineering team and I are attempting to relate HTML/PHP to a MySQL database while using R programming to provide quick, graphical analysis to the website user.
Synopsis of our idea:
- Prompt user upload a .txt file of multiple receipts on website
- Upload said file to a pre-existing MySQL database
- Query & calculate useful statistics using R
- Display graphs and statistics of interest to the user
We've made a lot of progress on the R code and what we plan on calculating, however our main issue lies in the uploading of the data.
*The receipt files are all similar to this:
There are multiple files, each of which is around 20 MB in size. The dashes signify the start/end of a receipt with respect to the date.
Our problem:
With our current knowledge, the only way to account for the "---" delimiters is to use an if statement. Accounting for the varying length of the files, we're running a while loop until the end. These are both done in PHP. This allows us to hold the 'Retailer ID' and the 'Date' as variables for each line that 'Receipt ID' and 'Price' is found. Thereafter, we upload those 4 variables, line-by-line, into our MySQL database. We've decided to disregard the 'Tax', 'Total', and 'Payment Type'.
For example, we want the first couple entries in our database to appear like this in order to query the data:
RetailerID | Date | ProdID | Price
1 | 01/01/2014 | 79 | $7.93
1 | 01/01/2014 | 1 | $24.62
Clearly, this takes forever. It causes our website to have a fatal timeout error, one that we have no ability to change, so we've had to put a limit (cnt < 20000) in our loop just to get something into our database. It works for what we need, but not what we want given the receipt files are over 1,000,000 lines!
We've been exploring the usage of this SQL code:
"LOAD DATA LOCAL INFILE '$file'
INTO TABLE Receipts
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '
';";
This successfully uploads the entire receipt, regardless of size, almost instantly. Still, we have no idea how to remove the dashes or iterate the 'Date'/'Retailer ID' as variables unless we use a while loop. We want to cut that out completely as it slows our website down massively.
I think this would be an easier method, but all the technical resources online tend to be a bit over our heads. We've been exploring the usage of Java after reading up on a few of these, but again -- whooooooosh.
This is the first time we've worked with PHP and MySQL, so any nudge in the right direction would be great. There has to be a way to get rid of our deadly while loop!