I have a 20MB flat file database with about 500k lines; only [a-z0-9-]
characters are allowed, there are on average 7 words per line, and there are no empty or duplicate lines:
Flat file database:
put-returns-between-paragraphs
for-linebreak-add-2-spaces-at-end
indent-code-by-4-spaces-indent-code-by-4-spaces
I'm searching for whole words only
and extracting the first 10k results
from this db.
So far this code works OK if the 10k matches are found in, say, the first 20k lines of the db, but if the word is rare, the script must search all 500k lines, and that is 10 times slower.
Settings:
$cats = file("cats.txt", FILE_IGNORE_NEW_LINES);
$search = "end";
$limit = 10000;
Search:
$cats_found = []; // initialize so the append below never hits an undefined variable
foreach ($cats as $cat) {
    if (preg_match("/\b$search\b/", $cat)) {
        $cats_found[] = $cat;
        // stop once $limit matches are collected
        // (isset($cats_found[$limit]) only became true at $limit + 1 matches)
        if (count($cats_found) >= $limit) break;
    }
}
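One idea that might help (a sketch, not tested against the real 20MB file): strpos() is much cheaper than preg_match(), so it can act as a pre-filter, and the \b regex only runs on lines that actually contain the substring. The sample lines below stand in for file("cats.txt", FILE_IGNORE_NEW_LINES).

```php
<?php
// Sketch: cheap strpos() pre-filter before the word-boundary regex.
// Sample data stands in for the real cats.txt contents.
$cats = [
    "put-returns-between-paragraphs",
    "for-linebreak-add-2-spaces-at-end",
    "the-end-of-the-line",
    "endless-scrolling",
];
$search = "end";
$limit  = 10000;
// Build the pattern once; preg_quote guards against regex metacharacters.
$pattern = "/\\b" . preg_quote($search, "/") . "\\b/";

$cats_found = [];
foreach ($cats as $cat) {
    // strpos rejects most non-matching lines cheaply; "endless-scrolling"
    // still passes it but then fails the whole-word regex.
    if (strpos($cat, $search) !== false && preg_match($pattern, $cat)) {
        $cats_found[] = $cat;
        if (count($cats_found) >= $limit) break;
    }
}
```

Whether this is faster in practice depends on how often the substring appears without being a whole word, so it would need measuring on the real file.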
My PHP skills and knowledge are limited, and I don't know how to use SQL, so this is the best I can do, but I need some advice:
- Is this the right way to do it, or are foreach and preg_match the problem?
- Should I split the large file into smaller files, and if so, what sizes?
- And finally, would SQL be faster, and by how much? (An option for the future.)
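For reference on the SQL option, below is a rough sketch of what a SQLite version could look like, using PDO (file-based, no separate server needed). The in-memory database, the sample lines, and the table name "cats" are all placeholders; a real setup would import cats.txt once and then reuse the database file.

```php
<?php
// Sketch only: SQLite via PDO as a possible future replacement for cats.txt.
// ":memory:" and the three sample lines stand in for a real imported file;
// for a persistent db, use new PDO("sqlite:cats.db") instead.
$db = new PDO("sqlite::memory:");
$db->exec("CREATE TABLE cats (line TEXT)");

$insert = $db->prepare("INSERT INTO cats (line) VALUES (?)");
foreach (["put-returns-between-paragraphs",
          "for-linebreak-add-2-spaces-at-end",
          "endless-scrolling"] as $line) {
    $insert->execute([$line]);
}

$search = "end";
$limit  = 10000;

// LIKE does a cheap substring pre-filter inside SQLite;
// the \b regex in PHP then enforces whole-word matches.
$stmt = $db->prepare("SELECT line FROM cats WHERE line LIKE ? LIMIT " . (int)$limit);
$stmt->execute(["%" . $search . "%"]);

$pattern = "/\\b" . preg_quote($search, "/") . "\\b/";
$cats_found = [];
foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $line) {
    if (preg_match($pattern, $line)) {
        $cats_found[] = $line;
    }
}
```

Note that the LIMIT applies before the whole-word filter, so fewer than $limit final matches can come back, and whether this actually beats the plain PHP loop would need measuring.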
Thanks for reading this, and sorry for my bad English; it is my 3rd language.