There are many possible reasons why the script is so slow, and exactly what you need to do to decrease its running time depends entirely upon which parts of the code cause the slowdown.
That means you need to put the code through a profiler, and then tweak the parts it reports as the culprits. Without a profiler, all we can do is guess, and not necessarily correctly.
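If you don't have a profiler at hand, even something as crude as wrapping the suspect call in microtime() will give you a rough idea of where the seconds go. A minimal sketch (a real profiler such as Xdebug will give a far more detailed breakdown; the search term and path here are just placeholders):

<?php
// A rough timing sketch, not a substitute for a real profiler.
// search_files() is the function defined further down in this answer.
$start = microtime (true);
$result = search_files ('example', '/example/files');
printf ('search_files() took %.4f seconds', microtime (true) - $start);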
As noted in the comments to your question, using a ready-made search engine would be the far better solution, especially something purpose-built for a task like this, as it will cut the time down drastically.
Even the built-in grep command found in Linux shells would be an improvement.
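If you want to try that route without leaving PHP, shelling out to grep takes only a few lines. A minimal sketch, assuming a Unix-like host with grep on the PATH (grep_files() is a hypothetical wrapper, not something from your code):

<?php
// A minimal sketch, assuming a Unix-like host with grep available.
// -l prints only the names of matching files, -i ignores case, and
// -F treats the term as a fixed string rather than a regular expression.
function grep_files ($term, $path) {
    $cmd = sprintf (
        'grep -liF -- %s %s/*',
        escapeshellarg ($term),
        escapeshellarg (rtrim ($path, '/'))
    );
    exec ($cmd, $matches);
    return $matches; // One matching file path per array element.
}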
That said, I suspect the reason your code is so slow is that you're reading and searching through the contents of all of the files in PHP. stripos() is a particularly likely suspect here, as it is a rather slow search.
Another factor might be the read() calls in the loop, as I believe they perform an I/O operation on each call. Having a lot of calls to echo in a script can also cause a slowdown, depending upon how many of them you have: a couple of hundred is not really noticeable, but a few thousand will be.
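If you want to verify that last claim yourself, here is a rough micro-benchmark sketch. Run it from the CLI with the output redirected (e.g. php bench.php > /dev/null, where bench.php is just a hypothetical file name) so that the terminal itself doesn't dominate the timing, and treat the exact numbers as setup-dependent:

<?php
// A rough micro-benchmark sketch: many echo calls versus building
// one string and echoing it out once.
$start = microtime (true);
for ($i = 0; $i < 100000; $i++) {
    echo 'line ', $i, "\n";
}
$manyEchoes = microtime (true) - $start;

$start = microtime (true);
$buffer = '';
for ($i = 0; $i < 100000; $i++) {
    $buffer .= 'line ' . $i . "\n";
}
echo $buffer;
$oneEcho = microtime (true) - $start;

// Report on stderr so the timings stay visible when stdout is redirected.
fwrite (STDERR, sprintf ("many echoes: %.4fs, one echo: %.4fs\n", $manyEchoes, $oneEcho));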
Taking these points into consideration, along with some general changes I recommend to make your code easier to maintain, I've rewritten your code as follows.
<?php
if (isset ($_POST['text_box'])) {
    $path = '/example/files';
    $result = search_files ($_POST['text_box'], $path);
}

/**
 * Searches through the files in the given path, for the search term.
 *
 * @param string $term The term to search for, only "word characters" as defined by RegExp allowed.
 * @param string $path The path which contains the files to be searched.
 *
 * @return string Either a list of links to the files, or an error message.
 */
function search_files ($term, $path) {
    // Ensuring that we have a closing slash at the end of the path, so that
    // we can append the file pattern for glob() to use.
    if (substr ($path, -1) != '/') {
        $path .= '/';
    }

    // If we don't have a valid/readable path we need to throw an error now.
    // This only happens if the code itself is wrong, as it's not user-supplied,
    // thus an exception is thrown.
    if (!is_dir ($path) || !is_readable ($path)) {
        throw new InvalidArgumentException ("Not a valid search path!");
    }

    // The search term should be validated to ensure you get sane input,
    // in order to avoid erroneous responses to the user and possible attacks.
    // Added a simple test to ensure we only accept "word characters".
    if (!preg_match ('/^\w+\z/', $term)) {
        // Invalid input. Show warning to user.
        return 'Not a valid search string.';
    }

    // Using glob() so that we retrieve a list of all files in one operation.
    $contents = glob ($path . '*');

    // Using a holding variable, as this many echo statements take
    // noticeably longer than just concatenating strings and
    // echoing them out once.
    $output = '';

    // Using a printf() template to make the code easier to read.
    // Ideally the HTML shouldn't be in this string either, but adding
    // a templating system is far beyond the scope of this Q&A.
    $outTemplate = '<p class="found">Found Match - <a href="http://test.example.com/files/%1$s">%2$s</a></p>';

    foreach ($contents as $file) {
        // glob() returns full paths, and never lists the "." and ".."
        // entries, so the only things to skip are non-files
        // (sub-directories and the like).
        if (!is_file ($file)) {
            continue;
        }

        // This one is the big issue: reading all of the files one by one will take time!
        $data = file_get_contents ($file);

        // Same with running a case-insensitive search!
        if (stripos ($data, $term) !== false) {
            // Only the file name itself belongs in the link, not the full path.
            $name = basename ($file);

            // Added output escaping to prevent issues with possible meta-characters.
            // (A problem also known as XSS attacks.)
            $output .= sprintf ($outTemplate, htmlspecialchars (rawurlencode ($name)), htmlspecialchars ($name));
        }
    }

    // Lastly, if the output string is empty we haven't found anything.
    if (empty ($output)) {
        return "Term not found";
    }

    return $output;
}
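One last note on the InvalidArgumentException: since the calling code at the top doesn't catch it, a bad path will currently kill the request with an uncaught exception. A small usage sketch of how the caller might handle that instead (the log-and-apologise strategy is just one option):

<?php
try {
    $result = search_files ($_POST['text_box'], '/example/files');
} catch (InvalidArgumentException $e) {
    // The path is hard-coded, so this firing means a programming error:
    // log it for the developer and show the user something neutral.
    error_log ($e->getMessage ());
    $result = 'Search is temporarily unavailable.';
}
echo $result;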