I'm trying to search through PDF files for multiple keywords. I've got ~60 PDFs and ~8 keywords and don't fancy ~480 manual searches.
I'm open to other suggestions (see below), but at present my approach is to use mdfind on OS X, like this:
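$finds = array();
foreach ($search as $term) {
    $result = "";
    $cleanResult = array();
    // Spotlight query for PDFs matching $term, restricted to the directory $wd
    $shellQuery = "mdfind -onlyin \"$wd\" \"kind:pdf $term\"";
    echo "\n$shellQuery\n";
    $result = shell_exec($shellQuery);
    echo $result;
    // One path per line (explode() here, since split() is gone as of PHP 7)
    $cleanResult = explode("\n", $result);
    // Drop the empty element left by the trailing newline
    array_pop($cleanResult);
    $finds[$term] = $cleanResult;
    unset($result);
    unset($cleanResult);
}
print_r($finds);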
However, although this builds $shellQuery just fine, for some reason $result doesn't always get populated even when the command works (i.e. if I copy and paste the value of $shellQuery into a terminal window, it works as expected).
Say $search contains 'foo', 'bar' and 'joe': it might find 'foo' and 'joe' fine, but return nothing for 'bar'. If I remove 'foo' and 'joe' from the array and just search for 'bar', it'll find 'bar' fine. Does it need a rest between calls or something?!
Incidentally, my preferred approach would be to do something like:
find . -name "*.pdf*" -exec pdftotext {} - \; | grep -i -l "foo"
but I can't get this to work in Terminal. I've installed http://www.bluem.net/en/mac/packages/ (I struggle to compile things, so packages like this = thumbs up!), but every time I try to pipe this to grep (e.g. pdftotext myfile.pdf - | grep -i -l "foo") grep just returns (standard input) and nothing more.
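A minimal sketch of the pdftotext idea as a per-file loop, offered as one possible shape rather than a definitive fix. When grep reads from a pipe it has no file name to report, so -l can only print a placeholder for standard input; the sketch tests each PDF separately and prints the name itself. It assumes pdftotext is on the PATH. The function name search_pdfs and the EXTRACT override are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the PDFs under DIR whose extracted text contains TERM.
# Usage: search_pdfs DIR TERM
# Assumes pdftotext is on the PATH. EXTRACT (an assumption of this sketch) can
# name another command that takes "FILE -" and writes text to standard output.
search_pdfs() {
    dir=$1
    term=$2
    find "$dir" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # grep only sees standard input here, so use -q (quiet, exit status
        # only) and print the file name ourselves on a match.
        if "${EXTRACT:-pdftotext}" "$pdf" - 2>/dev/null | grep -qi -- "$term"; then
            printf '%s\n' "$pdf"
        fi
    done
}

# Example (not run here): search the current directory for several keywords.
#   for term in foo bar joe; do
#       echo "== $term"
#       search_pdfs . "$term"
#   done
```

The term is passed to grep as a pattern, so keywords containing regex metacharacters would need escaping (or grep's -F option).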