I have run into a situation that I am having trouble debugging.
I have a scraper script that uses file locking to prevent multiple instances and also blocks non-CLI execution.
I have set it up in cron to run every minute. The scraper does some database work and writes its status to 2 text files, which I monitor with a jQuery page so I can see things live.
That way I can see whether the script is idle or running, the last string it worked on, and other details. I also query the database for the number of pending sites that the scraper still has to handle.
The problem now is that I see the number of pending sites going down, but the check says the script is idle, and nothing is being written to the 2 text files.
When I run ps aux | grep php, I see these 2 lines:
root 3159 0.0 0.0 4392 664 ? Ss 17:57 0:00 /bin/sh -c php /var/www/html/workers/scraper.php
root 3160 0.0 1.5 298100 15824 ? S 17:57 0:00 php /var/www/html/workers/scraper.php
When I run it myself with php scraper.php, it runs (it shouldn't, because acquiring the file lock should fail while the cron instance is still running), and ps aux | grep php then shows:
root 3159 0.0 0.0 4392 664 ? Ss 17:57 0:00 /bin/sh -c php /var/www/html/workers/scraper.php
root 3160 0.0 1.5 298100 15824 ? S 17:57 0:00 php /var/www/html/workers/scraper.php
root 3295 0.0 1.4 297852 15208 pts/2 S+ 18:11 0:00 php scraper.php
Any idea what's going on? I have been struggling to find what's wrong. In short, the cron job actually does the work and scrapes the sites, but it seems to ignore the file locking and does not write the statuses to the text files, which looks very weird.
Here is the code:
<?php
error_reporting(E_ALL);
if (PHP_SAPI !== 'cli' || isset($_SERVER['HTTP_USER_AGENT'])) {
    exit('cli only');
}
// Get the site lists.
require_once("writedata.php");
require_once("dbconnect.php");
// Set a file lock so the script status can be checked.
$file = fopen("lock.txt", "w+");
// Exclusive, non-blocking lock.
if (flock($file, LOCK_EX | LOCK_NB)) {
    // Lock acquired. Continue work.
    echo "Started Running";
    // ALL THE MAIN WORK HAPPENS HERE.
    $sql = "SELECT site FROM `sites` WHERE status=0 LIMIT 2500";
    $result = $db->query($sql);
    while ($row = $result->fetch_assoc()) {
        $site = $row['site'];
        // Record the site currently being worked on.
        file_put_contents("errors.txt", $site);
        // Scraper functions and parsing functions removed.
    }
} else {
    echo "Script already running";
    exit;
}
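
One thing that may be worth ruling out: relative paths such as lock.txt and errors.txt are resolved against the current working directory of the process, and a job started by cron does not necessarily start in the script's own directory. Below is a minimal debugging sketch, assuming that is relevant here. It prints both candidate locations for each file. describe_paths is a hypothetical helper, not part of the script above.

```php
<?php
// Hypothetical debugging helper (not part of the original script).
// Relative paths such as "lock.txt" resolve against the current working
// directory, which under cron may differ from the script's own directory.
function describe_paths(array $relativeNames): array
{
    $report = [];
    foreach ($relativeNames as $name) {
        $report[$name] = [
            // Where fopen("lock.txt") would actually look.
            'resolved_from_cwd' => getcwd() . DIRECTORY_SEPARATOR . $name,
            // Where the file would be if it sat next to this script.
            'next_to_script'    => __DIR__ . DIRECTORY_SEPARATOR . $name,
        ];
    }
    return $report;
}

foreach (describe_paths(['lock.txt', 'errors.txt']) as $name => $paths) {
    // Written to STDERR so it shows up in cron mail or a redirected log.
    fwrite(STDERR, sprintf(
        "%s: cwd=%s script_dir=%s\n",
        $name,
        $paths['resolved_from_cwd'],
        $paths['next_to_script']
    ));
}
```

If the two locations differ between the cron run and the manual run, the two instances would be locking and writing different files, which would be consistent with the symptoms described.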