For learning purposes, I'm trying to fetch data from the Steam Store. If a page contains the image game_header_image_full, I've reached a game. I have two alternatives that both sort of work, but there's a catch: one is really slow, and the other seems to miss some data and therefore doesn't write every URL to the text file.
For the same 100 ids, Simple HTML DOM caught 9 URLs, whilst the second one (cURL with preg_match) only caught 8.
Question 1.
Is there markup that $html->find('img.game_header_image_full') would catch but my preg_match with $reg would not? Or is the problem something else?
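To narrow down Question 1 without hitting the network, here is a minimal sketch that runs the same $reg and a DOM-based check over a few sample tags. The sample tags are hypothetical (not copied from the Steam Store), and the DOM check uses PHP's built-in DOMDocument and DOMXPath as a stand-in for Simple HTML DOM's img.game_header_image_full selector, so it only approximates what find() does.

```php
<?php
// Same pattern as in the cURL version below.
$reg = "/<\\s*img\\s+[^>]*class=['\"][^'\"]*game_header_image_full[^'\"]*['\"]/i";

// Hypothetical sample markup, chosen to exercise edge cases.
$samples = [
    'quoted'    => '<img class="game_header_image_full" src="header.jpg">',
    'unquoted'  => '<img class=game_header_image_full src="header.jpg">',
    'spaced'    => '<img class = "game_header_image_full" src="header.jpg">',
    'commented' => '<!-- <img class="game_header_image_full" src="header.jpg"> -->',
];

// DOM-based check: true if an <img> element carries the class.
function domHasHeaderImage(string $html): bool
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $query = "//img[contains(concat(' ', normalize-space(@class), ' '), "
           . "' game_header_image_full ')]";
    return $xpath->query($query)->length > 0;
}

$results = [];
foreach ($samples as $name => $html) {
    $results[$name] = [
        'regex' => preg_match($reg, $html) === 1,
        'dom'   => domHasHeaderImage($html),
    ];
    printf(
        "%-10s regex=%s dom=%s\n",
        $name,
        var_export($results[$name]['regex'], true),
        var_export($results[$name]['dom'], true)
    );
}
```

Any row where the two columns disagree is a shape of markup on which the two approaches can diverge. Whether the Steam pages actually contain such markup is not something this sketch can show.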
Question 2.
Am I doing things correctly here? I'm planning to go for the cURL alternative, but can I make it faster somehow?
Simple HTML DOM Parser (time to search 100 ids: 1 min 39 s; returned: 9 URLs):
<?php
include('simple_html_dom.php');
$i = 0;
$times_to_run = 100;
set_time_limit(0);
while ($i++ < $times_to_run) {
    // Fetch the page and look for the target image
    $url = "http://store.steampowered.com/app/".$i;
    $html = file_get_html($url);
    $element = $html->find('img.game_header_image_full');
    if ($i == $times_to_run) {
        echo "Success!";
    }
    foreach ($element as $key => $value) {
        // Check if image was found (strict comparison, since strpos can return 0)
        if (strpos($value, 'img') === false) {
            // Do nothing, repeat loop with $i++;
        } else {
            // Add (don't overwrite) to file steam.txt
            file_put_contents('steam.txt', $url.PHP_EOL, FILE_APPEND);
        }
    }
}
?>
vs. the cURL alternative (time to search 100 ids: 34 s; returned: 8 URLs):
<?php
$i = 0;
$times_to_run = 100;
set_time_limit(0);
while ($i++ < $times_to_run) {
    $url = "http://store.steampowered.com/app/".$i;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $content = curl_exec($ch);
    // Match an <img> tag whose class attribute contains game_header_image_full
    $reg = "/<\\s*img\\s+[^>]*class=['\"][^'\"]*game_header_image_full[^'\"]*['\"]/i";
    if (preg_match($reg, $content)) {
        // Add (don't overwrite) to file steam.txt
        file_put_contents('steam.txt', $url.PHP_EOL, FILE_APPEND);
    }
}
?>
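For Question 2, one variant worth comparing is sketched below. It reuses a single cURL handle for all requests instead of calling curl_init() on every iteration, and it sets CURLOPT_FOLLOWLOCATION, which cURL leaves off by default while PHP's HTTP stream wrapper (used by file_get_html) follows redirects by default. The function names hasHeaderImage and scanIds are made up for illustration. It is an assumption, not something verified against the Steam Store, that either change affects the speed or the 8 vs. 9 result.

```php
<?php
// Pattern copied from the cURL version above.
const HEADER_IMAGE_REGEX =
    "/<\\s*img\\s+[^>]*class=['\"][^'\"]*game_header_image_full[^'\"]*['\"]/i";

// Hypothetical helper: true if the page content contains the header image tag.
function hasHeaderImage(string $content): bool
{
    return preg_match(HEADER_IMAGE_REGEX, $content) === 1;
}

// Hypothetical helper: scan ids $first..$last and append matching URLs
// to $outFile. Returns the number of URLs written.
function scanIds(int $first, int $last, string $outFile): int
{
    // One handle for the whole run, so connections can be reused.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow redirects (off by default in cURL).
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Empty string: accept every compressed encoding cURL supports.
    curl_setopt($ch, CURLOPT_ENCODING, '');

    $found = 0;
    for ($i = $first; $i <= $last; $i++) {
        $url = 'http://store.steampowered.com/app/' . $i;
        curl_setopt($ch, CURLOPT_URL, $url);
        $content = curl_exec($ch);
        if ($content === false) {
            // Request failed; skip this id instead of matching against false.
            continue;
        }
        if (hasHeaderImage($content)) {
            file_put_contents($outFile, $url . PHP_EOL, FILE_APPEND);
            $found++;
        }
    }
    return $found;
}
```

Calling scanIds(1, 100, 'steam.txt') would cover the same ids as the loops above. The requests are still made one after another; running them in parallel would need something like the curl_multi_* functions, which this sketch does not attempt.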