You assume that the page contains <class="order". But it doesn't; what it does contain is:
<div class="zone-dedicated-availability button"
data-actions="orderButton"
data-ref="142sys5"
data-cgi="order"></div>
You probably need a more powerful tool than strpos (and no, not regexps).
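To see why the literal search comes up empty, here is a minimal sketch run against the markup quoted above. The $html variable is only an illustration holding that one tag, not the whole page.

```php
<?php
// The tag as it appears in the page (copied from above).
$html = '<div class="zone-dedicated-availability button" '
      . 'data-actions="orderButton" data-ref="142sys5" data-cgi="order"></div>';

// The literal class="order" never occurs, so strpos() returns false.
var_dump(strpos($html, 'class="order"'));               // bool(false)

// "order" does occur, but as the value of a different attribute.
var_dump(strpos($html, 'data-cgi="order"') !== false);  // bool(true)
```

Note the strict comparison against false: strpos() can legitimately return 0 when the match is at the very start of the string.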
If you really are sure the structure of the page/CSS is not going to change too much, you can try to extract all "<div>" tags (recognizable with an easy and reasonable regexp: "<div[^>]+>"), and then check each of them until you find one that contains "orderButton" or something like that. preg_match_all() and array_filter() are probably your friends.
Another very promising possibility is to use an XML library: the URL extension seems to indicate that it's possible to access a reasonably structured and well-formed entity tree behind that page. If so, XPath is your friend.
Update
The XML you indicated is not very well formed: it uses the tags header, footer, and nav, which are HTML5 rather than XHTML, and it has the Italian flag erroneously declared as Flagz/fi instead of Flagz/it, colliding with the Finland flag. This suggests the file was never validated and cannot be trusted to parse reliably, so
simplexml_load_file($address)
    ->xpath('//div[contains(@class, "button")][@data-actions="orderButton"]');
or something like that (e.g. DOMDocument/DOMXPath), while being the correct approach, is nonetheless not going to work off the shelf. A more permissive XML library is needed; you can try SimpleDOM.
The DOM approach is usually much better because it is far more flexible and does not need awkward 'fixes' to cope with things such as attributes changing their order. Also, several tools work with the DOM: for example, with Firefox's Firebug extension you can simply grab the XPath off the object. If they change their page layout, then instead of guessing how to extract the data you need, you can just open up the page, copy and paste the new XPath, and Bob's your uncle.
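As a rough sketch of the DOM route, one option that tends to tolerate sloppy markup is DOMDocument::loadHTML() with libxml's warnings collected internally. This is not SimpleDOM, just PHP's bundled DOM extension. The helper name findOrderRef and the $sample string are made up for illustration, and the XPath assumes the button keeps its data-actions="orderButton" attribute.

```php
<?php
// Hypothetical helper: returns the data-ref of the single order button,
// or null when there is no such button or more than one.
function findOrderRef(string $html): ?string
{
    // Collect parse warnings internally instead of emitting them, so that
    // tags the parser does not know (header, nav, ...) stay silent.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@data-actions="orderButton"]');

    if ($nodes === false || $nodes->length !== 1) {
        return null;
    }
    return $nodes->item(0)->getAttribute('data-ref');
}

// Illustrative input only; in practice this would be the fetched page.
$sample = '<html><body><nav>menu</nav>'
        . '<div class="zone-dedicated-availability button" '
        . 'data-actions="orderButton" data-ref="142sys5" data-cgi="order"></div>'
        . '</body></html>';

var_dump(findOrderRef($sample));  // string(7) "142sys5"
```

Because the query selects on the attribute rather than on the text of the tag, the order of the attributes inside the div no longer matters.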
Otherwise, the brute force solution described above:
$xml = file_get_contents($url);
// Extract all DIVs with a `class` attribute (maybe `data-actions` would be better?)
preg_match_all('#<div[^>]+class[^>]+>#', $xml, $gregs);
// Accept only those with the appropriate data action
$btns = array_values(
    array_filter(
        $gregs[0],
        function($div) {
            return preg_match('#data-actions="orderButton"#', $div);
        }
    )
);
print_r($btns);
will return (unless $btns is empty, of course):
Array
(
[0] => <div class="zone-dedicated-availability button" data-actions="orderButton" data-ref="142sys5" data-cgi="order">
)
You can then parse it (again with XML, once you append '</div>') to access attributes such as data-ref:
if (count($btns) != 1) {
    die("No button, or too many buttons");
}
$xml = simplexml_load_string($btns[0] . '</div>');
$attrs = array();
foreach ($xml->attributes() as $key => $value) {
    $attrs[$key] = (string)$value;
}
$ref = $attrs['data-ref'];
print $ref;
This will assign the value '142sys5' to $ref. You can var_dump the $attrs array to see the other attributes, if needed.
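If the extracted tag ever fails to load as XML (an unescaped & in an attribute would be enough), a cruder fallback is to pull the attributes out with a regexp. The sketch below is only that: tagAttributes is a hypothetical helper, it assumes every attribute is written as name="value" with double quotes, and it does not decode HTML entities.

```php
<?php
// Hypothetical helper: maps attribute names to values for one opening tag.
// Assumes double-quoted name="value" pairs; entities are left undecoded.
function tagAttributes(string $tag): array
{
    preg_match_all('#([\w-]+)="([^"]*)"#', $tag, $m);
    // $m[1] holds the names, $m[2] the matching values.
    return $m[1] ? array_combine($m[1], $m[2]) : array();
}

$attrs = tagAttributes(
    '<div class="zone-dedicated-availability button" '
    . 'data-actions="orderButton" data-ref="142sys5" data-cgi="order">'
);
print $attrs['data-ref'];  // 142sys5
```

Like the brute force extraction itself, this holds only as long as the markup keeps its current quoting style.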