I'm trying to get good at PHP web scraping. In my tests I've nailed scraping the text from one site and echoing it on another, but I can't work out how to also carry over the original links from the source markup, which is what I'd ideally like to do. Any thoughts on how to accomplish this with what I've got thus far? (I'm very new to PHP, by the way.)
This is the PHP code:
// news
$doc = new DOMDocument;
// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;
// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.usatoday.com/');
$xpath = new DOMXPath($doc);
$query = "//ul[@class='hfwmm-list hfwmm-4uphp-list hfwmm-light-list']";
$entries = $xpath->query($query);
foreach ($entries as $entry) {
echo trim($entry->textContent); // use `trim` to eliminate spaces
}
That code is spitting out this: NBA Cavs win record-breaking Game 4 behind Irving's 40 Entertain This Watch: 'Black Panther' trailer unleashes a fearsome king News Police: London Bridge terrorists planned more bloodshed How Trump is highlighting divisions amo..........
What I'd really like is for those headlines to be working links, as they are on the original page. This is what the source markup for this section looks like:
<div class="partner-heroflip-ad partner-placement ui-flip-panel size-xxs"><a href="#" class="partner-close"></a></div></div>
<p class="hfwmm-tertiary-list-title hfwmm-light-tertiary-list-title">TOP STORIES</p>
<ul class="hfwmm-list hfwmm-4uphp-list hfwmm-light-list" data-track-prefix="flex4uphphero">
<li class="hfwmm-item hfwmm-secondary-item hfwmm-item-2 sports-theme-bg hfwmm-first-secondary-item hfwmm-4uphp-secondary-item"
data-asset-position="1"
data-asset-id="102694848"
><a class="js-asset-link hfwmm-list-link hfwmm-light-list-link hfwmm-image-link hfwmm-secondary-link"
href="/story/sports/nba/2017/06/10/kyrie-irving-lebron-james-cavs-win-game-4/102694848/"
data-track-display-type="thumb"
data-ht="flex4uphpherostack1"
data-asset-id="102694848"
><span class="hfwmm-image-gradient hfwmm-secondary-image-gradient"></span>
<span class="js-asset-section theme-bg-ssts-label hfwmm-ssts-label-top-left
hfwmm-ssts-label-secondary sports-theme-bg">NBA</span><img
src="https://www.gannett-cdn.com/-
mm-/cd17823b265aa373c83094fc75525710f645ec90/c=0-178-4072-
81338209183-USP-NBA-FINALS-GOLDEN-STATE-WARRIORS-AT-CLEVELAND-91573076.JPG"
class="hfwmm-image hfwmm-secondary-image js-asset-image placeholder-hide"
alt="Kyrie Irving reacts after making a basket against the"
data-id="102695338"
data-crop="16_9"
width="239"
height="135" /><span class="hfwmm-secondary-hed-wrap hfwmm-secondary-text-hed-wrap"><span class="hfwmm-text-hed-icon js-asset-disposable"></span><span
title="Cavs win record-breaking Game 4 behind Irving's 40"
class="js-asset-headline hfwmm-list-hed hfwmm-secondary-hed placeholder-hide">
Cavs win record-breaking Game 4 behind Irving's 40
hfwmm-item-3 life-theme-bg hfwmm-4uphp-secondary-item"
data-asset-position="2"
To be clear, the href in the snippet above is href="/story/sports/nba/2017/06/10/kyrie-irving-lebron-james-cavs-win-game-4/102694848/"
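For illustration, here is a rough sketch of one direction this might take, not a confirmed solution. It assumes the anchors sit inside the matched ul, that the hrefs are root-relative (as in the snippet above, with the wrapped "game-4" joined back together), and that prefixing the site host is enough to make them absolute. The names extractLinks, $baseUrl and $sample are made up for this example, and it parses an inline sample string rather than fetching the live page:

```php
<?php
// Sketch only: extractLinks() and $baseUrl are hypothetical names for this example.
function extractLinks(string $html, string $query, string $baseUrl): array
{
    $doc = new DOMDocument;
    // Collect libxml warnings about invalid markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($query) as $list) {
        // Relative query: every <a> that has an href inside the matched <ul>.
        foreach ($xpath->query('.//a[@href]', $list) as $a) {
            $href = $a->getAttribute('href');
            // Assumption: root-relative hrefs only need the host prepended.
            if (strpos($href, '/') === 0) {
                $href = rtrim($baseUrl, '/') . $href;
            }
            // Collapse runs of whitespace in the link text.
            $text = trim(preg_replace('/\s+/', ' ', $a->textContent));
            $links[] = ['href' => $href, 'text' => $text];
        }
    }
    return $links;
}

// Inline sample modeled on the markup above, heavily trimmed.
$sample = '<ul class="hfwmm-list hfwmm-4uphp-list hfwmm-light-list">'
    . '<li><a href="/story/sports/nba/2017/06/10/kyrie-irving-lebron-james-cavs-win-game-4/102694848/">'
    . '<span>Cavs win record-breaking Game 4 behind Irving\'s 40</span></a></li></ul>';

$query = "//ul[@class='hfwmm-list hfwmm-4uphp-list hfwmm-light-list']";
foreach (extractLinks($sample, $query, 'https://www.usatoday.com') as $link) {
    // Escape both values before echoing them into the new page.
    echo '<a href="' . htmlspecialchars($link['href']) . '">'
        . htmlspecialchars($link['text']) . "</a>\n";
}
```

Against the real page, the $sample string would presumably be swapped for the fetched HTML (or loadHTML for loadHTMLFile, as in the code above), keeping the same XPath query.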
Any thoughts on how this might be accomplished in this test scenario would be hugely helpful. Thank you very much. -wilson