I am working with cURL, and once the request has executed I end up with a variable $data
which contains the whole page's HTML content.
Here is a small part of the HTML held in that variable to demonstrate.
<table class="TabTopGroup" width="100%" height="100%" cellspacing="0" cellpadding="0" border="0">
<tr>
<td align="Left" class="HtmlGridCell" colspan="5"> </td>
<td align="Left" class="HtmlGridCell" colspan="2"><span class="progress" title="0%"><span class="indicator" style="width: 0%"> </span></span></td>
</tr>
<tr valign="top">
<td align="Left" class="HtmlGridCell no-bottom-border"><span class="priority-1"> </span></td>
<td align="Left" class="HtmlGridCell no-bottom-border"><a href="jobview.aspx?id=12514845" class="link">J000005</a></td>
<td align="Left" class="HtmlGridCell no-bottom-border">Student</td>
<td align="Left" class="HtmlGridCell no-bottom-border">test job</td>
<td align="Left" class="HtmlGridCell no-bottom-border"><span id='jobstate_12514845'>Planned</span><span class='inline-dropdown' onclick='return jqe.c();' onmouseover='jqe.s(12514845, this, event);'> </span></td>
<td align="Left" class="HtmlGridCell no-bottom-border">02-Jun</td>
<td align="Left" class="HtmlGridCell no-bottom-border">02-Jun</td>
</tr>
<tr>
<td align="Left" class="HtmlGridCell" colspan="5"> </td>
<td align="Left" class="HtmlGridCell" colspan="2"><span class="progress" title="0%"><span class="indicator" style="width: 0%"> </span></span></td>
</tr>
<tr valign="top">
<td align="Left" class="HtmlGridCell no-bottom-border"><span class="priority-1"> </span></td>
<td align="Left" class="HtmlGridCell no-bottom-border"><a href="jobview.aspx?id=12514850" class="link">J000006</a></td>
<td align="Left" class="HtmlGridCell no-bottom-border">Student</td>
<td align="Left" class="HtmlGridCell no-bottom-border">test job</td>
<td align="Left" class="HtmlGridCell no-bottom-border"><span id='jobstate_12514850'>Planned</span><span class='inline-dropdown' onclick='return jqe.c();' onmouseover='jqe.s(12514850, this, event);'> </span></td>
<td align="Left" class="HtmlGridCell no-bottom-border">02-Jun</td>
<td align="Left" class="HtmlGridCell no-bottom-border">02-Jun</td>
</tr>
</table>
Now, on the other side of things, I have an array which contains the following type of data:
$jobs =
array(
    array(
        'jID' => "J000005",
        'Name' => "Something"
    ),
    array(
        'jID' => "J000006",
        'Name' => "Something"
    ),
    array(
        'jID' => "J16453",
        'Name' => "Something"
    )
);
What I am trying to do is search for occurrences of each jID within the HTML string. If a jID is found, I need to obtain the id parameter from the href of its parent anchor and add both values to an array. So if I cross-check the above array against the HTML, I should end up with something like this (J16453 is left out because it does not appear in the HTML):
$outcome =
array(
    array(
        'jID' => "J000005",
        'aID' => "12514845"
    ),
    array(
        'jID' => "J000006",
        'aID' => "12514850"
    )
);
The example I have shown above is a small dataset. The real HTML string has a lot more data, and my initial array will contain about 50 jIDs.
Really, I am after advice on the best way to handle this. I was initially thinking of using DOMDocument, but I don't think that is the best way. Another option would be to use preg_match_all somehow, but I am not sure how efficient that would be.
Another problem I face is that the HTML might contain more than one occurrence of a jID. I am not bothered how many occurrences of J000005 there are, for instance; all I want is its associated id, which is contained as a parameter within the href of its parent anchor.
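To make the preg_match_all idea concrete, here is a minimal sketch of what I have in mind. It assumes every job link looks like the ones in the sample (an href of the form jobview.aspx?id=NNN with the jID as the link text); the function name findJobAnchors is just something I made up for illustration. It scans the HTML once, keeps the first id seen for each link text, and then looks up my jIDs in that map.

```php
<?php
// Hypothetical helper: map each known jID to the id parameter of its anchor.
// Assumes anchors of the form <a href="jobview.aspx?id=123" ...>J000005</a>.
function findJobAnchors(string $data, array $jobs): array
{
    // Capture the numeric id parameter and the link text of every job anchor.
    $pattern = '~<a\s[^>]*href="jobview\.aspx\?id=(\d+)"[^>]*>\s*([^<]+?)\s*</a>~i';
    preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

    // Index by link text. Array union keeps existing keys, so the first
    // occurrence of a jID wins and later duplicates are ignored.
    $idsByJob = [];
    foreach ($matches as $m) {
        $idsByJob += [$m[2] => $m[1]];
    }

    // Keep only the jobs that actually appear in the HTML.
    $outcome = [];
    foreach ($jobs as $job) {
        $jID = $job['jID'];
        if (isset($idsByJob[$jID])) {
            $outcome[] = ['jID' => $jID, 'aID' => $idsByJob[$jID]];
        }
    }
    return $outcome;
}
```

Called as findJobAnchors($data, $jobs), this should skip any jID that is not in the page and return one entry per jID that is, no matter how often it occurs. The obvious weakness is that the pattern is tied to this exact markup, so any change to the attribute order or quoting of the href would break it.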
Any advice on how this can be achieved would be appreciated. I would be interested to understand what the most efficient way is, because I have read that preg_match_all is faster than doing it via DOMDocument.
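For comparison, this is roughly what the DOMDocument route would look like, again only as a sketch under the same assumptions about the markup. The function name findJobAnchorsDom is hypothetical. It uses DOMXPath to select the job anchors, matches their text against my jIDs, and reads the id from the href query string with parse_url and parse_str. Note that the results come back in document order rather than in the order of $jobs, and that $data is assumed to be non-empty.

```php
<?php
// Hypothetical helper: the same lookup done with DOMDocument and DOMXPath.
// Assumes $data is a non-empty HTML string containing jobview.aspx anchors.
function findJobAnchorsDom(string $data, array $jobs): array
{
    $doc = new DOMDocument();

    // Scraped HTML is rarely valid, so collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Flip the jID list so membership checks are simple isset() lookups.
    $wanted = array_flip(array_column($jobs, 'jID'));

    $xpath = new DOMXPath($doc);
    $outcome = [];
    foreach ($xpath->query('//a[contains(@href, "jobview.aspx?id=")]') as $a) {
        $jID = trim($a->textContent);
        if (!isset($wanted[$jID])) {
            continue; // not one of my jobs, or already handled
        }

        // Pull the id parameter out of the href query string.
        $queryString = (string) parse_url($a->getAttribute('href'), PHP_URL_QUERY);
        parse_str($queryString, $query);
        if (isset($query['id'])) {
            $outcome[] = ['jID' => $jID, 'aID' => $query['id']];
            unset($wanted[$jID]); // ignore further occurrences of this jID
        }
    }
    return $outcome;
}
```

This version should tolerate changes in attribute order or quoting better than the regex, at the cost of parsing the whole document, which is the part I am unsure about performance-wise.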