douni1396 2014-01-31 19:39

Simple HTML DOM returns NULL

I'm scraping data from a website using the Simple HTML DOM parser (http://simplehtmldom.sourceforge.net/).

The HTML is:

<tr class="productListing-odd">
    <td align="right" class="productListing-data">&nbsp;0&nbsp;</td>
    <td class="productListing-data">&nbsp;<a href="http://www.spellvault.net/p46563/Liliana-of-the-Veil/product_info.html" onmouseout="hd()" onmouseover="sd('images/101257121.jpg')">Liliana of the Veil</a>&nbsp;<br>    </td>
    <td align="center" class="productListing-data">&nbsp;Black&nbsp;</td>
    <td align="center" class="productListing-data">&nbsp;Mythic&nbsp;</td>
    <td align="center" class="productListing-data">&nbsp;Innistrad&nbsp;</td>
    <td align="right" class="productListing-data">€42,50&nbsp;</td>
    <td align="center" class="productListing-data"><input type="text" name="var[46563]" value="" size="4">&nbsp;    <span class="nowrap"><span class="template-button-left">&nbsp;</span><span class="template-button-middle"><input class="submitButton" type="submit" value="Bestel"></span><span class="template-button-right">&nbsp;</span></span>&nbsp;</td>
  </tr>

And the PHP:

include_once('simple_html_dom.php');

$html = file_get_html('-the url of the search query on the website-');

$array = array();
foreach($html->find('.productListing-odd, .productListing-even') as $element) {
    // childNodes(n) is the (n+1)-th <td> cell of the row
    $row = array(
        'name' => strip_tags($element->childNodes(1)->innertext),   // 2nd column: card name
        'set' => strip_tags($element->childNodes(4)->innertext),    // 5th column: set
        'price' => strip_tags($element->childNodes(5)->innertext),  // 6th column: price
        'stock' => strip_tags($element->childNodes(0)->innertext)   // 1st column: stock
    );
    array_push($array, $row);
}
echo json_encode($array);

For some reason, the value of 'price' always comes out as NULL, while all the other values are collected properly. I can't figure out why this is happening, since all the cells seem to have the same structure.

Thanks in advance!

1 answer

  • dsxjot8620 2014-01-31 20:29

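    Most likely the HTML you parsed is not encoded in UTF-8 but in a single-byte charset (for example Windows-1252). That is a problem because json_encode() only works with UTF-8 strings. Almost all the data you parsed consists of ASCII characters, which are encoded identically in UTF-8, so it causes no trouble. The price data (the 6th column), however, contains the non-ASCII character '€', which is not valid UTF-8 in that charset, so json_encode() fails on it and returns null for that value.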
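    A minimal sketch of the problem and one possible fix, assuming the page is encoded in Windows-1252, where '€' is the single byte 0x80. The sample row and the helpers is_valid_utf8() and to_utf8() are hypothetical, not part of Simple HTML DOM. On PHP 5.5 and later, json_encode() returns false for the whole array rather than null for one field; passing JSON_PARTIAL_OUTPUT_ON_ERROR reproduces the per-field null described in the question.

```php
<?php
// Assumption: the page is Windows-1252 encoded. Check its <meta charset>
// or Content-Type header; if it is already UTF-8, do not convert.

// Hypothetical scraped row. In Windows-1252 the euro sign is the single
// byte 0x80, which is not valid UTF-8 on its own.
$row = array(
    'name'  => 'Liliana of the Veil',
    'price' => "\x8042,50",
);

// True if $s is valid UTF-8: with the /u modifier, preg_match() returns
// false instead of 0 or 1 when the subject is not valid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// PHP >= 5.5: the whole call fails and returns false.
$failed = json_encode($row);
$reason = json_last_error(); // JSON_ERROR_UTF8

// With partial output allowed, the invalid string is replaced by null,
// which matches the symptom described in the question.
$partial = json_encode($row, JSON_PARTIAL_OUTPUT_ON_ERROR);

// Hypothetical helper: convert a Windows-1252 string to UTF-8.
// mb_convert_encoding() needs the mbstring extension; iconv() is the fallback.
function to_utf8($s)
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
    }
    return iconv('Windows-1252', 'UTF-8', $s);
}

// Convert every field before encoding; ASCII text comes through unchanged.
$fixed = json_encode(array_map('to_utf8', $row));

var_dump(is_valid_utf8($row['price'])); // bool(false)
var_dump($failed);                      // bool(false)
echo $partial, "\n";                    // {"name":"Liliana of the Veil","price":null}
echo $fixed, "\n";                      // {"name":"Liliana of the Veil","price":"\u20ac42,50"}
```

    In the question's loop, each value would be wrapped the same way, e.g. 'price' => to_utf8(strip_tags($element->childNodes(5)->innertext)). The conversion is only correct if the page really is Windows-1252; applying it to text that is already UTF-8 would garble any non-ASCII characters.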

    This answer was accepted by the asker.