dongzhouhao4316 2016-08-08 02:11
Views: 39
Accepted

PHP: extracting a string from HTML with embedded JavaScript

I am trying to extract this value (MARK PATER) from a web page, and I want it as a plain string, not a hyperlink. My code is at the end of this post.

When I echo the result, this is what I get in the browser: MARK PATERÂ Â . I cannot get this value as a plain string; it comes back as a hyperlink. When I view the page source, I see this:

<a class="filter_list" href="" onclick="return fillFilterForm(document.formFilter1, 'nation_party_name', 'MARK PATGHL');"><font face="Verdana" size="1" color="BLACK">MARK PATERÂ Â </font></a>string(0) ""

Here is part of the source code from echo $html:

<tr >

<td align="justify" width="5%" nowrap><font face="Verdana" size="1">&nbsp;&nbsp;&nbsp;

*

<a class="list_2" href="details.asp?doc_id=2&index=0&file_num=07">View</a>&nbsp;&nbsp;</font>

</td>

<td width="20%" align="justify" ><a class="filter_list" href="" onClick="return fillFilterForm(document.formFilter1, 'party_name', 'NEW YORK GORDI');"><font face="Verdana" size="1" color="BLACK">NEW YORK GORDI&nbsp;&nbsp;</font></td>

<td width="15%" align="justify" nowrap><a class="filter_list" href="" onClick="return fillFilterForm(document.formFilter1, 'Name', 'MARK PATER');"><font face="Verdana" size="1" color="BLACK">MARK PATER&nbsp;&nbsp;</font></td>

Code:

$html = file_get_html($link);
//echo htmlspecialchars ($html);
// a new dom object
$dom = new domDocument;  
// load the html into the object
$dom->loadHTML($html); 
$tables = $dom->getElementsByTagName('td');
echo get_inner_html($tables->item(26));


function get_inner_html($node)
{
    $innerHTML = '';
    $children = $node->childNodes;

    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }

    return $innerHTML;
}


1 answer

  • douxie4583 2016-08-08 03:12

    Try a regular expression

    You can build a regular expression that extracts the strings directly from the HTML.

    Walking the HTML with SimpleXML or DOM can sometimes be a real headache.

    A sample for your case:

    $html = "<tr >
    
    <td align=\"justify\" width=\"5%\" nowrap><font face=\"Verdana\" size=\"1\">&nbsp;&nbsp;&nbsp;
    
    *
    
    <a class=\"list_2\" href=\"details.asp?doc_id=2&index=0&file_num=07\">View</a>&nbsp;&nbsp;</font>
    
    </td>
    
    <td width=\"20%\" align=\"justify\" ><a class=\"filter_list\" href=\"\" onClick=\"return fillFilterForm(document.formFilter1, 'party_name', 'NEW YORK GORDI');\"><font face=\"Verdana\" size=\"1\" color=\"BLACK\">NEW YORK GORDI&nbsp;&nbsp;</font></td>
    
    <td width=\"15%\" align=\"justify\" nowrap><a class=\"filter_list\" href=\"\" onClick=\"return fillFilterForm(document.formFilter1, 'Name', 'MARK PATER');\"><font face=\"Verdana\" size=\"1\" color=\"BLACK\">MARK PATER&nbsp;&nbsp;</font></td>";
    
    preg_match_all('/(?:<td.+><a.+><font.+>)([\w\s]+)(?:(&nbsp;)+<\/font><\/td>)/', $html, $filtered);
    
    print_r( $filtered[1] );
    
    //Output: Array ( [0] => NEW YORK GORDI [1] => MARK PATER )
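
For comparison, here is a minimal DOM-based sketch for getting the cell value as a plain string. It assumes the input is a shortened, hypothetical version of the markup in the question, and that reading the node's textContent (instead of serializing child nodes with saveXML) is acceptable for your case. The "Â" characters in the question are most likely the UTF-8 bytes of the decoded &nbsp; being displayed in a single-byte encoding, so the sketch replaces those bytes before trimming.

```php
<?php
// Hypothetical sample markup, shortened from the source shown in the question.
$html = '<table><tr>'
      . '<td><a class="filter_list" href="" onclick="return false;">'
      . '<font face="Verdana" size="1" color="BLACK">MARK PATER&nbsp;&nbsp;</font></a></td>'
      . '</tr></table>';

$dom = new DOMDocument();

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// In this sample the wanted cell is the first <td>; adjust the index for the real page.
$cell = $dom->getElementsByTagName('td')->item(0);

// textContent is the concatenated text of all descendant nodes, without any tags.
$text = $cell->textContent;

// &nbsp; is decoded to U+00A0, which is the byte pair 0xC2 0xA0 in UTF-8.
// Replace it with a regular space, then trim.
$name = trim(str_replace("\xC2\xA0", ' ', $text));

var_dump($name); // string(10) "MARK PATER"
```

Unlike the regular expression above, this does not depend on the exact order of the td, a and font tags, but it does depend on knowing which cell index to read.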
    
    This answer was selected by the asker as the best answer.
