duaeim2874 2009-12-11 16:52

I am trying to use MediaWiki's API to get articles in XML format and include them on my page. I wrote a simple script that fetches the XML representation of an article with ?action=parse&page=Page_Name&format=xml requests. The code is as follows:

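// Bail out early when no page name was passed in the query string.
if (!isset($_GET["page"]) || $_GET["page"] === '') die("Page not specified (possibly direct call)");
$pagename = $_GET["page"];
// urlencode() keeps spaces and special characters in the title from breaking the request.
$handle = @fopen("mediawiki/api.php?action=parse&page=".urlencode($pagename)."&format=xml", "r");
if ($handle) {
    $buffer = ''; // initialise before appending to it
    while (!feof($handle)) {
        $buffer .= fgets($handle);
    }
    fclose($handle);
    // Suspected culprit: turns the escaped HTML inside <text> back into real markup.
    $buffer = html_entity_decode($buffer);
    /*
    echo $buffer;
    */
    $xml = simplexml_load_string($buffer);
    foreach ($xml->parse->children() as $child) {
        switch ($child->getName()) {
            case "text":
                echo $child->asXML()."<br/>";
                break;
            case "categories":
                echo "<h3>Categories this project is related to: </h3><br/>";
                foreach ($child->children() as $grandChild) {
                    echo $grandChild." | ";
                }
                break;
        }
    }
}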

The problem is that I'm getting very strange output. Every <a name="" href=""></a> is converted to <a name="" href=""/>, which turns all of the following text into a link (I guess because there is no closing </a> tag). This happens in both Mozilla Firefox and Google Chrome. I suspect that $buffer = html_entity_decode($buffer); causes the problem. Is there a parameter for html_entity_decode() that I should specify to avoid this? Or is it caused by some other error, or by a misuse of html_entity_decode() in my code?

(To see the XML output of the Wiki's API, you can try http://en.wikipedia.org/w/api.php?action=parse&page=No_Such_Page&format=xml with different page parameters)

POSSIBLE SOLUTION: I didn't want to switch to JSON, as Jordan suggested, so I came up with this solution. I simply moved html_entity_decode into the case "text": block, so it now reads echo html_entity_decode($child->asXML())."<br/>";. Is this a reasonable approach?

2 answers

  • dtxzwdl08169 2009-12-11 17:02

    The problem isn't with html_entity_decode(). The problem is that SimpleXML is treating the contents of the <text> element as XML instead of text. By default, SimpleXML compresses empty elements (<a></a> to <a />). One way to get around this is to import the SimpleXML object into a DOM object, and use the LIBXML_NOEMPTYTAG option when saving the output. The problem with this option is that any <br /> elements will be output as <br></br>.
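
    As a rough sketch of that DOM approach: the $xmlString below is a made-up stand-in for an already-decoded API response, not real Wikipedia output, and the variable names are only for illustration. It shows the empty anchor being collapsed by asXML(), and the same node saved through DOM with LIBXML_NOEMPTYTAG, including the <br></br> side effect.

```php
<?php
// Hypothetical stand-in for an API response after html_entity_decode():
// the markup inside <text> is now real XML, including an empty anchor.
$xmlString = '<api><parse><text><a name="top"></a><p>Hello<br/>world</p></text></parse></api>';

$xml = simplexml_load_string($xmlString);

// SimpleXML serializes the empty anchor as a self-closing tag.
$collapsed = $xml->parse->text->asXML();

// Import the same node into DOM and save it with LIBXML_NOEMPTYTAG so that
// empty elements keep an explicit closing tag.
$node = dom_import_simplexml($xml->parse->text);
$expanded = $node->ownerDocument->saveXML($node, LIBXML_NOEMPTYTAG);

echo $collapsed, "\n";
echo $expanded, "\n";
```

    Note that the flag applies to every empty element, which is why the <br/> in the sample comes out as <br></br> too.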

    The simpler alternative is to use a different response format from the API. I would suggest requesting the JSON response format (format=json) and parsing the response with json_decode().
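
    A minimal sketch of the JSON route, under some assumptions: the $json string is a hypothetical sample shaped like the legacy action=parse output (article HTML and category names under a "*" key), so check it against what your wiki actually returns. In real use you would fetch the string from api.php with format=json instead of hard-coding it.

```php
<?php
// Hypothetical sample shaped like a legacy action=parse JSON response;
// a real one would come from api.php?action=parse&page=...&format=json.
$json = '{"parse":{"title":"Example","text":{"*":"<a name=\"top\"></a><p>Hello</p>"},'
      . '"categories":[{"sortkey":"","*":"Physics"},{"sortkey":"","*":"Projects"}]}}';

$data = json_decode($json, true); // true: decode objects as associative arrays
if ($data === null || !isset($data['parse'])) {
    die("Unexpected API response");
}

// The article HTML arrives as a plain string, so empty tags are left untouched.
$html = $data['parse']['text']['*'];

// Collect the category names for display.
$categories = array();
foreach ($data['parse']['categories'] as $category) {
    $categories[] = $category['*'];
}

echo $html, "<br/>";
echo "<h3>Categories this project is related to: </h3>";
echo implode(" | ", $categories);
```

    Because the HTML never passes through an XML serializer here, there is no need for html_entity_decode() or for any empty-tag workaround.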

    This answer was accepted by the asker as the best answer.