dsfjk44656 2016-06-10 11:09
Views: 48
Accepted

Loading JSON data from an HTML string

Using cURL, I am navigating to a webpage. With the response from the cURL script, I essentially do the following:

$dom = new DOMDocument();
$dom->loadHTML($response);

If I output $dom, I can see all of the HTML code for that page, as expected. Within that code, there is one specific section that looks like the following:

<script id="data" type="application/json">
<![CDATA[
{
    sortColumn: "QuoteNumber",
    quotes: {
        "Data":
        [
            {
                "ID":3235720,
                "Date":"20 May 2016",
                "QuoteNumber":"Q12415",
                "Name":"Some Name",
                "Client":"Some Client",
                "StateName":"Issued",
                "Url":"/Quote/View/3235720"
            }
        ]
    }
}
]]>
</script>

Is there any way I can target just this specific block of code? I essentially need to load the JSON and obtain the ID for the Quote. Would this be possible?

1 answer

  • dtotuki47568 2016-06-10 14:43
    1. Get the <script> tag with getElementById("data").
    2. Find its CDATA child node by comparing each child's nodeType with the constant XML_CDATA_SECTION_NODE.
    3. Use str_replace() to remove the literal <![CDATA[ and ]]> markers from the text.
    4. Use json_decode() to parse the remaining JSON text into a PHP object.

    By the way, the content inside your CDATA is not valid JSON: the keys sortColumn and quotes are not wrapped in double quotes. It should be corrected as shown below:

    <![CDATA[
    {
        "sortColumn" : "QuoteNumber",
        "quotes": {
            "Data":
            [
                {
                    "ID":3235720,
                    "Date":"20 May 2016",
                    "QuoteNumber":"Q12415",
                    "Name":"Some Name",
                    "Client":"Some Client",
                    "StateName":"Issued",
                    "Url":"/Quote/View/3235720"
                }
            ]
        }
    }
    ]]>
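
To make the difference concrete, here is a minimal sketch that feeds both forms to json_decode(). The two strings are shortened versions of the blocks above, and the variable names are only illustrative.

```php
<?php
// Bare keys (sortColumn, quotes) are legal in a JavaScript object literal but not in JSON.
$malformed = '{ sortColumn: "QuoteNumber", quotes: { "Data": [ { "ID": 3235720 } ] } }';
$corrected = '{ "sortColumn": "QuoteNumber", "quotes": { "Data": [ { "ID": 3235720 } ] } }';

$bad = json_decode($malformed);
$badError = json_last_error();   // read it right away; the next json_decode() call resets it

$good = json_decode($corrected);
$goodError = json_last_error();

var_dump($bad === null);                     // bool(true): decoding failed
var_dump($badError === JSON_ERROR_SYNTAX);   // bool(true)
var_dump($goodError === JSON_ERROR_NONE);    // bool(true)
echo $good->quotes->Data[0]->ID, "\n";       // 3235720
```

Because json_decode() returns null on failure, checking json_last_error() immediately afterwards is what tells a failed decode apart from a valid JSON null.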
    

    I have also added a has_json_error() function at the bottom so that you can see any error messages.

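    $dom = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely strictly valid.
    libxml_use_internal_errors(true);
    $dom->loadHTML($response);
    libxml_clear_errors();

    // Locate <script id="data">; getElementById() returns null if it is missing.
    $data = $dom->getElementById("data");
    $content = '';
    if ($data !== null) {
        foreach ($data->childNodes as $child) {
            // The script body is expected to be exposed as a CDATA section node.
            if ($child->nodeType == XML_CDATA_SECTION_NODE) {
                $content = $child->textContent;
            }
        }
    }
    // Strip the literal CDATA markers so that only the JSON text remains.
    $content = str_replace(array("<![CDATA[", "]]>"), '', $content);
    $jsons = json_decode($content);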
    
    if(!has_json_error()) {
        echo $jsons->sortColumn;
        echo "<br /><br />";
        print_r($jsons->quotes);
        echo "<br /><br />";
        $data = $jsons->quotes->Data;
        foreach($data as $obj) {
            echo $obj->ID . "<br />";
            echo $obj->Date . "<br />";
            echo $obj->QuoteNumber . "<br />";
            echo $obj->Name . "<br />";
            echo $obj->Client . "<br />";
            echo $obj->StateName . "<br />";
            echo $obj->Url . "<br />";
        }
    }
    
    function has_json_error() {
        // json_last_error() reports the status of the most recent json_decode() call.
        if (function_exists('json_last_error') && json_last_error() !== JSON_ERROR_NONE) {
            switch (json_last_error()) {
                case JSON_ERROR_DEPTH:
                    echo 'JSON_ERROR: - Maximum stack depth exceeded';
                    break;
                case JSON_ERROR_STATE_MISMATCH:
                    echo 'JSON_ERROR: - Underflow or the modes mismatch';
                    break;
                case JSON_ERROR_CTRL_CHAR:
                    echo 'JSON_ERROR: - Unexpected control character found';
                    break;
                case JSON_ERROR_SYNTAX:
                    echo 'JSON_ERROR: - Syntax error, malformed JSON';
                    break;
                case JSON_ERROR_UTF8:
                    echo 'JSON_ERROR: - Malformed UTF-8 characters, possibly incorrectly encoded';
                    break;
                default:
                    echo 'JSON_ERROR: - Unknown error: ' . json_last_error();
                    break;
            }
            return true;
        }
        else if (function_exists('json_last_error_msg') && json_last_error_msg() !== "No error") {
            echo ("json_last_error_msg, JSON_ERROR:" . json_last_error_msg());
            return true;
        }
        return false;
    }
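
As an alternative to inspecting json_last_error() by hand, the sketch below lets json_decode() throw on failure. It assumes PHP 7.3 or later, where the JSON_THROW_ON_ERROR flag and the JsonException class are available; decode_json_or_null() is a hypothetical helper name, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the decoded value, or null with the reason stored in $error.
function decode_json_or_null(string $json, ?string &$error = null)
{
    $error = null;
    try {
        // 512 is json_decode()'s default depth; JSON_THROW_ON_ERROR needs PHP 7.3+.
        return json_decode($json, false, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        $error = $e->getMessage();
        return null;
    }
}

$ok = decode_json_or_null('{"sortColumn": "QuoteNumber"}', $okError);
echo $ok->sortColumn, "\n";   // QuoteNumber

$bad = decode_json_or_null('{sortColumn: "QuoteNumber"}', $badError);
var_dump($bad);               // NULL
echo $badError, "\n";         // the parser's error message
```

The by-reference $error argument is one possible design: it keeps a failed decode distinguishable from a document whose value is a legitimate JSON null.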
    

    The snippet above produces output like the following:

    QuoteNumber
    
    stdClass Object ( 
        [Data] => Array ( 
            [0] => stdClass Object ( 
                [ID] => 3235720 
                [Date] => 20 May 2016 
                [QuoteNumber] => Q12415 
                [Name] => Some Name 
                [Client] => Some Client 
                [StateName] => Issued 
                [Url] => /Quote/View/3235720 
            ) 
        ) 
    ) 
    
    3235720
    20 May 2016
    Q12415
    Some Name
    Some Client
    Issued
    /Quote/View/3235720
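
If the page cannot be changed and only the quote IDs are needed, one possible approach is to quote the bare keys before decoding and then pull out the ID column. This is a rough sketch, not a general JSON repair: quote_bare_keys() is a hypothetical helper, its regular expression assumes that no string value contains text resembling `, key:`, and the sample string is a shortened copy of the block from the question.

```php
<?php
// Hypothetical helper: wrap bare object keys such as sortColumn: in double quotes.
// Assumption: string values never contain text that looks like ", key:" or "{ key:".
function quote_bare_keys(string $json): string
{
    return preg_replace('/([{,]\s*)([A-Za-z_]\w*)\s*:/', '$1"$2":', $json);
}

$raw = <<<'JSON'
{
    sortColumn: "QuoteNumber",
    quotes: {
        "Data": [
            { "ID": 3235720, "QuoteNumber": "Q12415", "Url": "/Quote/View/3235720" }
        ]
    }
}
JSON;

// Decode to associative arrays (second argument true) so array_column() can be used.
$decoded = json_decode(quote_bare_keys($raw), true);
$ids = is_array($decoded) ? array_column($decoded['quotes']['Data'], 'ID') : array();

print_r($ids);
```

With the sample above, $ids holds a single element, the integer 3235720. For real pages it is worth checking json_last_error() after decoding, since the pattern can miss or corrupt unusual input.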
    