duanjiani6826 2014-04-09 20:01

Processing large XML files with PHP

I have two large XML files that contain product details for a webshop. The first contains the product codes, names, and stock availability; the second also contains the product codes, along with the names, prices, and other product details. I have to create a list of the products available in stock with all their details, output as an (HTML) table.

My problem is the following: the XML files contain about 13,000 products. The first step (outputting the codes of the available products) works without problems, but when I also try to output the data from the second XML file, it doesn't work; the browser always shows "no data received". That makes sense: about 2,000-3,000 products are available in stock, which means the second XML file has to be read through 2,000-3,000 times.

How can I solve this problem? I can only edit the second XML file; the first is loaded from an external source that I don't have access to. Should I import the second XML file into an SQL table, or is that not a good idea either? If not, what should I do?

Thanks (and sorry for my imperfect English)!

My PHP code:

<?php

$zasoby_xml = file_get_contents('zasoby.xml');

$sxe0 = new SimpleXMLElement($zasoby_xml);
$sxe0->registerXPathNamespace('lStk', 'http://www.stormware.cz/schema/version_2/list_stock.xsd');
$lStkStock = $sxe0->xpath('//lStk:stock');
$cnt = count($lStkStock);

$sxe = new SimpleXMLElement($zasoby_xml);
$sxe->registerXPathNamespace('stk', 'http://www.stormware.cz/schema/version_2/stock.xsd');
$stkCode = $sxe->xpath('//stk:code'); //product code
$stkName = $sxe->xpath('//stk:name'); //product name
$stkCount = $sxe->xpath('//stk:count'); //count in the stock

$db_xml = simplexml_load_file('db.xml');

for ($i = 0;$i < $cnt;$i++) {
    if ($stkCount[$i] > 0) {
        echo $stkCode[$i]."&nbsp;&nbsp;";
        $j = 0;
        while($stkCode[$i] != $db_xml->record[$j]->product_id) {
            $j++;
        }
        echo $db_xml->record[$j]->category_path."<br>";
    }
}
?>

First XML file example:

<?xml version="1.0" encoding="Windows-1250"?>
<rsp:responsePack version="2.0" id="Usr01" state="ok" note="46895680" programVersion="10608.3 E1 (13.3.2014)" xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd" xmlns:lStk="http://www.stormware.cz/schema/version_2/list_stock.xsd" xmlns:stk="http://www.stormware.cz/schema/version_2/stock.xsd">
<rsp:responsePackItem version="2.0" id="Usr01" state="ok">
<lStk:listStock version="2.0" dateTimeStamp="2014-04-08T14:18:14" dateValidFrom="2014-04-08" state="ok">
<lStk:stock version="2.0">
    <stk:code>90000000</stk:code>
    <stk:count>975.0</stk:count>
    <stk:name>Product name</stk:name>
</lStk:stock>
</lStk:listStock></rsp:responsePackItem></rsp:responsePack>

Second XML file example:

<?xml version="1.0" encoding="utf-8" ?>
<data>
<record>
    <product_id><![CDATA[77778888]]></product_id>
    <name><![CDATA[productname]]></name>
    <Deeplink><![CDATA[product url]]></Deeplink>
    <Img_url><![CDATA[product img_url]]></Img_url>
    <category_path><![CDATA[product category]]></category_path>
    <Price><![CDATA[product price]]></Price>
</record>
</data>
1 answer

  • douti9253 2014-04-10 12:53

    Using a while loop to scan the entire $db_xml document every time you need to look up a product is inefficient. Importing the second XML file into an SQL table is not a bad idea, but it seems like more work than necessary when you can simply use a PHP array indexed by product_id.

    I've prepared some code to illustrate my point:

    <?php
    
    $zasoby_xml = file_get_contents('zasoby.xml');
    
    $sxe0 = new SimpleXMLElement($zasoby_xml);
    $sxe0->registerXPathNamespace('lStk', 'http://www.stormware.cz/schema/version_2/list_stock.xsd');
    $lStkStock = $sxe0->xpath('//lStk:stock');
    $cnt = count($lStkStock);
    
    $sxe = new SimpleXMLElement($zasoby_xml);
    $sxe->registerXPathNamespace('stk', 'http://www.stormware.cz/schema/version_2/stock.xsd');
    $stkCode = $sxe->xpath('//stk:code'); // product code
    $stkName = $sxe->xpath('//stk:name'); // product name
    $stkCount = $sxe->xpath('//stk:count'); // count in the stock
    
    $db_xml = simplexml_load_file('db.xml');
    
    // Loop through record elements on db.xml to build an array that can be accessed by product_id
    
    $records = array();
    
    foreach ($db_xml->record as $record) {
        $records[(string)$record->product_id] = $record;
    }
    
    // Loop through all products to display their information
    
    for ($i = 0; $i < $cnt; $i++) {
    
        // Display only products in stock
    
        if ((float)$stkCount[$i] > 0) { // cast explicitly so fractional counts such as "0.5" compare as numbers
    
            // Access this record directly by product_id (code) instead of looping through all records in db.xml
    
            if (isset($records[(string)$stkCode[$i]])) {
                echo sprintf(
                    "<b>Code</b> %s <b>Category</b> %s", 
                    $stkCode[$i], $records[(string)$stkCode[$i]]->category_path
                );
            }
        }
    }
    
    ?>
    

    zasoby.xml

    <?xml version="1.0" encoding="Windows-1250"?>
    <rsp:responsePack version="2.0" id="Usr01" state="ok" note="46895680" programVersion="10608.3 E1 (13.3.2014)" xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd" xmlns:lStk="http://www.stormware.cz/schema/version_2/list_stock.xsd" xmlns:stk="http://www.stormware.cz/schema/version_2/stock.xsd">
    <rsp:responsePackItem version="2.0" id="Usr01" state="ok">
    <lStk:listStock version="2.0" dateTimeStamp="2014-04-08T14:18:14" dateValidFrom="2014-04-08" state="ok">
    <lStk:stock version="2.0">
        <stk:code>90000000</stk:code>
        <stk:count>975.0</stk:count>
        <stk:name>Product name</stk:name>
    </lStk:stock>
    </lStk:listStock></rsp:responsePackItem></rsp:responsePack>
    

    db.xml

    <?xml version="1.0" encoding="utf-8" ?>
    <data>
    <record>
        <product_id><![CDATA[90000000]]></product_id>
        <name><![CDATA[productname]]></name>
        <Deeplink><![CDATA[product url]]></Deeplink>
        <Img_url><![CDATA[product img_url]]></Img_url>
        <category_path><![CDATA[product category]]></category_path>
        <Price><![CDATA[product price]]></Price>
    </record>
    </data>
    

    With these XML files I'm getting the following output:

    Code 90000000 Category product category
    

    A drawback of this implementation is the memory consumption of the $records array. If the second XML file gets too big, you end up with an array of thousands of elements. If that becomes a problem, you could build an SQLite database file on disk instead of an array, or avoid storing the full SimpleXMLElement $record object in the array under each product_id key.
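
    As a rough sketch of the second option (not storing the full SimpleXMLElement object), the index could hold only plain strings for the fields you actually need. The helper name buildRecordIndex and the choice of fields are assumptions made for illustration, and the inline XML simply mirrors the db.xml sample; whether this saves memory in practice depends on the parsed document being released afterwards.

    ```php
    <?php
    // Hypothetical helper (not part of the original script): index the records
    // by product_id, keeping plain strings instead of SimpleXMLElement objects.
    function buildRecordIndex(SimpleXMLElement $dbXml, array $fields): array
    {
        $index = [];
        foreach ($dbXml->record as $record) {
            $row = [];
            foreach ($fields as $field) {
                // Casting to string copies the text out of the XML node.
                $row[$field] = (string) $record->{$field};
            }
            $index[(string) $record->product_id] = $row;
        }
        return $index;
    }

    // Inline sample with the same structure as db.xml (assumed for the demo).
    $dbXml = simplexml_load_string(
        '<data><record>'
        . '<product_id><![CDATA[90000000]]></product_id>'
        . '<category_path><![CDATA[product category]]></category_path>'
        . '<Price><![CDATA[product price]]></Price>'
        . '</record></data>'
    );

    $index = buildRecordIndex($dbXml, ['category_path', 'Price']);

    // Look up a product code directly, as in the main loop of the script.
    $code = '90000000';
    if (isset($index[$code])) {
        printf("%s: %s\n", $code, $index[$code]['category_path']);
    }
    ```

    In the script, $records could be built the same way from simplexml_load_file('db.xml'), and the lookup would then read $records[$code]['category_path'] instead of the object property.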

    EDIT: Fixed an error in line 23 of the script.

    This answer was selected by the asker as the best answer.
