duanjiani6826 2014-04-09 20:01
Views: 56
Accepted

Processing large XML files with PHP

I have two large XML files containing product details for a webshop. The first contains the product codes, names, and stock availability; the second also contains the product codes, along with the names, prices, and other product details. I need to create a list of the products available in stock, with all their details, output as an (HTML) table.

My problem is the following: the XML files contain about 13,000 products. The first step (outputting the codes of the available products) works without problems, but when I also try to output the data from the second XML file, it doesn't work; the browser always shows "no data received". That makes sense: about 2,000-3,000 products are available in stock, which means the second XML file has to be read through 2,000-3,000 times.

How can I solve this problem? I can only edit the second XML file; the first is loaded from an external source that I don't have access to. Should I import the second XML file into an SQL table, or isn't that a good idea either? If not, what should I do?

Thanks (and sorry for my imperfect English)!

My PHP code:

<?php

$zasoby_xml = file_get_contents('zasoby.xml');

$sxe0 = new SimpleXMLElement($zasoby_xml);
$sxe0->registerXPathNamespace('lStk', 'http://www.stormware.cz/schema/version_2/list_stock.xsd');
$lStkStock = $sxe0->xpath('//lStk:stock');
$cnt = count($lStkStock);

$sxe = new SimpleXMLElement($zasoby_xml);
$sxe->registerXPathNamespace('stk', 'http://www.stormware.cz/schema/version_2/stock.xsd');
$stkCode = $sxe->xpath('//stk:code'); //product code
$stkName = $sxe->xpath('//stk:name'); //product name
$stkCount = $sxe->xpath('//stk:count'); //count in the stock

$db_xml = simplexml_load_file('db.xml');

for ($i = 0;$i < $cnt;$i++) {
    if ($stkCount[$i] > 0) {
        echo $stkCode[$i]."&nbsp;&nbsp;";
        $j = 0;
        while($stkCode[$i] != $db_xml->record[$j]->product_id) {
            $j++;
        }
        echo $db_xml->record[$j]->category_path."<br>";
    }
}
?>

First XML file example:

<?xml version="1.0" encoding="Windows-1250"?>
<rsp:responsePack version="2.0" id="Usr01" state="ok" note="46895680" programVersion="10608.3 E1 (13.3.2014)" xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd" xmlns:lStk="http://www.stormware.cz/schema/version_2/list_stock.xsd" xmlns:stk="http://www.stormware.cz/schema/version_2/stock.xsd">
<rsp:responsePackItem version="2.0" id="Usr01" state="ok">
<lStk:listStock version="2.0" dateTimeStamp="2014-04-08T14:18:14" dateValidFrom="2014-04-08" state="ok">
<lStk:stock version="2.0">
    <stk:code>90000000</stk:code>
    <stk:count>975.0</stk:count>
    <stk:name>Product name</stk:name>
</lStk:stock>
</lStk:listStock></rsp:responsePackItem></rsp:responsePack>

Second XML file example:

<?xml version="1.0" encoding="utf-8" ?>
<data>
<record>
    <product_id><![CDATA[77778888]]></product_id>
    <name><![CDATA[productname]]></name>
    <Deeplink><![CDATA[product url]]></Deeplink>
    <Img_url><![CDATA[product img_url]]></Img_url>
    <category_path><![CDATA[product category]]></category_path>
    <Price><![CDATA[product price]]></Price>
</record>
</data>

1 answer

  • douti9253 2014-04-10 12:53

    Using a while loop to scan the entire $db_xml document every time you need to look up a product is inefficient. Importing the second XML file into an SQL table is not a bad idea, but it is more work than necessary when you can simply use a PHP array indexed by product_id.

    I've prepared some code to illustrate my point:

    <?php
    
    $zasoby_xml = file_get_contents('zasoby.xml');
    
    $sxe0 = new SimpleXMLElement($zasoby_xml);
    $sxe0->registerXPathNamespace('lStk', 'http://www.stormware.cz/schema/version_2/list_stock.xsd');
    $lStkStock = $sxe0->xpath('//lStk:stock');
    $cnt = count($lStkStock);
    
    $sxe = new SimpleXMLElement($zasoby_xml);
    $sxe->registerXPathNamespace('stk', 'http://www.stormware.cz/schema/version_2/stock.xsd');
    $stkCode = $sxe->xpath('//stk:code'); // product code
    $stkName = $sxe->xpath('//stk:name'); // product name
    $stkCount = $sxe->xpath('//stk:count'); // count in the stock
    
    $db_xml = simplexml_load_file('db.xml');
    
    // Loop through record elements on db.xml to build an array that can be accessed by product_id
    
    $records = array();
    
    foreach ($db_xml->record as $record) {
        $records[(string)$record->product_id] = $record;
    }
    
    // Loop through all products to display their information
    
    for ($i = 0; $i < $cnt; $i++) {
    
        // Display only products in stock
    
        if ((float)$stkCount[$i] > 0) {
    
            // Access this record directly by product_id (code) instead of looping through all records in db.xml
    
            if (isset($records[(string)$stkCode[$i]])) {
                echo sprintf(
                    "<b>Code</b> %s <b>Category</b> %s", 
                    $stkCode[$i], $records[(string)$stkCode[$i]]->category_path
                );
            }
        }
    }
    
    ?>
    

    zasoby.xml

    <?xml version="1.0" encoding="Windows-1250"?>
    <rsp:responsePack version="2.0" id="Usr01" state="ok" note="46895680" programVersion="10608.3 E1 (13.3.2014)" xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd" xmlns:lStk="http://www.stormware.cz/schema/version_2/list_stock.xsd" xmlns:stk="http://www.stormware.cz/schema/version_2/stock.xsd">
    <rsp:responsePackItem version="2.0" id="Usr01" state="ok">
    <lStk:listStock version="2.0" dateTimeStamp="2014-04-08T14:18:14" dateValidFrom="2014-04-08" state="ok">
    <lStk:stock version="2.0">
        <stk:code>90000000</stk:code>
        <stk:count>975.0</stk:count>
        <stk:name>Product name</stk:name>
    </lStk:stock>
    </lStk:listStock></rsp:responsePackItem></rsp:responsePack>
    

    db.xml

    <?xml version="1.0" encoding="utf-8" ?>
    <data>
    <record>
        <product_id><![CDATA[90000000]]></product_id>
        <name><![CDATA[productname]]></name>
        <Deeplink><![CDATA[product url]]></Deeplink>
        <Img_url><![CDATA[product img_url]]></Img_url>
        <category_path><![CDATA[product category]]></category_path>
        <Price><![CDATA[product price]]></Price>
    </record>
    </data>
    

    With these XML files I'm getting the following output:

    Code 90000000 Category product category
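
The script above pairs $stkCode[$i] with $stkCount[$i] by position, which only holds if every lStk:stock element contains both children in the same order. A minimal sketch of an alternative, assuming the zasoby.xml structure shown above: read each lStk:stock element as a unit through children() with the stk namespace URI. The helper name inStockCodes and the second product code 90000001 are made up for illustration.

```php
<?php
// Namespace URIs copied from zasoby.xml above
const NS_LSTK = 'http://www.stormware.cz/schema/version_2/list_stock.xsd';
const NS_STK  = 'http://www.stormware.cz/schema/version_2/stock.xsd';

// Hypothetical helper: returns array(code => count) for products with count > 0
function inStockCodes(SimpleXMLElement $stockXml)
{
    $stockXml->registerXPathNamespace('lStk', NS_LSTK);
    $inStock = array();
    foreach ($stockXml->xpath('//lStk:stock') as $stock) {
        // children() with a namespace URI exposes the stk:* child elements
        $stk = $stock->children(NS_STK);
        $count = (float)$stk->count;
        if ($count > 0) {
            $inStock[(string)$stk->code] = $count;
        }
    }
    return $inStock;
}

// Inline sample in the same shape as zasoby.xml; 90000001 is a made-up,
// out-of-stock product added only to show the filter
$xml = '<rsp:responsePack'
    . ' xmlns:rsp="http://www.stormware.cz/schema/version_2/response.xsd"'
    . ' xmlns:lStk="' . NS_LSTK . '"'
    . ' xmlns:stk="' . NS_STK . '">'
    . '<lStk:listStock>'
    . '<lStk:stock><stk:code>90000000</stk:code><stk:count>975.0</stk:count></lStk:stock>'
    . '<lStk:stock><stk:code>90000001</stk:code><stk:count>0.0</stk:count></lStk:stock>'
    . '</lStk:listStock></rsp:responsePack>';

$inStock = inStockCodes(simplexml_load_string($xml));
print_r($inStock);
```

The returned codes can then be looked up in $records with isset(), exactly as in the loop above.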
    

    One drawback of this implementation is the memory consumption of the $records array. If the second XML file grows large, you will end up with an array of thousands of SimpleXMLElement objects. If that becomes a problem, you could build an SQLite database file on disk instead of an array, or avoid storing the full SimpleXMLElement $record object in the array under each product_id key.
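
As a rough sketch of the second option, assuming only a few fields per product are needed: copy those fields out as plain strings so the array no longer holds SimpleXMLElement objects. The helper name buildRecordIndex and the choice of fields are illustrative, not part of the original script.

```php
<?php
// Hypothetical helper: index the <record> elements by product_id,
// keeping only the requested fields as plain strings
function buildRecordIndex(SimpleXMLElement $dbXml, array $fields)
{
    $index = array();
    foreach ($dbXml->record as $record) {
        $row = array();
        foreach ($fields as $field) {
            // Casting to string copies the element text (CDATA included)
            $row[$field] = (string)$record->{$field};
        }
        $index[(string)$record->product_id] = $row;
    }
    return $index;
}

// Inline sample in the same shape as db.xml above
$dbXml = simplexml_load_string(
    '<data><record>'
    . '<product_id><![CDATA[90000000]]></product_id>'
    . '<category_path><![CDATA[product category]]></category_path>'
    . '<Price><![CDATA[product price]]></Price>'
    . '</record></data>'
);

$records = buildRecordIndex($dbXml, array('category_path', 'Price'));
echo $records['90000000']['category_path'], "\n";
```

With this layout the lookup becomes $records[$code]['category_path'] instead of $records[$code]->category_path. If nothing else references the parsed document, calling unset() on it after building the index lets PHP free it; how much memory this saves in practice would need to be measured.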

    EDIT: Fixed an error in line 23 of the script.

    This answer was selected by the asker as the best answer.
