dongzhuo5425 2019-05-28 14:51

PHP XMLReader readOuterXML does not work for large data

I have an XML file of 300 MB+ with about 11 million lines. I need to parse each of its sections, such as <Views>, <Filters>, etc. Many of these sections have similar columns, such as <Name>, so while parsing I check which section the reader is currently in and take its outer XML to parse all the data in that section. This is the only way I have found to parse identically named columns in different sections, e.g.:

    if ($this->reader->nodeType === XMLReader::ELEMENT && $this->reader->localName === 'Views') {
        // Serialize the whole current section, children included, into one string
        $file = $this->reader->readOuterXML();
        // Setter name => XML element name
        $columns = [
            "setUid" => "UID",
            "setGuid" => "GUID",
            "setUuid" => "ID",
            "setName" => "Name",
        ];
        $this->parseBlocks($import, 'BoardColumn', $columns, $file);
    }
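
The parseBlocks() method itself is not shown in the question. As a rough illustration of how a "setter => element" map like $columns can be applied to one record, here is a minimal sketch. The function name applyColumns() and the entity object with setter methods are assumptions made for this example, not the actual implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: copy values from one parsed record onto an entity
 * using a "setter name => XML element name" map like $columns above.
 * The real parseBlocks() is not shown in the question; this is an assumption.
 */
function applyColumns(object $entity, SimpleXMLElement $record, array $columns): object
{
    foreach ($columns as $setter => $element) {
        // Skip elements the record lacks and setters the entity does not define.
        if (isset($record->{$element}) && method_exists($entity, $setter)) {
            $entity->{$setter}((string) $record->{$element});
        }
    }
    return $entity;
}
```

Because the map is plain data, the same helper could serve <View>, <Filter>, <BoardColumn> and the other record types by passing a different $columns array.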

The problems start in the <Assignments> section: it has about 9 million lines, and the server cannot execute $this->reader->readOuterXML() on it.

PHP settings

memory_limit = 4096M
max_execution_time = 7200
max_input_time = 7200
post_max_size = 4096M

nginx settings

        fastcgi_read_timeout 3600;
        proxy_connect_timeout       3600;
        proxy_send_timeout          3600;
        proxy_read_timeout          3600;
        send_timeout                3600;

Structure of the XML

<Project xmlns="http://schemas.microsoft.com/project">
    <Views>
        <View>
            <Name>Gantt &amp;with Timeline</Name>
            <IsCustomized>true</IsCustomized>
        </View>
        <View>
            <Name>&amp;Gantt Chart</Name>
            <IsCustomized>true</IsCustomized>
        </View>
    </Views>
    <Filters>
        <Filter>
            <Name>&amp;All Tasks</Name>
        </Filter>
        <Filter>
            <Name>&amp;All Resources</Name>
        </Filter>
    </Filters>
    <Groups>
        <Group>
            <Name>&amp;No Group</Name>
        </Group>
        <Group>
            <Name>&amp;No Group</Name>
        </Group>
    </Groups>
    <Tables>
        <Table>
            <Name>&amp;Entry</Name>
            <IsCustomized>true</IsCustomized>
        </Table>
    </Tables>
    <ExtendedAttributes>
        <ExtendedAttribute>
            <FieldID>188743731</FieldID>
            <FieldName>Tekst1</FieldName>
            <Guid>000039B7-8BBE-4CEB-82C4-FA8C0B400033</Guid>
            <SecondaryPID>255869028</SecondaryPID>
            <SecondaryGuid>000039B7-8BBE-4CEB-82C4-FA8C0F404064</SecondaryGuid>
        </ExtendedAttribute>
    </ExtendedAttributes>
    <Tasks>
        <Task>
            <UID>0</UID>
            <GUID>9AB1E99A-12FA-E811-9DD3-2016B93223A9</GUID>
            <ID>0</ID>
        </Task>
    </Tasks>
    <Resources>
        <Resource>
            <UID>0</UID>
            <GUID>A0CB8B7E-2A8C-436D-0000-0000000000FF</GUID>
            <ID>0</ID>
            <Type>1</Type>
            <IsNull>0</IsNull>
        </Resource>
    </Resources>


    <Assignments>   
        <Assignment>
            <UID>13394</UID>
            <GUID>B6C918FA-17FA-E811-9DD4-2016B93223A9</GUID>
            <TaskUID>22636</TaskUID>
            <TimephasedData>
                <Type>1</Type>
                <UID>13394</UID>
            </TimephasedData>
            <TimephasedData>
                <Type>1</Type>
                <UID>13394</UID>
            </TimephasedData>
        </Assignment>
    </Assignments>  

    <BoardColumns>
        <BoardColumn>
            <UID>1</UID>
            <GUID>43E41B9B-12FA-E811-9DD3-2016B93223A9</GUID>
            <ID>0</ID>
            <Name>Backlog</Name>
        </BoardColumn>
        <BoardColumn>
            <UID>2</UID>
            <GUID>44E41B9B-12FA-E811-9DD3-2016B93223A9</GUID>
            <ID>1</ID>
            <Name>Next up</Name>
        </BoardColumn>
    </BoardColumns>

    <Sprints>
        <Sprint>
            <UID>1</UID>
            <GUID>47E41B9B-12FA-E811-9DD3-2016B93223A9</GUID>
            <ID>0</ID>
            <Name>No Sprint</Name>
            <DurationUnits>39</DurationUnits>
            <Duration>PT0H0M0S</Duration>
        </Sprint>

    </Sprints>
</Project>
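
Given this structure, one possible way to avoid serializing a whole section such as <Assignments> in a single call is to let XMLReader stop on each record element (<Assignment>, <View>, ...) and call readOuterXml() on that one record only. The sketch below is an assumption about how this could look, not a tested fix for the file in question; streamRecords() and its parameters are hypothetical names introduced for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: stream records of one type out of a large XML file.
 * Only one record is serialized at a time, so memory use should depend on
 * the largest record rather than on the size of the whole section.
 *
 * @param string   $path       Path to the XML file
 * @param string   $recordName Local name of the repeated element, e.g. 'Assignment'
 * @param string[] $fields     Direct child elements to extract, e.g. ['UID', 'GUID']
 */
function streamRecords(string $path, string $recordName, array $fields): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $more = $reader->read();
        while ($more) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $recordName) {
                // Serialize just this record and parse it as a small document.
                $record = new SimpleXMLElement($reader->readOuterXml());
                $row = [];
                foreach ($fields as $field) {
                    $row[$field] = isset($record->{$field}) ? (string) $record->{$field} : null;
                }
                yield $row;
                // next() skips the subtree of the record that was just handled.
                $more = $reader->next();
            } else {
                $more = $reader->read();
            }
        }
    } finally {
        $reader->close();
    }
}
```

Called as streamRecords($path, 'Assignment', ['UID', 'GUID', 'TaskUID']) inside a foreach loop, each iteration would receive one associative array, which can be written to the database before the next record is read.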