douzhanyan5015 2011-03-15 20:30
Views: 70
Accepted

Parsing a large XML file over FTP

I need to parse a large XML file (>1 GB) that is located on an FTP server. I have an FTP stream acquired by ftp_connect(). (I use this stream for other FTP-related actions.)

I know XMLReader is preferred for large XML files, but it will only accept a URI, so I assume a stream wrapper will be required. The only FTP function I know of that lets me retrieve just a small part of the file at a time is ftp_nb_fget() in combination with ftp_nb_continue().

However, I do not know how I should put all of this together to make sure that a minimum amount of memory is used.

3 answers

  • douxin1884 2011-03-15 20:45

    It looks like you may need to build on top of the low-level XML parser bits.

    In particular, you can use xml_parse to process the XML string one chunk at a time, after calling the various xml_set_* functions to register callbacks for elements, character data, namespaces, entities, and so on. Each callback fires as soon as the parser has received enough data to recognize the corresponding construct, which should mean that you can process the file as you read it in arbitrarily sized chunks from the FTP site.
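
A minimal sketch of that idea, using element and character-data handlers instead of a single catch-all callback. The `$events` array, the sample document, and the chunk boundaries are made up for illustration; the third argument to xml_parse marks the last chunk so the parser can flag an incomplete document.

```php
<?php
// Collect parser events so the order of the callbacks is visible.
$events = [];

$parser = xml_parser_create('UTF-8');
// Keep element names as written; by default the parser upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events) {
        $events[] = "end:$name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$events) {
        $events[] = "text:$data";
    }
);

// Feed the document in arbitrary pieces; a tag may be split across chunks.
$chunks = ['<ro', 'ot><item id="1">Foo</item', '></root>'];
foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);
    if (xml_parse($parser, $chunk, $isFinal) !== 1) {
        throw new RuntimeException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
}
```

Note that the character-data handler may be called more than once for a single text node, so real code should append to a buffer rather than assume one call per node.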


    Proof of concept using the PHP CLI and xml_set_default_handler, whose callback is invoked for everything that doesn't have a specific handler:

    php > $p = xml_parser_create('utf-8');
    php > xml_set_default_handler($p, function() { print_r(func_get_args()); });
    php > xml_parse($p, '<a');
    php > xml_parse($p, '>');
    php > xml_parse($p, 'Foo<b>Bar</b>Baz');
    Array
    (
        [0] => Resource id #3
        [1] => <a>
    )
    Array
    (
        [0] => Resource id #3
        [1] => Foo
    )
    Array
    (
        [0] => Resource id #3
        [1] => <b>
    )
    Array
    (
        [0] => Resource id #3
        [1] => Bar
    )
    Array
    (
        [0] => Resource id #3
        [1] => </b>
    )
    php > xml_parse($p, '</a>');
    Array
    (
        [0] => Resource id #3
        [1] => Baz
    )
    Array
    (
        [0] => Resource id #3
        [1] => </a>
    )
    php >
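
To tie this back to reading from a remote file, here is a rough sketch of a read loop that feeds any readable stream to the parser in fixed-size chunks, so only one chunk is held in memory at a time. The function name `parseXmlStream`, its parameters, and the sample data are hypothetical. The example reads from an in-memory stream; for FTP, one assumption would be to open the file with `fopen('ftp://user:pass@host/path.xml', 'r')` through PHP's ftp:// wrapper (which needs allow_url_fopen), since the connection returned by ftp_connect() cannot be passed to fread() directly.

```php
<?php
/**
 * Feed a readable stream to an XML parser in fixed-size chunks.
 * parseXmlStream() and its parameters are hypothetical names for this sketch.
 */
function parseXmlStream($stream, callable $onStart, int $chunkSize = 8192): void
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use ($onStart) {
            $onStart($name, $attrs);
        },
        function ($p, string $name) {
            // End tags are ignored in this sketch.
        }
    );

    $fail = function () use ($parser) {
        throw new RuntimeException(sprintf(
            'XML error "%s" at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    };

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Read from stream failed');
        }
        if (xml_parse($parser, $chunk, false) !== 1) {
            $fail();
        }
    }
    // Signal end of input so a truncated document is reported as an error.
    if (xml_parse($parser, '', true) !== 1) {
        $fail();
    }
}

// Usage with an in-memory stream standing in for the remote file.
$stream = fopen('php://memory', 'w+');
fwrite($stream, '<root><item id="1">Foo</item><item id="2">Bar</item></root>');
rewind($stream);

$seen = [];
parseXmlStream($stream, function (string $name, array $attrs) use (&$seen) {
    $seen[] = [$name, $attrs];
}, 7); // tiny chunk size to force tags to be split across reads
fclose($stream);
```

The explicit final call with an empty string is a design choice: feof() is not always true immediately after the last data has been read, so relying on it to set the final flag inside the loop can leave the parser unaware that the input has ended.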
    
    This answer was selected as the best answer by the asker.
