douyanjing0822 2017-06-26 15:19
Views: 92

How to serialize and unserialize a large file in PHP? [closed]

I have a HUGE complex data structure (TRIE) that I need to store for later use.

So, I'm using serialize/unserialize (please suggest any better method if any):

$fp = fopen("serialized_trie.txt","w+");
fwrite($fp,serialize($root));
fclose($fp);

$root = unserialize(file_get_contents("serialized_trie.txt"));

The trie itself is built from 1 million words, so it's a big trie.

I need to store this trie somehow. Writing such a big trie to a file takes a huge amount of time, and using file_get_contents with unserialize loads the entire file into memory.

Do I need to use a binary file instead of txt file? How?

Also I've read about 3 techniques to store: serialize, json_encode, var_export

Do I need to use json_encode or var_export method in this case?

How do I QUICKLY store the trie and retrieve it?


1 answer

  • douzhe9075 2017-06-26 16:28

    You didn't specify what the actual file size is. With that said, the serialize function basically turns the variable into an intermediary text form that can be safely written to disk, but it's not at all optimized.

    You could try compressing the file before it's written:

    $fp = fopen("serialized_trie.gzd","w+");
    //gzdeflate supports 0-9 levels of compression
    //You might want to experiment
    fwrite($fp, gzdeflate(serialize($root), 5));
    fclose($fp);
    

    To read in:

    $root = unserialize(gzinflate(file_get_contents("serialized_trie.gzd")));
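
    The two snippets above skip error handling. A minimal sketch of the full round trip follows, assuming a tiny array-based trie as a stand-in for the real one and a hypothetical file location in the system temp directory. The allowed_classes option needs PHP 7 or later and assumes the trie contains no objects.

    ```php
    <?php
    // Hypothetical stand-in for the real trie: nested arrays keyed by character.
    $root = ['c' => ['a' => ['t' => ['$' => true], 'r' => ['$' => true]]]];

    // Assumed location; use whatever writable path suits your setup.
    $file = sys_get_temp_dir() . '/serialized_trie.gzd';

    // Write: serialize, compress, then store the result in one call.
    $plain  = serialize($root);
    $packed = gzdeflate($plain, 5);
    if ($packed === false || file_put_contents($file, $packed, LOCK_EX) === false) {
        throw new RuntimeException("Could not write $file");
    }

    // Read: each step returns false on failure, so check before continuing.
    $raw      = file_get_contents($file);
    $inflated = ($raw === false) ? false : gzinflate($raw);
    if ($inflated === false) {
        throw new RuntimeException("Could not read or inflate $file");
    }

    // allowed_classes => false refuses to instantiate objects. This assumes an
    // array-only trie; list your class names here if the trie uses objects.
    $restored = unserialize($inflated, ['allowed_classes' => false]);

    printf("plain: %d bytes, compressed: %d bytes\n", strlen($plain), strlen($packed));
    ```

    Compression shrinks the file on disk, not the peak memory use: the full serialized string still has to exist in memory before unserialize() can run.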
    

    The extension is not important, as there is no standard for raw deflate files, but I'd suggest something other than .txt to indicate this is not an actual text file.

    With regard to memory use, this depends heavily on the size of your trie structure, which you have indicated is large, but without giving any specifics.

    As per my answer to your other question, this is going to be many times slower than reading the variable from an in-memory cache.

    serialize is built to turn one or more PHP variables into a string that can be written to disk and read back later. PHP's session support relies on it.

    json

    json_encode is useful if you need to return data to a client that needs or supports JavaScript-compatible values.
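
    For comparison, here is a minimal sketch of a JSON round trip, assuming the trie is made of plain nested arrays (the tiny $root below is a hypothetical stand-in). JSON carries no class information, so objects would not come back as instances of their original classes.

    ```php
    <?php
    // Hypothetical stand-in for the real trie: nested arrays keyed by character.
    $root = ['c' => ['a' => ['t' => ['$' => true]]]];

    $json = json_encode($root);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }

    // Passing true decodes JSON objects as associative arrays, not stdClass.
    $restored = json_decode($json, true);

    echo $json, "\n";
    ```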

    var_export

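    var_export has some issues with complex data structures: it cannot handle circular references, and objects are exported as ClassName::__set_state() calls, so each class must implement that static method. With that said, it is possible to use var_export to write out the trie structure as a PHP script, which can then be loaded with require_once. This might be more performant than the other options.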

    $fp = fopen("trie.php","w+");
    //The second argument (true) makes var_export return the code instead of printing it
    fwrite($fp, '<?php $root = ' . var_export($root, true) . '; ?>');
    fclose($fp);
    

    To read back in:

    require_once('trie.php');
    

    Obviously your script needs to place trie.php in a location under the webroot that is readable and writable, but that's a whole other discussion. As with any other include(), you need the path to the script.
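
    A variation on this idea is to have the generated script return the array, so the caller chooses the variable name. The sketch below assumes an array-only trie (the tiny $root is a hypothetical stand-in) and a hypothetical path in the system temp directory.

    ```php
    <?php
    // Hypothetical stand-in for the real trie: nested arrays keyed by character.
    $root = ['c' => ['a' => ['t' => ['$' => true]]]];

    // Assumed location; use whatever writable path suits your setup.
    $file = sys_get_temp_dir() . '/trie.php';

    // var_export(..., true) returns the PHP source for the array as a string.
    $code = '<?php return ' . var_export($root, true) . ';' . "\n";
    if (file_put_contents($file, $code, LOCK_EX) === false) {
        throw new RuntimeException("Could not write $file");
    }

    // require evaluates the file and yields whatever it returns.
    $restored = require $file;
    ```

    If OPcache is enabled, the compiled script can be served from shared memory on later requests, which is the likely source of any speed advantage. That is worth measuring on your own data before relying on it.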
