doutangu4671 2017-03-26 22:19

io.Copy() and sparse files

I want to copy files from one place to another and the problem is I deal with a lot of sparse files.

Is there any (easy) way of copying sparse files without becoming huge at the destination?

My basic code:

out, err := os.Create(bricks[0] + "/" + fileName)
in, err := os.Open(event.Name)
io.Copy(out, in)

1 answer

  • dongying9756 2017-03-27 07:39

    Some background theory

    Note that io.Copy() pipes raw bytes, which is understandable once you consider that it moves data from an io.Reader to an io.Writer, which provide Read([]byte) and Write([]byte), respectively. As such, io.Copy() is able to deal with absolutely any source providing bytes and absolutely any sink consuming them.

    On the other hand, the location of the holes in a file is "side-channel" information which "classic" syscalls such as read(2) hide from their users: reading over a hole simply yields zero-valued bytes. io.Copy() is not able to convey such side-channel information in any way.

    IOW, file sparseness was originally just a way to store data efficiently behind the user's back.

    So, no, there's no way io.Copy() could deal with sparse files in itself.
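To see this in practice, here is a minimal sketch (not part of the original answer). It assumes a temporary file extended with Truncate, which on filesystems that support holes is typically left unallocated. The helper names `zeroCounter` and `readBackHole` are made up for illustration. The point is that io.Copy() delivers every byte of the hole as an ordinary zero, so a real destination file would receive a full-size stream of zeros.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// zeroCounter is an io.Writer that counts the bytes it receives and
// remembers whether any of them was non-zero.
type zeroCounter struct {
	n       int64
	nonZero bool
}

func (z *zeroCounter) Write(p []byte) (int, error) {
	for _, b := range p {
		if b != 0 {
			z.nonZero = true
		}
	}
	z.n += int64(len(p))
	return len(p), nil
}

// readBackHole creates a temporary file of the given size without writing
// any data to it, then pipes it through io.Copy. It returns the number of
// bytes io.Copy delivered and whether all of them were zero.
func readBackHole(size int64) (n int64, allZero bool, err error) {
	f, err := os.CreateTemp("", "hole-*")
	if err != nil {
		return 0, false, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Extending a file with Truncate writes no data; the new range reads
	// back as zeros whether or not the filesystem stores it as a hole.
	if err := f.Truncate(size); err != nil {
		return 0, false, err
	}

	var z zeroCounter
	n, err = io.Copy(&z, f)
	return n, !z.nonZero, err
}

func main() {
	n, allZero, err := readBackHole(1 << 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("io.Copy delivered %d bytes, all zero: %v\n", n, allZero)
}
```

Nothing in the byte stream distinguishes a hole from zeros that were actually written, which is why the information has to come from somewhere else.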

    What to do about it

    You'd need to go one level deeper and implement all this using the syscall package and some manual tinkering.

    To work with holes, you should use the SEEK_HOLE and SEEK_DATA special values for the lseek(2) syscall which, while formally non-standard, are supported by most major Unix-like platforms.

    Unfortunately, as of Go 1.8.1, support for those "whence" positions is present neither in the stock syscall package nor in the golang.org/x/sys tree.

    But fear not, there are two easy steps:

    1. First, the stock syscall.Seek() is actually mapped to lseek(2) on the relevant platforms.

    2. Next, you'd need to figure out the correct values for SEEK_HOLE and SEEK_DATA for the platforms you need to support.

      Note that they are free to be different between different platforms!

      Say, on my Linux system I can simply run

      $ grep -E 'SEEK_(HOLE|DATA)' </usr/include/unistd.h 
      #  define SEEK_DATA     3       /* Seek to next data.  */
      #  define SEEK_HOLE     4       /* Seek to next hole.  */
      

      …to figure out the values for these symbols.

    Now, say, you create a Linux-specific file in your package containing something like

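    //go:build linux
    // +build linux

    package sparse // hypothetical package name; use your own

    // Linux values of the lseek(2) "whence" constants, as found in <unistd.h>.
    const (
        SEEK_DATA = 3 // seek to the next region containing data
        SEEK_HOLE = 4 // seek to the next hole
    )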
    

    and then use these values with the syscall.Seek().

    The file descriptor to pass to syscall.Seek() and friends can be obtained from an opened file using the Fd() method of os.File values.
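As a rough illustration of the steps above, here is a Linux-only sketch that passes these values to syscall.Seek() using the descriptor from Fd(). The helper names `nextData`, `nextHole` and `probe` are made up for this example, and the constants are the Linux values quoted earlier; they would need adjusting on other platforms.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// Linux values of the lseek(2) "whence" constants (see <unistd.h>).
// Other platforms may use different numbers.
const (
	seekData = 3 // SEEK_DATA
	seekHole = 4 // SEEK_HOLE
)

// nextData returns the offset of the first data byte at or after off.
// Note that it moves the file offset shared with f.Read and f.Write.
func nextData(f *os.File, off int64) (int64, error) {
	return syscall.Seek(int(f.Fd()), off, seekData)
}

// nextHole returns the offset of the first hole at or after off.
// The end of the file counts as a hole.
func nextHole(f *os.File, off int64) (int64, error) {
	return syscall.Seek(int(f.Fd()), off, seekHole)
}

// probe writes size non-zero bytes to a temporary file and reports where
// lseek(2) finds the first data and the first hole, starting from offset 0.
func probe(size int) (data, hole int64, err error) {
	f, err := os.CreateTemp("", "probe-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, size)
	for i := range buf {
		buf[i] = 'x'
	}
	if _, err = f.Write(buf); err != nil {
		return 0, 0, err
	}
	// Not strictly required; it just makes sure the data has reached the
	// filesystem before we ask about the file's layout.
	if err = f.Sync(); err != nil {
		return 0, 0, err
	}
	if data, err = nextData(f, 0); err != nil {
		return 0, 0, err
	}
	hole, err = nextHole(f, 0)
	return data, hole, err
}

func main() {
	data, hole, err := probe(4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("first data at %d, first hole at %d\n", data, hole)
}
```

For a file that was written from start to end, the first data is at offset 0 and the only "hole" is the end of the file.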

    The pattern to use when reading is to detect regions containing data, and read the data from them – see this for one example.
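A minimal sketch of that pattern, assuming Linux and the constant values quoted earlier, could look like the following. The `extent` type and the helpers `dataExtents`, `covered` and `demo` are hypothetical names for this example. It alternates SEEK_DATA and SEEK_HOLE until lseek(2) reports ENXIO (no more data), then reads each region through an io.SectionReader so that the reads do not depend on the file offset the seeks have moved.

```go
//go:build linux

package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

const (
	seekData = 3 // SEEK_DATA on Linux
	seekHole = 4 // SEEK_HOLE on Linux
)

// Layout of the demo file: a small payload inside an otherwise empty file.
const (
	fileSize   = 1 << 20
	payloadOff = 512 << 10
	payload    = "some real data"
)

// extent describes one contiguous region of data: [Off, Off+Len).
type extent struct{ Off, Len int64 }

// dataExtents walks f with SEEK_DATA/SEEK_HOLE and returns its data regions.
func dataExtents(f *os.File) ([]extent, error) {
	fd := int(f.Fd())
	var out []extent
	var pos int64
	for {
		start, err := syscall.Seek(fd, pos, seekData)
		if err == syscall.ENXIO {
			return out, nil // no data at or after pos
		}
		if err != nil {
			return nil, err
		}
		end, err := syscall.Seek(fd, start, seekHole)
		if err != nil {
			return nil, err
		}
		out = append(out, extent{Off: start, Len: end - start})
		pos = end
	}
}

// covered reports whether [off, off+n) lies inside a single extent.
func covered(exts []extent, off, n int64) bool {
	for _, e := range exts {
		if off >= e.Off && off+n <= e.Off+e.Len {
			return true
		}
	}
	return false
}

// demo builds a mostly empty file, lists its data regions and reads them.
// It returns the regions and the total number of bytes read from them.
func demo() (exts []extent, read int64, err error) {
	f, err := os.CreateTemp("", "sparse-*")
	if err != nil {
		return nil, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if err := f.Truncate(fileSize); err != nil {
		return nil, 0, err
	}
	if _, err := f.WriteAt([]byte(payload), payloadOff); err != nil {
		return nil, 0, err
	}
	if err := f.Sync(); err != nil {
		return nil, 0, err
	}
	if exts, err = dataExtents(f); err != nil {
		return nil, 0, err
	}
	for _, e := range exts {
		n, err := io.Copy(io.Discard, io.NewSectionReader(f, e.Off, e.Len))
		if err != nil {
			return nil, 0, err
		}
		read += n
	}
	return exts, read, nil
}

func main() {
	exts, read, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range exts {
		fmt.Printf("data at [%d, %d)\n", e.Off, e.Off+e.Len)
	}
	fmt.Printf("read %d of %d bytes\n", read, fileSize)
}
```

The exact regions reported depend on the filesystem: one that does not track holes reports the whole file as a single data region, and the loop then degrades to a plain full read.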

    Note that this deals with reading sparse files. If you want to actually transfer them as sparse, that is, keep the holes at the destination, the situation is more complicated: it appears to be even less portable, so some research and experimentation is due.

    On Linux, it appears you could call fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE to punch a hole at the end of the file you're writing to; if that legitimately fails (with syscall.EOPNOTSUPP), you just shovel as many zeroed blocks to the destination file as are covered by the hole you're reading, in the hope that the OS will do the right thing and convert them to a hole by itself.
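A hedged sketch of that fallback logic on Linux follows. The helper name `punchOrZero` is made up, the flag values are copied from <linux/falloc.h>, and treating ENOSYS like EOPNOTSUPP is my own assumption rather than something the answer states. Because FALLOC_FL_KEEP_SIZE never grows the file, the sketch assumes the range already lies within the file's current size.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// Flag values from <linux/falloc.h>.
const (
	fallocFlKeepSize  = 0x01 // FALLOC_FL_KEEP_SIZE
	fallocFlPunchHole = 0x02 // FALLOC_FL_PUNCH_HOLE
)

// punchOrZero tries to turn the range [off, off+n) of f into a hole.
// If the filesystem cannot punch holes, it overwrites the range with
// zeros instead. The range is assumed to lie within the current file size.
func punchOrZero(f *os.File, off, n int64) error {
	err := syscall.Fallocate(int(f.Fd()), fallocFlPunchHole|fallocFlKeepSize, off, n)
	// ENOSYS is handled like EOPNOTSUPP here; that is an assumption.
	if err != syscall.EOPNOTSUPP && err != syscall.ENOSYS {
		return err // nil on success, or an unexpected failure
	}
	zeros := make([]byte, 64*1024)
	for n > 0 {
		chunk := int64(len(zeros))
		if n < chunk {
			chunk = n
		}
		w, werr := f.WriteAt(zeros[:chunk], off)
		if werr != nil {
			return werr
		}
		off += int64(w)
		n -= int64(w)
	}
	return nil
}

// demo fills an 8 KiB file with 0xFF, clears the first 4 KiB and reports
// the resulting size and whether both halves look as expected.
func demo() (size int64, firstZero, restIntact bool, err error) {
	f, err := os.CreateTemp("", "punch-*")
	if err != nil {
		return 0, false, false, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	buf := make([]byte, 8192)
	for i := range buf {
		buf[i] = 0xFF
	}
	if _, err = f.Write(buf); err != nil {
		return 0, false, false, err
	}
	if err = punchOrZero(f, 0, 4096); err != nil {
		return 0, false, false, err
	}
	if _, err = f.ReadAt(buf, 0); err != nil {
		return 0, false, false, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, false, false, err
	}
	firstZero, restIntact = true, true
	for i, b := range buf {
		if i < 4096 && b != 0 {
			firstZero = false
		}
		if i >= 4096 && b != 0xFF {
			restIntact = false
		}
	}
	return fi.Size(), firstZero, restIntact, nil
}

func main() {
	size, firstZero, restIntact, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(size, firstZero, restIntact)
}
```

Either way the range reads back as zeros and the file size is unchanged; only the amount of disk space actually allocated differs.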

    Note that some filesystems do not support holes at all, as a concept. One example is the filesystems in the FAT family. What I'm leading you to is that the inability to create a sparse file might actually be a property of the target filesystem in your case.

    You might find Go issue #13548 "archive/tar: add support for writing tar containing sparse files" to be of interest.


    One more note: you might also consider checking whether the destination directory resides on the same filesystem as the source file and, if it does, using syscall.Rename() (on POSIX systems) or os.Rename() to just move the file between directories without actually copying its data.
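One possible way to do that check on Unix-like systems is to compare the device IDs reported by stat(2), as in the sketch below. The helper names `deviceOf`, `moveInto` and `demo` are made up for illustration. Equal device IDs are only a hint: the rename can still fail, so the copying path should stay available as a fallback.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// deviceOf returns the ID of the device holding path.
// Unix-only: it relies on syscall.Stat_t being available.
func deviceOf(path string) (uint64, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat data available for %s", path)
	}
	return uint64(st.Dev), nil
}

// moveInto renames src into dstDir when both report the same device ID.
// It returns false without error when they differ, so that the caller
// can fall back to copying.
func moveInto(src, dstDir string) (bool, error) {
	sd, err := deviceOf(src)
	if err != nil {
		return false, err
	}
	dd, err := deviceOf(dstDir)
	if err != nil {
		return false, err
	}
	if sd != dd {
		return false, nil
	}
	dst := filepath.Join(dstDir, filepath.Base(src))
	if err := os.Rename(src, dst); err != nil {
		return false, err
	}
	return true, nil
}

// demo moves a file between two directories under one temporary root and
// reports whether it ended up at the destination.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "move-*")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	dstDir := filepath.Join(root, "dst")
	if err := os.Mkdir(dstDir, 0o755); err != nil {
		return false, err
	}
	src := filepath.Join(root, "file.bin")
	if err := os.WriteFile(src, []byte("payload"), 0o644); err != nil {
		return false, err
	}
	moved, err := moveInto(src, dstDir)
	if err != nil || !moved {
		return false, err
	}
	if _, err := os.Stat(filepath.Join(dstDir, "file.bin")); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	ok, err := demo()
	fmt.Println("moved:", ok, "error:", err)
}
```

A rename keeps the file's holes intact for free, since no data is read or written at all.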
