duanmei1694 2016-07-12 15:44
Views: 194
Accepted

Preserving hard links in tar archives

Using the archive/tar package in Go, it doesn't seem possible to access the number of hardlinks a file has. However, I remember reading somewhere that tar'ing a directory or file can preserve the hardlinks.

Is there some package in Go that can help me do this?

2 answers

  • duanhemou9834 2016-07-12 17:07

    tar does preserve the hardlinks.

    Here's a sample directory with three hard-linked files and one file with a single link:

    foo% vdir .
    total 16
    -rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 bar.txt
    -rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 foo.txt
    -rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 test.txt
    -rw-r--r-- 1 kostix kostix 9 Jul 12 19:49 xyzzy.txt
    

    Now we archive it using GNU tar and verify that it indeed added the links (because we didn't pass it the --hard-dereference command-line option):

    foo% tar -cf ../foo.tar .
    foo% tar -tvf ../foo.tar
    drwxr-xr-x kostix/kostix     0 2016-07-12 19:49 ./
    -rw-r--r-- kostix/kostix     9 2016-07-12 19:49 ./xyzzy.txt
    -rw-r--r-- kostix/kostix     5 2016-07-12 19:37 ./bar.txt
    hrw-r--r-- kostix/kostix     0 2016-07-12 19:37 ./test.txt link to ./bar.txt
    hrw-r--r-- kostix/kostix     0 2016-07-12 19:37 ./foo.txt link to ./bar.txt
    

    The documentation of archive/tar refers to several documents defining the tar archive format. Unfortunately, there is no single standard: for instance, GNU tar does not support POSIX extended attributes, while BSD tar (which relies on libarchive) does, and so does pax. To cite its bit on hardlinks:

    LNKTYPE

    This flag represents a file linked to another file, of any type, previously archived. Such files are identified in Unix by each file having the same device and inode number. The linked-to name is specified in the linkname field with a trailing null.

    So, a hardlink is an entry of a special type ('1') which refers to some preceding (already archived) file by its name.

    So let's create a playground example.

    We base64-encode our archive:

    foo% base64 <../foo.tar | xclip -selection clipboard
    

    …and write the code. The archive contains a single directory, one regular file (type '0'), and another regular file (type '0') followed by two hardlinks (type '1') to it.

    The output from the playground example:

    Archive entry '5': ./
    Archive entry '0': ./xyzzy.txt
    Archive entry '0': ./bar.txt
    Archive entry '1': ./test.txt link to ./bar.txt
    Archive entry '1': ./foo.txt link to ./bar.txt
    

    So your link-counting code should:

    1. Scan the entire archive record-by-record.

    2. Remember every regular file (type archive/tar.TypeReg, or the deprecated archive/tar.TypeRegA) already processed, and associate a counter with it, starting at 1.

      In practice it is better to work by exclusion and record entries of every type except symbolic links and directories, because tar archives can also contain nodes for character and block devices and FIFOs (named pipes).

    3. When you encounter a hard link (type archive/tar.TypeLink),

      1. Read the Linkname field of its header.
      2. Look that name up in your list of "seen" files and increment the counter of the matching entry.
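
    The steps above can be sketched roughly as follows. This is only an illustration: countLinks and sampleArchive are hypothetical names, and the in-memory archive merely imitates the layout of the listing shown earlier rather than reproducing the real foo.tar.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"log"
)

// countLinks scans a tar stream record by record and returns, for every
// entry that is neither a directory nor a symlink, how many names in the
// archive refer to it.
func countLinks(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, err
		}
		switch hdr.Typeflag {
		case tar.TypeDir, tar.TypeSymlink:
			// Not candidates for hard-link counting.
		case tar.TypeLink:
			// A hard link names an entry archived earlier. A target that
			// was never seen would simply start a new counter here.
			counts[hdr.Linkname]++
		default:
			// Regular files, device nodes, FIFOs: first sighting.
			counts[hdr.Name] = 1
		}
	}
}

// sampleArchive builds an in-memory archive with one directory, two regular
// files and two hard links (made-up contents, for illustration only).
func sampleArchive() (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	entries := []struct {
		hdr  tar.Header
		body string
	}{
		{tar.Header{Name: "./", Typeflag: tar.TypeDir, Mode: 0755}, ""},
		{tar.Header{Name: "./xyzzy.txt", Typeflag: tar.TypeReg, Mode: 0644}, "nine byte"},
		{tar.Header{Name: "./bar.txt", Typeflag: tar.TypeReg, Mode: 0644}, "five!"},
		{tar.Header{Name: "./test.txt", Typeflag: tar.TypeLink, Linkname: "./bar.txt", Mode: 0644}, ""},
		{tar.Header{Name: "./foo.txt", Typeflag: tar.TypeLink, Linkname: "./bar.txt", Mode: 0644}, ""},
	}
	for _, e := range entries {
		h := e.hdr
		h.Size = int64(len(e.body))
		if err := tw.WriteHeader(&h); err != nil {
			return nil, err
		}
		if e.body == "" {
			continue // directories and hard links carry no data
		}
		if _, err := io.WriteString(tw, e.body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	buf, err := sampleArchive()
	if err != nil {
		log.Fatal(err)
	}
	counts, err := countLinks(buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("./bar.txt:", counts["./bar.txt"])
	fmt.Println("./xyzzy.txt:", counts["./xyzzy.txt"])
}
```

    Note that the counter is stored under the name of the first archived entry, so the names that appear only as hard links (./test.txt and ./foo.txt here) have no entry of their own in the result.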

    Update

    As the OP actually wanted to know how to manage hardlinks on the source filesystem, here's the update.

    The chief idea is that on a filesystem with POSIX semantics:

    • A directory entry designating a file actually points to a special filesystem metadata block called an "inode". The inode contains the number of directory entries pointing to it.

      Creating a hardlink is actually just:

      1. Creating a new directory entry pointing to the inode of the original (source) file, which is "the link target" in ln's terms.
      2. Incrementing the link counter in that inode.
    • Hence any file is uniquely identified by two integers: the "device number", identifying the physical device hosting the filesystem on which the file is located, and the inode number, identifying the file's data.

      It follows that if two files have the same (device, inode) pair, they represent the same content. Put differently, one is a hardlink to the other.
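
    As a rough illustration of these two points, and of how to get at the link count the question asks about, here is a sketch that assumes a Unix-like system where FileInfo.Sys() yields a *syscall.Stat_t. The names fileID, linkInfo and demo are made up for the example; the standard library's os.SameFile performs a comparable identity check portably.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

// fileID holds what identifies a file on a POSIX filesystem, plus the
// link counter stored in its inode.
type fileID struct {
	dev, ino, nlink uint64
}

// linkInfo reads the (device, inode) pair and the hard-link count of path.
// Assumption: the platform exposes *syscall.Stat_t (not true on Windows).
func linkInfo(path string) (fileID, error) {
	fi, err := os.Lstat(path)
	if err != nil {
		return fileID{}, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return fileID{}, fmt.Errorf("%s: no syscall.Stat_t available", path)
	}
	// Field widths differ between platforms, hence the conversions.
	return fileID{uint64(st.Dev), uint64(st.Ino), uint64(st.Nlink)}, nil
}

// demo creates a file and a hard link to it in a temporary directory and
// reports whether both names share an identity, and the link count.
func demo() (bool, uint64, error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	a := filepath.Join(dir, "a.txt")
	b := filepath.Join(dir, "b.txt")
	if err := os.WriteFile(a, []byte("data\n"), 0o644); err != nil {
		return false, 0, err
	}
	if err := os.Link(a, b); err != nil {
		return false, 0, err
	}

	ia, err := linkInfo(a)
	if err != nil {
		return false, 0, err
	}
	ib, err := linkInfo(b)
	if err != nil {
		return false, 0, err
	}
	return ia.dev == ib.dev && ia.ino == ib.ino, ia.nlink, nil
}

func main() {
	same, nlink, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("same (device, inode):", same, "link count:", nlink)
}
```

    On a filesystem with POSIX semantics the two names should compare equal and the link count should be 2.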

    So, adding files to a tar archive while preserving the hardlinks works this way:

    1. Having added a file, save its (device, inode) pair to some lookup table.

    2. When adding another file, figure out its (device, inode) pair and look it up in that table.

      If a matching entry is found, the file's data was already streamed, and we should add a hardlink.

      Otherwise, behave as in step (1).

    So here's the code:

    package main
    
    import (
        "archive/tar"
        "io"
        "log"
        "os"
        "path/filepath"
        "syscall"
    )
    
    type devino struct {
        Dev uint64
        Ino uint64
    }
    
    func main() {
        log.SetFlags(0)
    
        if len(os.Args) != 2 {
            log.Fatalf("Usage: %s DIR\n", os.Args[0])
        }
    
        seen := make(map[devino]string)
    
        tw := tar.NewWriter(os.Stdout)
    
        err := filepath.Walk(os.Args[1],
            func(fn string, fi os.FileInfo, we error) (err error) {
                if we != nil {
                    log.Fatal("Error processing directory: ", we)
                }
    
                hdr, err := tar.FileInfoHeader(fi, "")
                if err != nil {
                    return
                }
    
                if fi.IsDir() {
                    err = tw.WriteHeader(hdr)
                    return
                }
    
                st := fi.Sys().(*syscall.Stat_t)
                di := devino{
                    Dev: st.Dev,
                    Ino: st.Ino,
                }
    
                orig, ok := seen[di]
                if ok {
                    hdr.Typeflag = tar.TypeLink
                    hdr.Linkname = orig
                    hdr.Size = 0
    
                    err = tw.WriteHeader(hdr)
                    return
                }
    
                fd, err := os.Open(fn)
                if err != nil {
                    return
                }
                err = tw.WriteHeader(hdr)
                if err != nil {
                    return
                }
                _, err = io.Copy(tw, fd)
                fd.Close() // Ignoring error for a file opened R/O
                if err == nil {
                    seen[di] = fi.Name()
                }
                return err
            })
    
        if err != nil {
            log.Fatal(err)
        }
    
        err = tw.Close()
        if err != nil {
            log.Fatal(err)
        }
    
        return
    }
    

    Note that it's quite inadequate:

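    • It improperly deals with file and directory names: tar.FileInfoHeader fills in only the base name, and the lookup table stores fi.Name(), so entries lose their directory prefix.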

    • It does not attempt to work properly with symlinks and FIFOs, nor does it skip Unix-domain sockets and the like.

    • It assumes it works in a POSIX environment.

      On non-POSIX systems, the Sys() method called on a value of type os.FileInfo might return something other than the POSIX-style syscall.Stat_t.

      Say, on Windows, there are multiple filesystems hosted by different "disks" or "drives". I have no idea how Go handles that. Maybe the "device number" would have to be emulated somehow for this case.

    On the other hand, it shows how to handle hardlinks:

    • Set the "Typeflag" field of the header struct to tar.TypeLink.
    • Set the "Linkname" field of the header struct.
    • Reset the "Size" field of the header to 0 (because no data will follow).
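
    Pulled out of the walker, those header tweaks might look like the sketch below. hardlinkHeader and demoHeader are hypothetical helpers, and the archive paths passed in are invented; the sketch also sets Name explicitly, since tar.FileInfoHeader only knows the base name.

```go
package main

import (
	"archive/tar"
	"fmt"
	"log"
	"os"
)

// hardlinkHeader builds a header for an entry called name that is a hard
// link to target, an entry assumed to be already written to the archive.
func hardlinkHeader(fi os.FileInfo, name, target string) (*tar.Header, error) {
	hdr, err := tar.FileInfoHeader(fi, "")
	if err != nil {
		return nil, err
	}
	hdr.Name = name             // full path inside the archive
	hdr.Typeflag = tar.TypeLink // entry type '1'
	hdr.Linkname = target       // name of the earlier entry
	hdr.Size = 0                // no data follows a hard-link header
	return hdr, nil
}

// demoHeader stats a small temporary file and turns its metadata into a
// hard-link header with made-up archive paths.
func demoHeader() (*tar.Header, error) {
	f, err := os.CreateTemp("", "hardlink-hdr")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.WriteString("hello"); err != nil {
		return nil, err
	}
	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return hardlinkHeader(fi, "dir/copy.txt", "dir/orig.txt")
}

func main() {
	hdr, err := demoHeader()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("type %q: %s link to %s (size %d)\n",
		hdr.Typeflag, hdr.Name, hdr.Linkname, hdr.Size)
}
```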

    You might also want to use another approach to maintain the lookup table: if most of your files are expected to be located on the same physical filesystem, every entry wastes a uint64 repeating the same device number. So a hierarchy of maps might be a sensible thing to do: the first map takes device numbers to a second map, which takes inode numbers to file names.
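
    One possible shape for such a two-level table is sketched here; linkTable and lookupOrAdd are invented names, and the device and inode numbers in main are arbitrary.

```go
package main

import "fmt"

// linkTable maps device number -> inode number -> first archived name.
type linkTable map[uint64]map[uint64]string

// lookupOrAdd returns the name previously recorded for (dev, ino), if any.
// Otherwise it records name for that pair and reports ok == false.
func (t linkTable) lookupOrAdd(dev, ino uint64, name string) (orig string, ok bool) {
	inodes := t[dev]
	if inodes == nil {
		// First file seen on this device: create its inner map lazily.
		inodes = make(map[uint64]string)
		t[dev] = inodes
	}
	if orig, ok = inodes[ino]; ok {
		return orig, true
	}
	inodes[ino] = name
	return "", false
}

func main() {
	seen := make(linkTable)
	_, ok1 := seen.lookupOrAdd(1, 100, "bar.txt")
	orig, ok2 := seen.lookupOrAdd(1, 100, "foo.txt")
	_, ok3 := seen.lookupOrAdd(2, 100, "other.txt") // same inode, other device
	fmt.Println(ok1, orig, ok2, ok3) // prints "false bar.txt true false"
}
```

    Unlike the walker above, which records a file only after its data was copied successfully, this helper records the name immediately; whether that matters depends on how errors are handled.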
