donglu1881 2017-10-16 06:36
Views: 81
Accepted

Read a log file from the end and get the offset of a particular string

E.g. 1. logfile

  • Start
  • Line1
  • Line2
  • Line3
  • End

I am able to get the seek position of Line1 when I read the file from the beginning.

func getSeekLocation() int64 {
    start := int64(0)
    input, err := os.Open(logFile)
    if err != nil {
        fmt.Println(err)
    }
    if _, err := input.Seek(start, io.SeekStart); err != nil {
        fmt.Println(err)
    }
    scanner := bufio.NewScanner(input)

    pos := start
    scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanLines(data, atEOF)
        pos += int64(advance)
        return
    }
    scanner.Split(scanLines)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Line1") {
            break
        }
    }
    size, err := getFileSize()
    if err != nil {
        fmt.Println(err)
    }
    return size - pos
}

But this is not efficient: the time needed to find the location grows with the file size. I would like to find the location of the line by searching from the end of the file (EOF) instead, which I think would be more efficient.

1 answer

  • dqf42223 2017-10-16 09:33

    Note: I optimized and improved the below solution, and released it as a library here: github.com/icza/backscanner


    bufio.Scanner uses an io.Reader as its source, which does not support seeking or reading from arbitrary positions, so it cannot scan lines starting from the end. bufio.Scanner can read a given part of the input only after all the data preceding it has been read (that is, it can reach the end of the file only by first reading the whole file).

    So we need a custom solution. Fortunately, *os.File supports reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either one would be sufficient for what we need).

    Scanner that returns lines going backward, starting at the end

    Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll utilize an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed (this is really the position to start reading backward from, which need not be the end of the input).

    type Scanner struct {
        r   io.ReaderAt
        pos int
        err error
        buf []byte
    }
    
    func NewScanner(r io.ReaderAt, pos int) *Scanner {
        return &Scanner{r: r, pos: pos}
    }
    
    func (s *Scanner) readMore() {
        if s.pos == 0 {
            s.err = io.EOF
            return
        }
        size := 1024
        if size > s.pos {
            size = s.pos
        }
        s.pos -= size
        buf2 := make([]byte, size, size+len(s.buf))
    
        // ReadAt attempts to fill the whole buffer.
        _, s.err = s.r.ReadAt(buf2, int64(s.pos))
        if s.err == nil {
            s.buf = append(buf2, s.buf...)
        }
    }
    
    func (s *Scanner) Line() (line string, start int, err error) {
        if s.err != nil {
            return "", 0, s.err
        }
        for {
            lineStart := bytes.LastIndexByte(s.buf, '\n')
            if lineStart >= 0 {
                // We have a complete line:
                var line string
                line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
                return line, s.pos + lineStart + 1, nil
            }
            // Need more data:
            s.readMore()
            if s.err != nil {
                if s.err == io.EOF {
                    if len(s.buf) > 0 {
                        return string(dropCR(s.buf)), 0, nil
                    }
                }
                return "", 0, s.err
            }
        }
    }
    
    // dropCR drops a terminal \r from the data.
    func dropCR(data []byte) []byte {
        if len(data) > 0 && data[len(data)-1] == '\r' {
            return data[0 : len(data)-1]
        }
        return data
    }
    

    Example using it:

    func main() {
        scanner := NewScanner(strings.NewReader(src), len(src))
        for {
            line, pos, err := scanner.Line()
            if err != nil {
                fmt.Println("Error:", err)
                break
            }
            fmt.Printf("Line start: %2d, line: %s\n", pos, line)
        }
    }
    
    const src = `Start
    Line1
    Line2
    Line3
    End`
    

    Output:

    Line start: 24, line: End
    Line start: 18, line: Line3
    Line start: 12, line: Line2
    Line start:  6, line: Line1
    Line start:  0, line: Start
    Error: EOF
    

    Notes:

    • The above Scanner does not limit the maximum line length; it handles lines of any length.
    • The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
    • You may pass any starting position, not just the size / length, and lines will be listed backward from there (continuation).
    • The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use those wisely, but the implementation would become more complex, and it would introduce a maximum line length.

    Using it with a file

    To use this Scanner with a file, open the file with os.Open(). Note that *os.File implements io.ReaderAt. You can then use File.Stat() to obtain info about the file (os.FileInfo), including its size (length):

    f, err := os.Open("a.txt")
    if err != nil {
        panic(err)
    }
    defer f.Close() // deferred right after a successful Open

    fi, err := f.Stat()
    if err != nil {
        panic(err)
    }
    
    scanner := NewScanner(f, int(fi.Size()))
    

    Looking for a substring in a line

    If you're looking for a substring in a line, then simply use the above Scanner which returns the starting pos of each line, reading lines from the end.

    You may check for the substring in each line using strings.Index(), which returns the substring's position inside the line; if it is found, add the line's start position to get the position within the whole input.

    Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:

    scanner := NewScanner(strings.NewReader(src), len(src))
    what := "ine2"
    for {
        line, pos, err := scanner.Line()
        if err != nil {
            fmt.Println("Error:", err)
            break
        }
        fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    
        if i := strings.Index(line, what); i >= 0 {
            fmt.Printf("Found %q at line position: %d, global position: %d\n",
                what, i, pos+i)
            break
        }
    }
    

    Output:

    Line start: 24, line: End
    Line start: 18, line: Line3
    Line start: 12, line: Line2
    Found "ine2" at line position: 1, global position: 13
    