donglu1881
2017-10-16 06:36

Read a log file from the end and get the offset of a specific string

Accepted

E.g. a log file:

  • Start
  • Line1
  • Line2
  • Line3
  • End

I am able to get the seek position of Line1 when I read the file from the beginning.

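import (
    "bufio"
    "io"
    "os"
    "strings"
)

// getSeekLocation scans logFile from the beginning until it finds a line
// containing "Line1" and returns the file size minus the number of bytes
// consumed up to and including that line.
func getSeekLocation(logFile string) (int64, error) {
    start := int64(0)
    input, err := os.Open(logFile)
    if err != nil {
        return 0, err
    }
    defer input.Close()
    if _, err := input.Seek(start, io.SeekStart); err != nil {
        return 0, err
    }
    scanner := bufio.NewScanner(input)

    // pos counts the bytes the split function has consumed so far.
    pos := start
    scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
        advance, token, err = bufio.ScanLines(data, atEOF)
        pos += int64(advance)
        return
    }
    scanner.Split(scanLines)
    for scanner.Scan() {
        if strings.Contains(scanner.Text(), "Line1") {
            break
        }
    }
    if err := scanner.Err(); err != nil {
        return 0, err
    }
    // Stat reports the current file size.
    fi, err := input.Stat()
    if err != nil {
        return 0, err
    }
    return fi.Size() - pos, nil
}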

But this is not an efficient solution: the time needed to find the location grows with the file size, because everything before the match has to be read first. I would like to find the location of the line starting from EOF instead, which I think would be more efficient.


1 answer

  • dqf42223, 4 years ago

    Note: I optimized and improved the solution below, and released it as a library here: github.com/icza/backscanner


    bufio.Scanner uses an io.Reader as its source, which does not support seeking or reading from arbitrary positions, so it cannot scan lines from the end. bufio.Scanner can only read a part of the input after all data preceding it has been read (that is, it can only read the end of the file after reading the whole file's content first).

    So we need a custom solution to implement such functionality. Fortunately, os.File does support reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either of them would be sufficient for what we need).

    Scanner that returns lines going backward, starting at the end

    Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll use an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed (this is basically the position to start reading from, which need not be the end position).

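    import (
        "bytes"
        "io"
    )
    
    // Scanner returns the lines of an io.ReaderAt in reverse order,
    // starting at position pos and moving toward the beginning.
    type Scanner struct {
        r   io.ReaderAt
        pos int    // offset of the first byte already read into buf
        err error  // sticky error; io.EOF once the beginning is reached
        buf []byte // data read so far that has not been returned yet
    }
    
    func NewScanner(r io.ReaderAt, pos int) *Scanner {
        return &Scanner{r: r, pos: pos}
    }
    
    // readMore prepends the (at most 1024-byte) chunk preceding pos to buf.
    func (s *Scanner) readMore() {
        if s.pos == 0 {
            s.err = io.EOF
            return
        }
        size := 1024
        if size > s.pos {
            size = s.pos
        }
        s.pos -= size
        buf2 := make([]byte, size, size+len(s.buf))
    
        // ReadAt attempts to fill buf2 completely; it returns a non-nil
        // error if it reads fewer than len(buf2) bytes.
        _, s.err = s.r.ReadAt(buf2, int64(s.pos))
        if s.err == nil {
            s.buf = append(buf2, s.buf...)
        }
    }
    
    // Line returns the previous line and its start position in the input,
    // or a non-nil error (io.EOF) once there are no more lines.
    func (s *Scanner) Line() (line string, start int, err error) {
        if s.err != nil {
            return "", 0, s.err
        }
        for {
            lineStart := bytes.LastIndexByte(s.buf, '\n')
            if lineStart >= 0 {
                // We have a complete line:
                line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
                return line, s.pos + lineStart + 1, nil
            }
            // Need more data:
            s.readMore()
            if s.err != nil {
                if s.err == io.EOF {
                    if len(s.buf) > 0 {
                        return string(dropCR(s.buf)), 0, nil
                    }
                }
                return "", 0, s.err
            }
        }
    }
    
    // dropCR drops a terminal \r from the data.
    func dropCR(data []byte) []byte {
        if len(data) > 0 && data[len(data)-1] == '\r' {
            return data[0 : len(data)-1]
        }
        return data
    }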
    

    Example using it:

    // Also requires the "fmt" and "strings" imports.
    func main() {
        scanner := NewScanner(strings.NewReader(src), len(src))
        for {
            line, pos, err := scanner.Line()
            if err != nil {
                fmt.Println("Error:", err)
                break
            }
            fmt.Printf("Line start: %2d, line: %s\n", pos, line)
        }
    }
    
    const src = `Start
    Line1
    Line2
    Line3
    End`
    

    Output (try it on the Go Playground):

    Line start: 24, line: End
    Line start: 18, line: Line3
    Line start: 12, line: Line2
    Line start:  6, line: Line1
    Line start:  0, line: Start
    Error: EOF
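The chunk arithmetic in readMore() can be looked at on its own. The sketch below lists the ReadAt calls the Scanner would issue if it kept reading all the way back to offset 0 with its 1024-byte chunk size. backwardChunks and chunk are hypothetical helper names, and the 2500-byte input size is only an illustration.

```go
package main

import "fmt"

// chunk describes one ReadAt call: read length bytes starting at offset.
type chunk struct {
	offset, length int
}

// backwardChunks repeats the arithmetic of readMore(): starting at pos, step
// back by at most chunkSize bytes at a time until offset 0 is reached.
func backwardChunks(pos, chunkSize int) []chunk {
	if chunkSize < 1 {
		return nil // a non-positive chunk size would never make progress
	}
	var out []chunk
	for pos > 0 {
		size := chunkSize
		if size > pos {
			size = pos
		}
		pos -= size
		out = append(out, chunk{offset: pos, length: size})
	}
	return out
}

func main() {
	// 2500 is an arbitrary example size; 1024 is the chunk size used by readMore().
	for _, c := range backwardChunks(2500, 1024) {
		fmt.Printf("ReadAt(buf[:%d], %d)\n", c.length, c.offset)
	}
}
```

It prints ReadAt(buf[:1024], 1476), then ReadAt(buf[:1024], 452), then ReadAt(buf[:452], 0). The Scanner only issues as many of these calls as it needs to reach the line it returns, so finding a line near the end of a large file touches only the chunks that cover the tail of the file up to that line.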
    

    Notes:

    • The above Scanner does not limit the maximum line length; it handles lines of any length.
    • The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
    • You may pass any starting position, not just the size / length, and lines will be listed backward from there (continuation).
    • The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use them wisely, but the implementation would become more complex and it would introduce a maximum line length limit.

    Using it with a file

    To use this Scanner with a file, you may use os.Open() to open it. Note that *os.File implements io.ReaderAt. Then you may use File.Stat() to obtain info about the file (os.FileInfo), including its size (length):

    f, err := os.Open("a.txt")
    if err != nil {
        panic(err)
    }
    defer f.Close() // close the file even if Stat fails
    
    fi, err := f.Stat()
    if err != nil {
        panic(err)
    }
    
    scanner := NewScanner(f, int(fi.Size()))
    

    Looking for a substring in a line

    If you're looking for a substring in a line, simply use the above Scanner, which returns the starting position of each line while reading lines from the end.

    You may check the substring in each line using strings.Index(), which returns the substring position inside the line, and if found, add the line start position to this.

    Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:

    scanner := NewScanner(strings.NewReader(src), len(src))
    what := "ine2"
    for {
        line, pos, err := scanner.Line()
        if err != nil {
            fmt.Println("Error:", err)
            break
        }
        fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    
        if i := strings.Index(line, what); i >= 0 {
            fmt.Printf("Found %q at line position: %d, global position: %d\n",
                what, i, pos+i)
            break
        }
    }
    

    Output (try it on the Go Playground):

    Line start: 24, line: End
    Line start: 18, line: Line3
    Line start: 12, line: Line2
    Found "ine2" at line position: 1, global position: 13
    
