doucai4274 2019-06-27 21:02

Finding the byte offset of a pattern in Golang

We can find the byte offset of a pattern in a file with "grep -ob pattern filename"; however, grep is not UTF-8 safe. How do I find the byte offset of a pattern in Go? The file is a process log, which can be terabytes in size.

This is what I want to get in Go:

$ cat fname
hello world
findme
hello 世界
findme again

...

$ grep -ob findme fname
12:findme
32:findme

1 answer

  • dongmeiwei0226 2019-06-27 21:49

    FindAllStringIndex(s string, n int) returns the byte start/end indexes (one [start, end) pair per match) of all successive matches of the expression; a negative n means no limit on the number of matches:

    package main
    
    import (
        "fmt"
        "log"
        "os"
        "regexp"
    )
    
    func main() {
        fname := "C:\\Users\\UserName\\go\\src\\so56798431\\fname"
        // os.ReadFile loads the whole file into memory, so this suits small files only.
        b, err := os.ReadFile(fname)
        if err != nil {
            log.Fatal(err)
        }
    
        re, err := regexp.Compile("findme")
        if err != nil {
            log.Fatal(err)
        }
        // Each match is reported as a [start, end) pair of byte offsets.
        fmt.Println(re.FindAllStringIndex(string(b), -1))
    }
    

    Output:

    [[12 18] [32 38]]

    Note: I ran this on Microsoft Windows but saved the file in UNIX format (LF line endings). If the input file is saved in Windows format (CR LF), the byte offsets become 13 and 35, respectively.
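    To see the CRLF effect without creating files, here is a minimal sketch that runs the same search on in-memory copies of the sample text from the question. It assumes a literal pattern (hence regexp.QuoteMeta), and the helper name byteOffsets is hypothetical. It also uses FindAllIndex, the []byte counterpart of FindAllStringIndex, which avoids the string(b) copy.

```go
package main

import (
	"fmt"
	"regexp"
)

// byteOffsets is a hypothetical helper: it returns the byte offset of every
// non-overlapping match of the literal pattern in data.
func byteOffsets(data []byte, pattern string) []int {
	re := regexp.MustCompile(regexp.QuoteMeta(pattern))
	var offsets []int
	// FindAllIndex works on []byte directly, so no string copy is needed.
	for _, m := range re.FindAllIndex(data, -1) {
		offsets = append(offsets, m[0]) // m is a [start, end) pair
	}
	return offsets
}

func main() {
	lf := []byte("hello world\nfindme\nhello 世界\nfindme again\n")
	crlf := []byte("hello world\r\nfindme\r\nhello 世界\r\nfindme again\r\n")
	fmt.Println(byteOffsets(lf, "findme"))   // [12 32]
	fmt.Println(byteOffsets(crlf, "findme")) // [13 35]
}
```

    Each line ending costs one byte with LF and two with CR LF, and 世界 takes six bytes in UTF-8, which is why the second match moves from 32 to 35.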

    UPDATE: for large files, read line by line with bufio.Scanner instead of loading the whole file into memory; for example:

    package main
    
    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "regexp"
    )
    
    func main() {
        f, err := os.Open("C:\\Users\\UserName\\go\\src\\so56798431\\fname")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
    
        re, err := regexp.Compile("findme")
        if err != nil {
            log.Fatal(err)
        }
    
        scanner := bufio.NewScanner(f)
        // Raise the per-line limit from the default 64 KiB to 1 MiB (an arbitrary cap).
        scanner.Buffer(make([]byte, 64*1024), 1024*1024)
        var bytesRead int64 // offset of the current line's first byte
        for scanner.Scan() {
            line := scanner.Bytes() // current line, without its line terminator
            for _, m := range re.FindAllIndex(line, -1) {
                fmt.Println(bytesRead + int64(m[0]))
            }
            // Account for the UNIX EOL marker ('\n'). ScanLines also strips a
            // trailing '\r', so CRLF files would drift by one byte per line.
            bytesRead += int64(len(line)) + 1
        }
    
        if err := scanner.Err(); err != nil {
            log.Fatal(err)
        }
    }
    

    Output:

    12

    32

    This answer was accepted by the asker as the best answer.