doucai4274 2019-06-27 21:02

Finding the byte offset of a pattern in Golang

We can find the byte offset of a pattern in a file with "grep -ob pattern filename". However, grep is not UTF-8 safe. How do I find the byte offset of a pattern in Go? The file is a process log, which can be terabytes in size.

This is what I want to get in Go:

$ cat fname
hello world
findme
hello 世界
findme again

...

$ grep -ob findme fname
12:findme
32:findme

1 answer

  • dongmeiwei0226 2019-06-27 21:49

    The regexp method FindAllStringIndex(s string, n int) returns the start and end byte indexes of all successive matches of the expression, as a slice of two-element slices (pass n = -1 to get every match):

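    package main
    
    import (
        "fmt"
        "log"
        "os"
        "regexp"
    )
    
    func main() {
        fname := "C:\\Users\\UserName\\go\\src\\so56798431\\fname"
        // Read the whole file into memory; only suitable for small files.
        b, err := os.ReadFile(fname) // use ioutil.ReadFile before Go 1.16
        if err != nil {
            log.Fatal(err)
        }
    
        // MustCompile panics on an invalid pattern, which is fine for a constant.
        re := regexp.MustCompile("findme")
        // Each result is a [start end] pair of byte offsets into the input.
        fmt.Println(re.FindAllStringIndex(string(b), -1))
    }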
    

    Output:

    [[12 18] [32 38]]

    Note: I ran this on Microsoft Windows but saved the file in UNIX format (LF line endings). If the input file is saved in Windows format (CRLF line endings), the byte offsets become 13 and 35, respectively.

    UPDATE: for large files, read line by line with bufio.Scanner and keep a running count of the bytes consumed so far; for example:

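    package main
    
    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "regexp"
    )
    
    func main() {
        f, err := os.Open("C:\\Users\\UserName\\go\\src\\so56798431\\fname")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
    
        // MustCompile panics on an invalid pattern, which is fine for a constant.
        re := regexp.MustCompile("findme")
    
        // Scanner splits on lines by default and stops with an error on lines
        // longer than bufio.MaxScanTokenSize (64 KB) unless scanner.Buffer
        // raises the limit.
        scanner := bufio.NewScanner(f)
        var bytesRead int64 // offset of the current line's first byte
        for scanner.Scan() {
            line := scanner.Text()
            results := re.FindAllStringIndex(line, -1)
            for _, result := range results {
                fmt.Println(bytesRead + int64(result[0]))
            }
            // Account for the UNIX EOL marker (LF). With CRLF files the
            // scanner also strips the \r, so this count would fall one byte
            // short per line.
            bytesRead += int64(len(line)) + 1
        }
    
        if err := scanner.Err(); err != nil {
            log.Fatal(err)
        }
    }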
    

    Output:

    12
    32

    This answer was accepted by the asker.