dongmu2517 2013-11-12 20:28

Scanner terminates early

I am trying to write a scanner in Go that handles continuation lines and cleans each line up before returning it, so that it returns logical lines. Given the following split function (Play):

func ScanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }

    i := bytes.IndexByte(data, '\n')
    for i > 0 && data[i-1] == '\\' {
        fmt.Printf("i: %d, data[i] = %q\n", i, data[i])
        i = i + bytes.IndexByte(data[i+1:], '\n')
    }

    var match []byte = nil
    advance := 0
    switch {
    case i >= 0:
        advance, match = i + 1, data[0:i]
    case atEOF: 
        advance, match = len(data), data
    }
    token := bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
    return advance, token, nil
}

func main() {
    simple := `
Just a test.

See what is returned. \
when you have empty lines.

Followed by a newline.
`

    scanner := bufio.NewScanner(strings.NewReader(simple))
    scanner.Split(ScanLogicalLines)
    for scanner.Scan() {
        fmt.Printf("line: %q\n", scanner.Text())
    }
}

I expected the code to return something like:

line: "Just a test."
line: ""
line: "See what is returned, when you have empty lines."
line: ""
line: "Followed by a newline."

However, it stops after returning the first line. The second call returns 1, "", nil.

Does anybody have any ideas, or is this a bug?

1 answer

  • dongzhunqiu4841 2013-11-12 22:16

    I would regard this as a bug, because an advance value > 0 is not meant to trigger a further read call, even when the returned token is nil (see bufio.SplitFunc):

    If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.

    What happens is this

    The input buffer of the bufio.Scanner defaults to 4096 bytes. That means it reads up to this amount at once, if it can, and then executes the split function. In your case the scanner can read your whole input at once, as it is well below 4096 bytes. The next read it attempts therefore results in EOF, which is the main problem here.

    Step by step

    1. scanner.Scan reads all your data
    2. You get all the text that is there
    3. You look for a token, you find the first newline which is only one newline
    4. You return nil as a token by removing the newline from the match
    5. scanner.Scan assumes: user needs more data
    6. scanner.Scan attempts to read more
    7. EOF happens
    8. scanner.Scan tries to tokenize one last time
    9. You find "Just a test."
    10. scanner.Scan tries to tokenize one last time
    11. You look for a token, you find the third line which is only one newline
    12. You return nil as a token by removing the newline from the match
    13. scanner.Scan sees a nil token and sets the error (EOF)
    14. Execution ends

    How to circumvent

    Any token that is non-nil will prevent this. As long as you return non-nil tokens, the scanner will not check for EOF and will continue executing your tokenizer.

    The reason your code returns nil tokens is that bytes.Replace returns a copy of its input when there is nothing to replace, and the copy of an empty slice is nil: append([]byte(nil), nil...) == nil. You could prevent this by returning a slice with a capacity and no elements, as this would be non-nil: make([]byte, 0, 1) != nil.
