Strange behavior of bufio.Scanner when reading a file line by line

I use bufio.Scanner to read a file line by line into the variable wordlist ([][]byte).

This is the code (tested with Go 1.1 / 1.3).

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    fle, err := os.Open("words.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer fle.Close()

    scanner := bufio.NewScanner(fle)

    n := 1000
    dCnt := 5
    var wordlist [][]byte

    for scanner.Scan() {
        if len(wordlist) == n {
            break
        }
        word := scanner.Bytes()
        for ii := 0; ii < len(wordlist); ii++ {
            if string(word) == string(wordlist[ii]) {
                log.Println(ii, string(word), string(wordlist[ii]))
                log.Println(len(wordlist), "double")

                dCnt--
                if dCnt == 0 {
                    for i, v := range wordlist {
                        fmt.Println(i, string(v))
                    }
                    log.Fatal("double")
                }
            }
        }
        wordlist = append(wordlist, word)
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}

words.txt is a file of 5040 lines, each a permutation of the sequence "abcdefg":

line 1 .. 
abcdefg
abcdegf
abcdfeg
abcdfge
..
line 510 ..
afcdbge
afcdebg
afcdegb
afcdgbe
afcdgeb
.. line 5040

It was generated by this small Python script:

from itertools import permutations as perm
c = "abcdefg"
p = perm(c, len(c))
with file('words.txt','wb') as outFle:
    for i in xrange(5040):
        n = ''.join(p.next())
        print >> outFle, n

The problem is that after running the Go program above, wordlist contains the following:

index string(wordlist[])

0 afcdebg      <-- this is line 513 of words.txt
1 afcdegb
2 afcdgbe
3 afcdgeb
...
510 bdefcag
511 bdefcga
512 afcdebg    <-- this is the beginning of a repetition of lines 513 .. 1024 of words.txt
513 afcdegb
514 afcdgbe

Instead, wordlist should contain the first 1000 lines of words.txt.

Any ideas?

The answer was given by Daniel Darabos (see below).

Changing

word := scanner.Bytes()

to

word := scanner.Text()

did the job.

(Thanks for your help!)

douke7274 2014-07-23 20:52
The documentation of Scanner.Bytes says:

The underlying array may point to data that will be overwritten by a subsequent call to Scan.

So if you save the returned slice, you can expect to see its contents change. This wreaks havoc in your application. Better to not save the returned slice!

A nice solution is to build a string from the bytes:

word := string(scanner.Bytes())

Then you can work with strings everywhere and the code becomes more pleasant.

What is going on?

Why does Scanner.Bytes hate me? The answer is also in the documentation:

It does no allocation.

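This makes the Scanner nicely efficient. From what you see, the Scanner reuses one internal buffer rather than allocating per line: each line here is 8 bytes including the newline, so 512 lines fill 4096 bytes, which matches the Scanner's initial buffer size. Once those lines are consumed, the buffer is refilled in place, and the slices saved earlier now show the new data.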

This is not a problem in applications where you do not need to keep references to the lines. (For example a grep-like program only looks at each line once.) Often you parse the line and store a reference to that. But if you want to store the raw byte data, you are responsible for copying it out from the Scanner.

This may be a hassle, but while you can implement the convenient behavior on top of the inconvenient one, it would be impossible to implement the efficient behavior on top of the inefficient one.

Also a simpler script for generating the input:

import itertools
for p in itertools.permutations('abcdefg'):
    print ''.join(p)