I am working on a small script which uses bufio.Scanner and http.Request as well as go routines to count words and lines in parallel.
package main
import (
"bufio"
"fmt"
"io"
"log"
"net/http"
"time"
)
func main() {
err := request("http://www.google.com")
if err != nil {
log.Fatal(err)
}
// just keep main alive with sleep for now
time.Sleep(2 * time.Second)
}
func request(url string) error {
res, err := http.Get(url)
if err != nil {
return err
}
go scanLineWise(res.Body)
go scanWordWise(res.Body)
return err
}
func scanLineWise(r io.Reader) {
s := bufio.NewScanner(r)
s.Split(bufio.ScanLines)
i := 0
for s.Scan() {
i++
}
fmt.Printf("Counted %d lines.
", i)
}
func scanWordWise(r io.Reader) {
s := bufio.NewScanner(r)
s.Split(bufio.ScanLines)
i := 0
for s.Scan() {
i++
}
fmt.Printf("Counted %d words.
", i)
}
As more or less expected from streams scanLineWise will count a number while scalWordWise will count zero. This is because scanLineWise already reads everything from req.Body.
I would know like to know: How to solve this elegantly?
My first thought was to build a struct which implements io.Reader and io.Writer. We could use io.Copy to read from req.Body and write it to the writer. When the scanners read from this writer then writer will copy the data instead of reading it. Unfortunately this will just collect memory over time and break the whole idea of streams...