douyin2435 2017-09-02 06:42
Answer accepted

How do I write results to a CSV file from a concurrent web scraper in Go?

I'm new to Go and am trying to take advantage of its concurrency to build a basic scraper that extracts the title, meta description, and meta keywords from URLs.

I am able to print the results to the terminal concurrently, but I can't figure out how to write the output to CSV. I've tried every variation I could think of with my limited knowledge of Go, and many of them end up breaking the concurrency, so I'm losing my mind a bit.

My code and URL input file are below. Thanks in advance for any tips!

// file name: metascraper.go
package main

import (
    // import standard libraries
    "encoding/csv"
    "fmt"
    "io"
    "log"
    "os"
    "time"
    // import third party libraries
    "github.com/PuerkitoBio/goquery"
)

func csvParsing() {
    file, err := os.Open("data/sample.csv")
    checkError("Cannot open file ", err)

    if err != nil {
        // err is printable
        // elements passed are separated by space automatically
        fmt.Println("Error:", err)
        return
    }

    // automatically call Close() at the end of current method
    defer file.Close()
    //
    reader := csv.NewReader(file)
    // options are available at:
    // http://golang.org/src/pkg/encoding/csv/reader.go?s=3213:3671#L94
    reader.Comma = ';'
    lineCount := 0

    fileWrite, err := os.Create("data/result.csv")
    checkError("Cannot create file", err)
    defer fileWrite.Close()

    writer := csv.NewWriter(fileWrite)
    defer writer.Flush()

    for {
        // read just one record
        record, err := reader.Read()
        // end-of-file is fitted into err
        if err == io.EOF {
            break
        } else if err != nil {
            fmt.Println("Error:", err)
            return
        }

        go func(url string) {
            // fmt.Println(msg)
            doc, err := goquery.NewDocument(url)
            if err != nil {
                checkError("No URL", err)
            }

            metaDescription := make(chan string, 1)
            pageTitle := make(chan string, 1)

            go func() {
                // time.Sleep(time.Second * 2)
                // use CSS selector found with the browser inspector
                // for each, use index and item
                pageTitle <- doc.Find("title").Contents().Text()

                doc.Find("meta").Each(func(index int, item *goquery.Selection) {
                    if item.AttrOr("name", "") == "description" {
                        metaDescription <- item.AttrOr("content", "")
                    }
                })
            }()
            select {
            case res := <-metaDescription:
                resTitle := <-pageTitle
                fmt.Println(res)
                fmt.Println(resTitle)

                // Have been trying to output to CSV here but it's not working

                // writer.Write([]string{url, resTitle, res})
                // err := writer.WriteString(`res`)
                // checkError("Cannot write to file", err)

            case <-time.After(time.Second * 2):
                fmt.Println("timeout 2")
            }

        }(record[0])

        fmt.Println()

        lineCount++
    }
}

func main() {

    csvParsing()

    //Code is to make sure there is a pause before program finishes so we can see output
    var input string
    fmt.Scanln(&input)
}

func checkError(message string, err error) {
    if err != nil {
        log.Fatal(message, err)
    }
}

The data/sample.csv input file with URLs:

    http://jonathanmh.com
    http://keshavmalani.com
    http://google.com
    http://bing.com
    http://facebook.com

1 answer

  • dongqin1861 2017-09-02 07:17

    In the code you supplied, you had commented out the following code:

    // Have been trying to output to CSV here but it's not working
    err = writer.Write([]string{url, resTitle, res})
    checkError("Cannot write to file", err)
    

    This code is correct, except you have one issue. Earlier in the function, you have the following code:

    fileWrite, err := os.Create("data/result.csv")
    checkError("Cannot create file", err)
    defer fileWrite.Close()
    

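    This deferred call closes fileWrite as soon as your csvParsing() func returns, and the deferred writer.Flush() runs at the same moment. csvParsing() returns right after starting the goroutines, not after they finish, so by the time a goroutine calls writer.Write the file has already been flushed and closed and the row never reaches disk.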

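    Solution: Make sure fileWrite is not closed, and writer is not flushed, until your goroutines have written to it. Moving defer fileWrite.Close() into the concurrent func is one way to keep csvParsing() from closing the file too early, but with several goroutines the first one to finish would close the file for all the others. It is safer to wait for every goroutine (for example with a sync.WaitGroup) and only then flush and close. Also note that csv.Writer is not safe for concurrent use without synchronization, so the writes themselves should go through a single goroutine or be guarded by a mutex.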

    This answer was selected as the best answer by the asker.
