doujiao0110 2017-06-30 14:30

How to understand and practice Go concurrency?

I am learning Go, and one of its most powerful features is concurrency. I used to write PHP scripts, which execute line by line, so it is difficult for me to understand channels and goroutines.

Is there any website or other resource (books, articles, etc.) where I can find tasks that can be processed concurrently, so I can practice concurrency with Go? It would be great if, at the end, I could see the solution with comments and explanations of why we do it this way and why this solution is better than others.

Just as an example, here is a task that confuses me and that I don't know how to approach: I need to write a kind of parser that receives a starting point (e.g. http://example.com), navigates the whole website (example.com/about, example.com/best-hotels/, etc.), and takes some text from each page (e.g. by selector, like h1.title and p.description). Then, after the whole website has been crawled, I receive a slice of parsed content. I know how to make requests and how to get information using a selector, but I don't know how to organize communication between all the goroutines.

Thank you for any information and links. I hope this will help others with the same problem in the future.


1 answer

  • dongnaigu2052 2017-06-30 18:19

    There are lots of resources online about concurrency patterns in Go; those three I got from a quick Google search. But if you have something specific in mind, I think I can address that too.

    It looks like you want to crawl a website and get information from its many pages concurrently, depositing that "information" into a common location (i.e. a slice). The way to go here is to use a chan (channel), which is a thread-safe data structure (multiple threads can access it without fear) for passing data from one thread/goroutine to another.

    And of course the go keyword in Go is how to spawn a goroutine.

    So, for example, in the func main() goroutine:

    // get a listOfWebpages
    dataChannel := make(chan string)
    for _, webpage := range listOfWebpages {
        go fetchDataFromWebpage(webpage, dataChannel)
    }
    
    // the goroutines fill dataChannel concurrently; receive exactly one
    // result per page, because ranging over a channel that is never closed
    // would block forever after the last result has been read
    for range listOfWebpages {
        fmt.Println(<-dataChannel) // print the header or whatever you scraped from the webpage
    }
    

    The goroutines will be functions which scrape websites and feed the dataChannel (you mentioned you know how to scrape websites already). Something like this:

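    // fetchDataFromWebpage scrapes one page and sends the result on c.
    // The chan<- type marks c as send-only inside this function.
    func fetchDataFromWebpage(url string, c chan<- string) {
        data := scrapeWebsite(url) // your existing scraping code
        c <- data                  // send the data to the thread-safe channel
    }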
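
To make the two snippets above concrete, here is a minimal, self-contained sketch of the same fan-out pattern. It rests on some assumptions: scrapeWebsite is a hypothetical stand-in that does no real network I/O, and the helper name crawlAll is made up for illustration. A sync.WaitGroup plus close lets the receiving loop end once every goroutine has sent its result. This is one common way to do it, not the only one.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// scrapeWebsite is a hypothetical stand-in for real scraping code. It does
// no network I/O and just returns a fake title for the URL.
func scrapeWebsite(url string) string {
	return "title of " + url
}

// crawlAll (an illustrative name) scrapes every URL in its own goroutine
// and gathers the results into a slice.
func crawlAll(urls []string) []string {
	dataChannel := make(chan string)
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			dataChannel <- scrapeWebsite(u)
		}(url)
	}

	// Close the channel once every worker has sent its result, so the
	// range loop below terminates instead of blocking forever.
	go func() {
		wg.Wait()
		close(dataChannel)
	}()

	var results []string
	for data := range dataChannel {
		results = append(results, data)
	}
	sort.Strings(results) // arrival order is nondeterministic, so sort for stable output
	return results
}

func main() {
	pages := []string{
		"http://example.com/about",
		"http://example.com/best-hotels/",
		"http://example.com",
	}
	for _, r := range crawlAll(pages) {
		fmt.Println(r)
	}
}
```

Only the goroutine running the range loop appends to results, so the slice itself needs no lock. The channel is the only thing shared between goroutines.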
    

    If you're having trouble understanding how to use concurrency tools such as channels, mutex locks, or WaitGroups, maybe you should start by trying to understand why concurrency can be problematic :) To me, the best illustration of that is the Dining Philosophers Problem, https://en.wikipedia.org/wiki/Dining_philosophers_problem

    Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

    Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when they have both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it is not being used by another philosopher. After an individual philosopher finishes eating, they need to put down both forks so that the forks become available to others. A philosopher can take the fork on their right or the one on their left as they become available, but cannot start eating before getting both forks.

    If practice is what you're looking for, I recommend implementing this problem so that it fails, and then trying to fix it using concurrency patterns :) There are other problems like this available too! And creating the problem is one step towards understanding how to solve it!
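
As a reference point for that exercise, here is a minimal sketch of one well-known fix, the resource-ordering approach: every philosopher picks up the lower-numbered fork first, so a cycle of philosophers each waiting on a neighbor cannot form. The function name dine and its parameters are made up for illustration, and it assumes at least two philosophers. To see the failing version, make every philosopher lock the left fork first and add a short sleep between the two Lock calls.

```go
package main

import (
	"fmt"
	"sync"
)

// dine (an illustrative name) runs n philosophers, each eating `meals`
// times, and returns how many meals each one finished. Assumes n >= 2.
func dine(n, meals int) []int {
	forks := make([]sync.Mutex, n) // forks[i] sits between philosopher i-1 and i
	eaten := make([]int, n)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			left, right := id, (id+1)%n

			// Resource ordering: always take the lower-numbered fork first.
			first, second := left, right
			if first > second {
				first, second = second, first
			}

			for m := 0; m < meals; m++ {
				forks[first].Lock()
				forks[second].Lock()
				eaten[id]++ // "eat"; only this goroutine writes eaten[id]
				forks[second].Unlock()
				forks[first].Unlock()
			}
		}(i)
	}

	wg.Wait()
	return eaten
}

func main() {
	fmt.Println(dine(5, 3)) // prints [3 3 3 3 3]
}
```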


    If you're mostly having trouble understanding how to use channels, then aside from reading up on them, you can simply think of a channel as a queue that can safely be accessed and modified from concurrent threads.
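
A small sketch of that queue analogy, assuming a made-up helper name passThrough: one goroutine pushes items into a buffered channel and the caller pops them off in the same order. The buffer size of 2 is arbitrary.

```go
package main

import "fmt"

// passThrough (an illustrative name) sends items through a channel from a
// producer goroutine and collects them on the receiving side.
func passThrough(items []string) []string {
	queue := make(chan string, 2) // holds up to 2 items; a send blocks while it is full

	go func() {
		defer close(queue) // signal that no more items are coming
		for _, it := range items {
			queue <- it
		}
	}()

	var out []string
	for it := range queue { // receives in FIFO order until the channel is closed
		out = append(out, it)
	}
	return out
}

func main() {
	fmt.Println(passThrough([]string{"a", "b", "c"})) // prints [a b c]
}
```

The analogy has a limit: an unbuffered channel (make(chan string) with no size) stores nothing, so each send waits until a receiver is ready to take the value.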
