Running multiple GTK WebKitWebViews via goroutines

I'm using Go with the gotk3 and webkit2 libraries to build a web crawler that can evaluate JavaScript in the context of a WebKitWebView.

With performance in mind, I'm trying to work out the best way to make it crawl concurrently (ideally in parallel across multiple processors), using all available resources.

GTK, threads, and goroutines are all fairly new to me. The gotk3 goroutines example states:

Native GTK is not thread safe, and thus, gotk3's GTK bindings may not be used from other goroutines. Instead, glib.IdleAdd() must be used to add a function to run in the GTK main loop when it is in an idle state.
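
To check my understanding of that rule, here is a minimal sketch that uses only the standard library. It is not gotk3 code: mainLoop, idleAdd, and visitAll are hypothetical names I made up to imitate the pattern, where one goroutine pinned to an OS thread owns all the state and every other goroutine only queues functions for it to run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// mainLoop is a hypothetical stand-in for the GTK main loop: a single
// goroutine runs queued functions one at a time.
type mainLoop struct {
	tasks chan func()
}

func newMainLoop() *mainLoop {
	return &mainLoop{tasks: make(chan func(), 64)}
}

// idleAdd queues f to run on the loop goroutine. It plays the role that
// glib.IdleAdd plays in gotk3 and is safe to call from any goroutine.
func (l *mainLoop) idleAdd(f func()) { l.tasks <- f }

// run pins the calling goroutine to its OS thread and executes queued
// functions until quit is called and the queue is drained.
func (l *mainLoop) run() {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	for f := range l.tasks {
		f()
	}
}

// quit stops the loop once every queued function has run.
func (l *mainLoop) quit() { close(l.tasks) }

// visitAll starts one goroutine per URL, but the shared slice is only
// ever modified by functions running on the loop goroutine.
func visitAll(urls []string) []string {
	loop := newMainLoop()
	var visited []string // owned by the loop goroutine

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			// Background work could happen here; anything touching
			// loop-owned state is handed over via idleAdd instead.
			loop.idleAdd(func() { visited = append(visited, u) })
		}(u)
	}

	// Stop the loop once every goroutine has queued its function.
	go func() {
		wg.Wait()
		loop.quit()
	}()

	loop.run() // blocks here, much like gtk.Main() would
	return visited
}

func main() {
	got := visitAll([]string{"https://www.google.com", "https://github.com"})
	fmt.Println(len(got), "URLs handled on the loop goroutine")
}
```

If this model is right, the goroutines themselves are fine; what matters is that anything touching GTK objects runs on the loop's thread.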

Go panics and prints a stack trace when I run a function that creates a new WebView in a goroutine. I'm not sure exactly why, but I suspect it is related to the gotk3 comment quoted above. An example is shown below.

Current Code

Here's my current code, which has been adapted from the webkit2 example:

package main

import (
    "fmt"
    "github.com/gotk3/gotk3/glib"
    "github.com/gotk3/gotk3/gtk"
    "github.com/sourcegraph/go-webkit2/webkit2"
    "github.com/sqs/gojs"
)

func crawlPage(url string) {
    web := webkit2.NewWebView()

    web.Connect("load-changed", func(_ *glib.Object, i int) {
        loadEvent := webkit2.LoadEvent(i)

        switch loadEvent {
        case webkit2.LoadFinished:
            fmt.Printf("Load finished for: %v\n", url)

            web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) {
                if err != nil {
                    fmt.Println("JavaScript error.")
                } else {
                    fmt.Printf("Hostname (from JavaScript): %q\n", val)
                }

                //gtk.MainQuit()
            })
        }
    })

    glib.IdleAdd(func() bool {
        web.LoadURI(url)
        return false
    })
}

func main() {
    gtk.Init(nil)

    crawlPage("https://www.google.com")
    crawlPage("https://www.yahoo.com")
    crawlPage("https://github.com")
    crawlPage("http://deelay.me/2000/http://deelay.me/img/1000ms.gif")

    gtk.Main()
}

Creating a new WebView for each URL seems to let them load concurrently. Running glib.IdleAdd() in a goroutine, as in the gotk3 example, doesn't seem to make any difference (although I'm only judging by eye, not measuring):

go glib.IdleAdd(func() bool { // Works
    web.LoadURI(url)
    return false
})

However, trying to create a goroutine for each crawlPage() call ends in a panic:

go crawlPage("https://www.google.com") // Panics and shows stack trace

I can run web.RunJavaScript() in a goroutine without issue:

        switch loadEvent {
        case webkit2.LoadFinished:
            fmt.Printf("Load finished for: %v\n", url)

            go web.RunJavaScript("window.location.hostname", func(val *gojs.Value, err error) { // Works
                if err != nil {
                    fmt.Println("JavaScript error.")
                } else {
                    fmt.Printf("Hostname (from JavaScript): %q\n", val)
                }

                //gtk.MainQuit()
            })
        }

Best Method?

The current methods I can think of are:

1. Spawn new WebViews to crawl each page, as shown in the current code. Track how many WebViews are open and either continually delete and create new ones, or reuse a fixed number created up front, until all available resources on the machine are in use. Would this be limited in terms of the processor cores being used?
2. Same basic idea as #1, but run the binary multiple times (instead of one gocrawler process running on the machine, have four) to utilize all cores/resources.
3. Run the GUI (gtk3) portion of the app in its own goroutine. I could then pass data to other goroutines which do their own heavy processing, such as searching through content.
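
For the last option above, this is roughly the shape I have in mind, sketched with the standard library only. The page type and countMatches function are hypothetical, the feeder goroutine merely stands in for the GUI side handing over loaded content, and strings.Count is a placeholder for whatever heavy processing would really happen.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// page is a hypothetical container for what the GUI side would hand
// over once a WebView has finished loading.
type page struct {
	URL  string
	Body string
}

// countMatches fans pages out to one worker per CPU core and returns
// how many times term occurs in each page body, keyed by URL.
func countMatches(pages []page, term string) map[string]int {
	type result struct {
		url string
		n   int
	}
	jobs := make(chan page)
	results := make(chan result)

	// Workers: plain goroutines that never touch any GUI object.
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- result{p.URL, strings.Count(p.Body, term)}
			}
		}()
	}

	// Feeder: stands in for the GUI goroutine passing content along.
	go func() {
		for _, p := range pages {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	counts := make(map[string]int)
	for r := range results {
		counts[r.url] = r.n
	}
	return counts
}

func main() {
	pages := []page{
		{URL: "https://example.com/a", Body: "go and gtk and go"},
		{URL: "https://example.com/b", Body: "nothing relevant"},
	}
	for url, n := range countMatches(pages, "go") {
		fmt.Println(url, n)
	}
}
```

This would only spread the post-processing across cores; the page loading itself would still go through the single GTK main loop.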

What would actually be the best way to run this code concurrently, if possible, and max out performance?

Update

Methods 1 and 2 are probably out of the picture: I ran a test that spawned ~100 WebViews, and they appear to load synchronously.
