I am trying to write my first Go program. This part already works:
package main

import (
	"fmt"
	"os"
	"regexp"

	"github.com/PuerkitoBio/goquery"
	"github.com/gocolly/colly"
)

// TrimSpaceNewlineInString removes every run of one or more spaces, followed by
// one or more newlines, one or more spaces and one or more tabs (in that order).
func TrimSpaceNewlineInString(s string) string {
	re := regexp.MustCompile(` +\n+ +\t+`)
	return re.ReplaceAllString(s, "")
}

func main() {
	// The search term is passed as the first command-line argument.
	args := os.Args[1:]
	c := colly.NewCollector()
	// Called for every table row (<tr>) on the result page.
	c.OnHTML("tr",
		func(e *colly.HTMLElement) {
			ch := e.DOM.Children()
			spalte1 := ch.Eq(0) // first cell of the row
			spalte2 := ch.Eq(1) // second cell of the row
			spalte1.Each(
				func(_ int, s *goquery.Selection) {
					fmt.Print(TrimSpaceNewlineInString(s.Text()), ":", TrimSpaceNewlineInString(spalte2.Text()))
				})
		})
	c.Visit("https://deweysearchde.pansoft.de/webdeweysearch/executeSearch.html" +
		"?lastScheduleRecord=669.1-669.7&lastTableRecord=&query=" + args[0] + "&_showShortNotations=off&catalogs=DNB&_catalogs=off&catalogs=GBV&_catalogs=off&catalogs=HeBIS&_catalogs=off&catalogs=SUB&_catalogs=off&catalogs=SWB&_catalogs=off&catalogs=FUB&_catalogs=off")
}
But I only want to get the 2nd column if its text consists only of characters from [0-9.-], and in that case I also need the following 3rd column, which holds the DDC classification, from this DOM HTMLElement table. I would like to retrieve the following:
600;Technik
660;Chemische Verfahrenstechnik
669;Metallurgie
669.1-669.7;Metallurgie einzelner Metalle und deren Legierungen
669.1;Eisenmetalle
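The filter described above can be sketched as a plain function on the cell texts of one row, kept separate from colly so it can be tried without network access. It assumes, as stated above, that the notation sits in the 2nd cell (index 1) and the topic in the cell right after it; if the page's first cell already holds the notation (as the `Eq(0)` output suggests), pass 0 instead. `rowToLine`, `notationRe` and the sample rows are hypothetical. Inside the existing `OnHTML("tr", ...)` callback, the cell texts could presumably be collected with colly's `e.ChildTexts("td")` and passed to `rowToLine`; that hookup is an assumption to check against the colly version in use.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// notationRe matches a cell consisting only of digits, dots and hyphens,
// e.g. "669" or "669.1-669.7". Notations like "T1--0" do not match.
var notationRe = regexp.MustCompile(`^[0-9.\-]+$`)

// rowToLine (hypothetical) turns the texts of one table row into
// "notation;topic". notationCol is the 0-based index of the notation cell;
// the topic is assumed to be in the next cell. ok is false when the row
// has no valid notation or no topic cell.
func rowToLine(cells []string, notationCol int) (line string, ok bool) {
	if notationCol < 0 || notationCol+1 >= len(cells) {
		return "", false
	}
	notation := strings.TrimSpace(cells[notationCol])
	if !notationRe.MatchString(notation) {
		return "", false
	}
	// Collapse all whitespace (including newlines) inside the topic.
	topic := strings.Join(strings.Fields(cells[notationCol+1]), " ")
	return notation + ";" + topic, true
}

func main() {
	// Hypothetical cell texts standing in for what the scraper would read.
	rows := [][]string{
		{"", "Notation", "Thema"},
		{"", "\n600 ", "\n  Technik\t"},
		{"", "669.1-669.7", "Metallurgie einzelner Metalle\n und deren Legierungen"},
		{"", "T1--0", "Hilfstafel 1. Standardschlüssel"},
	}
	for _, cells := range rows {
		if line, ok := rowToLine(cells, 1); ok {
			fmt.Println(line)
		}
	}
}
```

With the sample rows, the header and the auxiliary-table row `T1--0` are skipped, and the two remaining rows are printed as `600;Technik` and `669.1-669.7;Metallurgie einzelner Metalle und deren Legierungen`.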
Can anyone help me and tell me how this could be done with colly (see the Colly documentation for Go), which is similar to jQuery?
PS: I have tried it this way, with Children(), but the output looks like this, and I do not know why:
Notation:Thema :
Haupttafeln
600:
Technik
660:
Chemische Verfahrenstechnik
661:
Industriechemikalien
661.2-661.6:
Säuren, Basen, Salze
661.5:
Ammoniumsalze
Notation:Thema :HilfstafelnT1--0:Hilfstafel 1. StandardschlüsselT2--0:Hilfstafel 2. Geo ...
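A likely cause of the line breaks in this output: the regular expression in `TrimSpaceNewlineInString` only matches spaces, newlines, spaces and tabs when all four appear in exactly that order, so a cell text that starts with, say, a newline followed directly by tabs (typical for indented HTML source) keeps its whitespace. A minimal sketch, assuming the goal is to trim each cell and collapse any inner whitespace run to one space, uses `strings.Fields` instead of a regex; `collapseSpace` and the sample texts are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseSpace (hypothetical) trims s and replaces every run of whitespace
// (spaces, tabs, newlines, ...) with a single space.
func collapseSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Hypothetical cell texts with whitespace like that from indented HTML.
	notation := "\n\t\t600\n\t"
	topic := "\n\t\tTechnik\n\t"
	// Println ends the row with a line break, unlike Print.
	fmt.Println(collapseSpace(notation) + ":" + collapseSpace(topic))
}
```

Separately, `fmt.Print` writes no line break of its own, so rows whose cells contain no newline run together, which matches the last output line above; `fmt.Println` puts each row on its own line.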