I am trying to stream a large number of user agents through a Go (Golang) program to extract information about them, such as device type, OS, etc.
The Go code in Tobie Langel's ua-parser repo looks very promising:
https://github.com/tobie/ua-parser/tree/master/go/uaparser
I wrote a simple program that basically adds streaming functionality to the example on the README page. To compare performance, I wrote an equivalent program with a Ruby gem that uses a similar approach and the same regexes.yaml file:
https://github.com/toolmantim/user_agent_parser
After compiling the Go program and testing both, the Ruby version runs 2-3 times faster than the Go version.
As far as I can see, both programs load and process the user agents in a similar manner.
I am new to Go and am wondering whether anyone sees any major optimizations or fixes that could make programs using the Go portion of this repo run faster.
I would also like to know of any other Go libraries for parsing user agents that work well.
---TESTING SIMPLE PROGRAMS TO COMPARE REGEX VS PCRE LIBS (as suggested in the comments below)
I have created the two programs below, one using PCRE and one using the standard regexp package. However, I don't seem to be getting a performance boost with PCRE. In fact, the PCRE version seems to be a little slower. Am I approaching this the wrong way?
--With standard regex library
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

func main() {
	var regex = regexp.MustCompile(`Mac`)
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		fields := strings.Split(line, "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0])))
	}
}
--With PCRE library
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"

	pcre "github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	var regex = pcre.MustCompile(`Mac`, 0)
	for scanner.Scan() {
		line := scanner.Text()
		fields := strings.Split(line, "\t")
		fmt.Println(regex.FindIndex([]byte(fields[0]), 0))
	}
}