Using a regexp is usually slower than doing the split manually, and since the task is simple, the non-regexp solution isn't complicated either.
You may use strings.FieldsFunc() to split a string on a set of characters, and strings.TrimSpace() to strip off leading and trailing white space.
Here's a simple function doing what you want:
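```go
// split splits s at any of the characters in sep, trims surrounding
// white space from each field, and drops fields that end up empty.
func split(s, sep string) (tokens []string) {
	// FieldsFunc splits at every run of runes for which the func returns true.
	fields := strings.FieldsFunc(s, func(r rune) bool {
		return strings.IndexRune(sep, r) != -1
	})
	for _, s2 := range fields {
		s2 = strings.TrimSpace(s2)
		if s2 != "" { // skip fields that contained only white space
			tokens = append(tokens, s2)
		}
	}
	return
}
```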
Testing it:
```go
fmt.Printf("%q\n", split("a,b;c, de; ; fg ", ",;"))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", "[]/"))
```
Output (try it on the Go Playground):
```
["a" "b" "c" "de" "fg"]
["a" "b" "c" "de" "fg"]
```
Improvements
If performance is an issue and you have to call this split() function many times, it would pay off to create a set-like map from the separator characters once and reuse it. Inside the function passed to strings.FieldsFunc() you can then simply check whether the rune is in this map, so you no longer need to call strings.IndexRune() to decide whether the given rune is a separator character.
The performance gain might not be significant if you have only a few separator characters (say 1-3), but with many more, using a map could improve performance noticeably.
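Instead of writing such a map literal by hand, the set could also be built from the same separator string the first version takes. A minimal sketch; sepSet is a hypothetical helper name, not part of the original answer:

```go
package main

import "fmt"

// sepSet (hypothetical helper) builds the set-like map once from a
// string of separator characters; ranging over a string yields runes.
func sepSet(chars string) map[rune]bool {
	set := make(map[rune]bool, len(chars))
	for _, r := range chars {
		set[r] = true
	}
	return set
}

func main() {
	set := sepSet(",;")
	// A missing key yields the zero value false, so no "ok" check is needed.
	fmt.Println(set[','], set[';'], set['x']) // true true false
}
```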
This is how it could look:
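```go
// Set-like maps of separator characters, built once and reused.
var (
	sep1 = map[rune]bool{',': true, ';': true}
	sep2 = map[rune]bool{'[': true, ']': true, '/': true}
)

// split works like before, but looks separators up in a map
// instead of scanning a string with strings.IndexRune().
func split(s string, sep map[rune]bool) (tokens []string) {
	fields := strings.FieldsFunc(s, func(r rune) bool {
		return sep[r] // a missing key yields false
	})
	for _, s2 := range fields {
		s2 = strings.TrimSpace(s2)
		if s2 != "" { // skip fields that contained only white space
			tokens = append(tokens, s2)
		}
	}
	return
}
```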
Testing it:
```go
fmt.Printf("%q\n", split("a,b;c, de; ; fg ", sep1))
fmt.Printf("%q\n", split("a[b]c[ de/ / fg ", sep2))
```
Output is the same. Try this one on the Go Playground.