TL;DR: peterSO is 10X faster than strom73.
strom73:
$ go build strom73.go && ./strom73 "/usr/share/dict/words" "restful"
Found anagrams: fluster,restful,
2019/02/26 02:50:47 anagrams took 150.733904ms
$
peterSO:
$ go build peterso.go && ./peterso "/usr/share/dict/words" "restful"
Found anagrams: [restful fluster]
2019/02/26 02:50:52 anagrams took 15.093098ms
$
How to optimize finding anagrams from a text file in Go
I am trying to solve a coding challenge where one must print all
anagrams from a text file matching the input string. Program must
execute as fast as possible.
Currently, I have it down to about 160ms.
No test cases are provided.
The XY problem is asking about your attempted solution rather than your actual problem: The XY Problem.
If we look at Wikipedia - Anagram we see: An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. Examples, "restful" = "fluster", "funeral" = "real fun", "rail safety" = "fairy tales".
To solve this problem in Go, we use the Go testing package benchmark facility to measure performance. For example, a sum of letter values, a sort of letter values, and the overall algorithm. We relentlessly dissect each line of code for performance. We order the tests for anagrams starting with the cheapest. For example, sorting letters is expensive, so we first check the number of letters, then a simple sum of letters to filter out many non-anagrams cheaply.
We need something to act as an anagram text file. The Linux word dictionary file (/usr/share/dict/words
) is readily available, although it is limited to single words. It uses upper- and lower-case.
Exhaustive benchmarking is exhausting. The law of dimminishing returns has set in. A ten-fold improvement in speed is good enough for now.
peterso.go:
package main
import (
"bufio"
"fmt"
"io"
"log"
"os"
"sort"
"strings"
"time"
)
func findAnagrams(find string, text io.Reader) []string {
find = strings.ToLower(find)
findSum := 0
findRunes := []rune(find)
j := 0
for i, r := range findRunes {
if r != ' ' {
findSum += int(r)
if i != j {
findRunes[j] = r
}
j++
}
}
findRunes = findRunes[:j]
sort.Slice(findRunes, func(i, j int) bool { return findRunes[i] < findRunes[j] })
findStr := string(findRunes)
anagrams := []string{find}
s := bufio.NewScanner(text)
for s.Scan() {
word := strings.ToLower(s.Text())
wordSum := 0
wordRunes := []rune(word)
j := 0
for i, r := range wordRunes {
if r != ' ' {
wordSum += int(r)
if i != j {
wordRunes[j] = r
}
j++
}
}
wordRunes = wordRunes[:j]
if len(wordRunes) != len(findRunes) {
continue
}
if wordSum != findSum {
continue
}
sort.Slice(wordRunes, func(i, j int) bool { return wordRunes[i] < wordRunes[j] })
if string(wordRunes) == findStr {
if word != find {
anagrams = append(anagrams, word)
}
}
}
if err := s.Err(); err != nil {
panic(err)
}
return anagrams
}
func timeTrack(start time.Time, name string) {
elapsed := time.Since(start)
log.Printf("%s took %s", name, elapsed)
}
func main() {
defer timeTrack(time.Now(), "anagrams")
textPath := os.Args[1]
findWord := os.Args[2]
text, err := os.Open(textPath)
if err != nil {
panic(err)
}
defer text.Close()
anagrams := findAnagrams(findWord, text)
fmt.Println("Found anagrams: ", anagrams)
}