An efficient way to implement full-text search in golang

I am trying to implement a simple full-text search in golang, but all my implementations turn out to be too slow to pass the time limits.

The task is as follows:

Documents are non-empty strings of lowercase words separated by spaces.
Each document has an implicit identifier equal to its index in the input array.
New() constructs the index.
Search() accepts a query, which is also a string of lowercase words separated by spaces, and returns a sorted array of the unique identifiers of the documents that contain all words from the query, regardless of their order.

Example:

index := New([]string{
"this is the house that jack built",  //: 0
"this is the rat that ate the malt",  //: 1
})

index.Search("")  // -> []
index.Search("in the house that jack built")  // -> []
index.Search("malt rat")  // -> [1]
index.Search("is this the")  // -> [0, 1]

I have already tried to implement:

a binary search tree for each document and for all documents all together
a trie (prefix tree) for each document and for all documents all together
inverted index search

binary search tree (for all documents):

type Tree struct {
    m           map[int]bool
    word        string
    left        *Tree
    right       *Tree
}

type Index struct {
    tree *Tree
}

binary search tree (a tree for each document):

type Tree struct {
    word  string
    left  *Tree
    right *Tree
}

type Index struct {
    tree  *Tree
    index int
    next  *Index
}

trie (for all documents):

type Trie struct {
    m        map[uint8]*Trie
    end_node map[int]bool
}

type Index struct {
    trie *Trie
}

trie (for each document):

type Trie struct {
    m        map[uint8]*Trie
    end_node bool
}

type Index struct {
    trie  *Trie
    index int
    next  *Index
}

inverted index:

type Index struct {
    m map[string]map[int]bool
}

New and Search implementation for inverted index:

// New creates a fulltext search index for the given documents
func New(docs []string) *Index {
    m := make(map[string]map[int]bool)

    for i := 0; i < len(docs); i++ {
        words := strings.Fields(docs[i])
        for j := 0; j < len(words); j++ {
            if m[words[j]] == nil {
                m[words[j]] = make(map[int]bool)
            }
            m[words[j]][i+1] = true
        }
    }
    return &(Index{m})
}

// Search returns a slice of unique ids of documents that contain all words from the query.
func (idx *Index) Search(query string) []int {
    if query == "" {
        return []int{}
    }
    ret := make(map[int]bool)
    arr := strings.Fields(query)
    fl := 0
    for i := range arr {
        if idx.m[arr[i]] == nil {
            return []int{}
        }
        if fl == 0 {
            for value := range idx.m[arr[i]] {
                ret[value] = true
            }
            fl = 1
        } else {
            tmp := make(map[int]bool)
            for value := range ret {
                if idx.m[arr[i]][value] == true {
                    tmp[value] = true
                }
            }
            ret = tmp
        }
    }
    ret_arr := []int{}
    for value := range ret {
        ret_arr = append(ret_arr, value-1)
    }
    sort.Ints(ret_arr)
    return ret_arr
}

Am I doing something wrong or is there a better algorithm for search in golang?

Any help is appreciated.

1 answer

drci47425 2019-04-08 23:19


I can't really help you with the language-specific part, but in case it helps, here is pseudocode that describes a trie implementation, along with a function that solves your problem in a reasonably efficient manner.

struct TrieNode {
    map[char, TrieNode] children   // maps a character to the child node
    set[int] contains              // ids of all documents that contain the word ending at this node
}

// classic trie lookup, except that it returns a set of document ids instead of a simple boolean
function get_doc_ids(TrieNode node, string w, int depth) {
    if (depth == length(w)) {
        return node.contains
    } else {
        if (node.hasChild(w[depth])) {
            return get_doc_ids(node.getChild(w[depth]), w, depth+1)
        } else {
            return empty_set()
        }
    }
}

// the query-answering function, as straightforward as it can be
function answer_query(TrieNode root, list_of_words L) {
    n = length(L)
    if (n == 0) {
        return empty_set()  // an empty query matches no document
    }
    result = get_doc_ids(root, L[0], 0)
    for i from 1 to n-1 do {
        result = intersection(result, get_doc_ids(root, L[i], 0))  // set intersection
        if (result.is_empty()) {
            break  // no document contains them all, no need to check further
        }
    }
    return result
}

This answer was selected by the asker as the best answer.
