duanqiang9212 2017-01-21 18:52
Viewed 10
Accepted

How to search a huge slice of map[string]string concurrently

I need to search a huge slice of map[string]string. My thought was that this is a good chance to use Go's channels and goroutines.

The plan was to divide the slice into parts and search them in parallel. But I was kind of shocked that my parallel version timed out while the plain search of the whole slice did the trick.

I am not sure what I am doing wrong. Below is the code I used to test the concept; the real code would involve more complexity.

// Search for a given term.
// This function gets the data that needs to be searched
// and the search term, and it returns the matched maps.
// The data is pretty simple: each map contains { key: someText }
func Search(data []map[string]string, term string) []map[string]string {
    set := []map[string]string{}

    for _, v := range data {
        if v["key"] == term {
            set = append(set, v)
        }
    }
    return set
}

So this works pretty well for searching the slice of maps for a given search term.
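
For example, with data shaped like that (just to illustrate, not my real data):

data := []map[string]string{
    {"key": "This"},
    {"key": "String One"},
}

matches := Search(data, "This") // matches now holds only the map {"key": "This"}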

Now I thought that if my slice had, say, 20K entries, I would like to do the search in parallel.

// All searches all records concurrently.
// It has the same function signature as the Search function,
// but its main task is to fan the slice out into 5 parts and search
// them in parallel.
func All(data []map[string]string, term string) []map[string]string {
    countOfSlices := 5

    part := len(data) / countOfSlices

    fmt.Printf("Size of the data:%v
", len(data))
    fmt.Printf("Fragemnt Size:%v
", part)

    timeout := time.After(60000 * time.Millisecond)

    c := make(chan []map[string]string)

    for i := 0; i < countOfSlices; i++ {
        // Fragments of the array passed on to the search method
        go func() { c <- Search(data[(part*i):(part*(i+1))], term) }()

    }

    result := []map[string]string{}

    for i := 0; i < part-1; i++ {
        select {
        case records := <-c:
            result = append(result, records...)
        case <-timeout:
            fmt.Println("timed out!")
            return result
        }
    }
    return result
}

Here are my tests:

I have a function to generate my test data and 2 tests.

func GenerateTestData(search string) ([]map[string]string, int) {
    rand.Seed(time.Now().UTC().UnixNano())
    strin := []string{"String One", "This", "String Two", "String Three", "String Four", "String Five"}
    var matchCount int
    numOfRecords := 20000
    set := []map[string]string{}
    for i := 0; i < numOfRecords; i++ {
        p := rand.Intn(len(strin))
        s := strin[p]
        if s == search {
            matchCount++
        }
        set = append(set, map[string]string{"key": s})
    }
    return set, matchCount
}

The two tests: the first just traverses the slice, the second searches in parallel.

func TestSearchItem(t *testing.T) {

    tests := []struct {
        InSearchTerm string
        Fn           func(data []map[string]string, term string) []map[string]string
    }{
        {
            InSearchTerm: "This",
            Fn:           Search,
        },
        {InSearchTerm: "This",
            Fn: All,
        },
    }

    for i, test := range tests {

        startTime := time.Now()
        data, expectedMatchCount := GenerateTestData(test.InSearchTerm)
        result := test.Fn(data, test.InSearchTerm)

        fmt.Printf("Test: [%v]:
Time: %v 

", i+1, time.Since(startTime))
        assert.Equal(t, len(result), expectedMatchCount, "expected: %v to be: %v", len(result), expectedMatchCount)

    }
}

It would be great if someone could explain to me why my parallel code is so slow. What is wrong with the code, what am I missing here, and what is the recommended way to search huge slices (50K+ entries) in memory?


3 answers

  • duanhuan5409 2017-01-21 19:46

    This looks like just a simple typo. The problem is that you divide your original big slice into 5 pieces (countOfSlices), and you properly launch 5 goroutines to search each part:

    for i := 0; i < countOfSlices; i++ {
        // Fragments of the array passed on to the search method
        go func() { c <- Search(data[(part*i):(part*(i+1))], term) }()
    
    }
    

    This means you should expect 5 results, but you don't. You wait for part - 1 results, which is 4000 - 1 = 3999:

    for i := 0; i < part-1; i++ {
        select {
        case records := <-c:
            result = append(result, records...)
        case <-timeout:
            fmt.Println("timed out!")
            return result
        }
    }
    

    Obviously, if you only launched 5 goroutines, each of which delivers a single result, you can only expect that many (5). And since your loop waits for far more results (which will never come), it times out as expected.

    Change the condition to this:

    for i := 0; i < countOfSlices; i++ {
        // ...
    }
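
    For completeness, here is roughly what the whole All function could look like with that fix applied. This is only a sketch (I haven't run it against your tests) and it assumes the same imports as your original file ("fmt", "time"). I also pass i into the goroutine so each one gets its own copy of the index, and let the last fragment absorb any remainder when len(data) isn't evenly divisible by 5 — neither of which your original code did:

    func All(data []map[string]string, term string) []map[string]string {
        countOfSlices := 5
        part := len(data) / countOfSlices

        timeout := time.After(60 * time.Second)
        c := make(chan []map[string]string)

        for i := 0; i < countOfSlices; i++ {
            // Pass i as an argument: all goroutines would otherwise share
            // the same loop variable and could read an already-changed value.
            go func(i int) {
                lo, hi := part*i, part*(i+1)
                if i == countOfSlices-1 {
                    hi = len(data) // last fragment picks up any remainder
                }
                c <- Search(data[lo:hi], term)
            }(i)
        }

        result := []map[string]string{}

        // Exactly one result per goroutine, so wait for countOfSlices of them.
        for i := 0; i < countOfSlices; i++ {
            select {
            case records := <-c:
                result = append(result, records...)
            case <-timeout:
                fmt.Println("timed out!")
                return result
            }
        }
        return result
    }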
    
    Marked as the accepted answer by the asker.