douyan1970 2015-04-18 17:53
浏览 42
已采纳

根据出现次数对数字进行分组?

Given the following three sequences of numbers, I would like to figure out how to group the numbers to find the closest relations between them.

1,2,3,4
4,3,5
2,1,3
...

I'm not sure what the algorithm(s) I'm looking for are called, but we can see stronger relations with some of the numbers than with others.

These numbers appear together twice:

1 & 2
1 & 3
2 & 3
3 & 4

Together once:

1 & 4
2 & 4
3 & 5
4 & 5

So for example, we can see there must be a relationship between 1, 2, & 3 since they all appear together at least twice. You could also say that 3 & 4 are closely related since they also appear twice. However, the algorithm might pick [1,2,3] (over [3,4]) since it's a bigger grouping (more inclusive).

We can form any of the following groupings if we stick the numbers used most often together in a group:

[1,2,3] & [4,5]
[1,2]   & [3,4]   & [5]
[1,2]   & [3,4,5]
[1,2]   & [3,4]   & [5]

If duplicates are allowed, you could even end up with the following groups:

[1,2,3,4] [1,2,3] [3,4] [5]

I can't say which grouping is most "correct", but all four of these combos all find different ways of semi-correctly grouping the numbers. I'm not looking for a specific grouping - just a general cluster algorithm that works fairly well and is easy to understand.

I'm sure there are many other ways to use the occurrence count to group them as well. What would be a good base grouping algorithm for these? Samples in Go, Javascript, or PHP are preferred.

  • 写回答

4条回答 默认 最新

  • dtjwov4984 2015-04-23 00:50
    关注

    As already been mentioned it's about clique. If you want exact answer you will face Maximum Clique Problem which is NP-complete. So all below make any sense only if alphabet of your symbols(numbers) has reasonable size. In this case strait-forward, not very optimised algorithm for Maximum Clique Problem in pseudo-code would be

    Function Main
        Cm ← ∅                   // the maximum clique
        Clique(∅,V)              // V vertices set
        return Cm
    End function Main
    
    Function Clique(set C, set P) // C the current clique, P candidat set
        if (|C| > |Cm|) then
            Cm ← C
        End if
        if (|C|+|P|>|Cm|)then
            for all p ∈ P in predetermined order, do
                P ← P \ {p}
                Cp ←C ∪ {p}
                Pp ←P ∩ N(p)        //N(p) set of the vertices adjacent to p
                Clique(Cp,Pp)
            End for
        End if
    End function Clique
    

    Because of Go is my language of choice here is implementation

    package main
    
    import (
        "bufio"
        "fmt"
        "sort"
        "strconv"
        "strings"
    )
    
    var adjmatrix map[int]map[int]int = make(map[int]map[int]int)
    var Cm []int = make([]int, 0)
    var frequency int
    
    
    //For filter
    type resoult [][]int
    var res resoult
    var filter map[int]bool = make(map[int]bool)
    var bf int
    //For filter
    
    
    //That's for sorting
    func (r resoult) Less(i, j int) bool {
        return len(r[i]) > len(r[j])
    }
    
    func (r resoult) Swap(i, j int) {
        r[i], r[j] = r[j], r[i]
    }
    
    func (r resoult) Len() int {
        return len(r)
    }
    //That's for sorting
    
    
    //Work done here
    func Clique(C []int, P map[int]bool) {
        if len(C) >= len(Cm) {
    
            Cm = make([]int, len(C))
            copy(Cm, C)
        }
        if len(C)+len(P) >= len(Cm) {
            for k, _ := range P {
                delete(P, k)
                Cp := make([]int, len(C)+1)
                copy(Cp, append(C, k))
                Pp := make(map[int]bool)
                for n, m := range adjmatrix[k] {
                    _, ok := P[n]
                    if ok && m >= frequency {
                        Pp[n] = true
                    }
                }
                Clique(Cp, Pp)
    
                res = append(res, Cp)
                //Cleanup resoult
                bf := 0
                for _, v := range Cp {
                    bf += 1 << uint(v)
                }
                _, ok := filter[bf]
                if !ok {
                    filter[bf] = true
                    res = append(res, Cp)
                }
                //Cleanup resoult
            }
        }
    }
    //Work done here
    
    func main() {
        var toks []string
        var numbers []int
        var number int
    
    
    //Input parsing
        StrReader := strings.NewReader(`1,2,3
    4,3,5
    4,1,6
    4,2,7
    4,1,7
    2,1,3
    5,1,2
    3,6`)
        scanner := bufio.NewScanner(StrReader)
        for scanner.Scan() {
            toks = strings.Split(scanner.Text(), ",")
            numbers = []int{}
            for _, v := range toks {
                number, _ = strconv.Atoi(v)
                numbers = append(numbers, number)
    
            }
            for k, v := range numbers {
                for _, m := range numbers[k:] {
                    _, ok := adjmatrix[v]
                    if !ok {
                        adjmatrix[v] = make(map[int]int)
                    }
                    _, ok = adjmatrix[m]
                    if !ok {
                        adjmatrix[m] = make(map[int]int)
                    }
                    if m != v {
                        adjmatrix[v][m]++
                        adjmatrix[m][v]++
                        if adjmatrix[v][m] > frequency {
                            frequency = adjmatrix[v][m]
                        }
                    }
    
                }
            }
        }
        //Input parsing
    
        P1 := make(map[int]bool)
    
    
        //Iterating for frequency of appearance in group
        for ; frequency > 0; frequency-- {
            for k, _ := range adjmatrix {
                P1[k] = true
            }
            Cm = make([]int, 0)
            res = make(resoult, 0)
            Clique(make([]int, 0), P1)
            sort.Sort(res)
            fmt.Print(frequency, "x-times ", res, " ")
        }
        //Iterating for frequency of appearing together
    }
    

    And here you can see it works https://play.golang.org/p/ZiJfH4Q6GJ and play with input data. But once more, this approach is for reasonable size alphabet(and input data of any size).

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(3条)

报告相同问题?

悬赏问题

  • ¥15 执行 virtuoso 命令后,界面没有,cadence 启动不起来
  • ¥50 comfyui下连接animatediff节点生成视频质量非常差的原因
  • ¥20 有关区间dp的问题求解
  • ¥15 多电路系统共用电源的串扰问题
  • ¥15 slam rangenet++配置
  • ¥15 有没有研究水声通信方面的帮我改俩matlab代码
  • ¥15 ubuntu子系统密码忘记
  • ¥15 保护模式-系统加载-段寄存器
  • ¥15 电脑桌面设定一个区域禁止鼠标操作
  • ¥15 求NPF226060磁芯的详细资料