Grouping numbers by occurrence count?

2015-04-18 17:53

Given the following three sequences of numbers, I would like to figure out how to group the numbers to find the closest relations between them.

1,2,3,4
4,3,5
2,1,3
...

I'm not sure what the algorithm(s) I'm looking for are called, but we can see stronger relations with some of the numbers than with others.

These pairs appear together twice:

1 & 2
1 & 3
2 & 3
3 & 4

Together once:

1 & 4
2 & 4
3 & 5
4 & 5

So, for example, there must be a relationship between 1, 2, and 3, since all three appear together at least twice. You could also say that 3 and 4 are closely related, since they also appear together twice. However, the algorithm might prefer [1,2,3] over [3,4] because it is the bigger (more inclusive) grouping.

If we put the numbers that appear together most often into the same group, we can form any of the following groupings:

[1,2,3] & [4,5]
[1,2]   & [3,4]   & [5]
[1,2]   & [3,4,5]
[1,2]   & [3,4]   & [5]

If duplicates are allowed, you could even end up with the following groups:

[1,2,3,4] [1,2,3] [3,4] [5]

I can't say which grouping is most "correct", but all four of these combos find different ways of semi-correctly grouping the numbers. I'm not looking for a specific grouping, just a general clustering algorithm that works fairly well and is easy to understand.

I'm sure there are many other ways to use the occurrence counts to group them as well. What would be a good base grouping algorithm for these? Samples in Go, JavaScript, or PHP are preferred.

dtjwov4984 2015-04-23 00:50

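As has already been mentioned, this is about cliques: treat each number as a vertex and connect two numbers when they appear together (at least some chosen number of times); a group in which every pair is connected is a clique. If you want an exact answer, you will face the Maximum Clique Problem, which is NP-hard (its decision version is NP-complete). So everything below makes sense only if the alphabet of your symbols (numbers) has a reasonable size. In that case, a straightforward, not very optimised algorithm for the Maximum Clique Problem in pseudocode would be: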

Function Main
    Cm ← ∅                   // the maximum clique
    Clique(∅, V)             // V: the vertex set
    return Cm
End function Main

Function Clique(set C, set P) // C: the current clique, P: the candidate set
    if (|C| > |Cm|) then
        Cm ← C
    End if
    if (|C| + |P| > |Cm|) then
        for all p ∈ P in predetermined order, do
            P ← P \ {p}
            Cp ← C ∪ {p}
            Pp ← P ∩ N(p)        // N(p): the set of vertices adjacent to p
            Clique(Cp,Pp)
        End for
    End if
End function Clique

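Because Go is my language of choice, here is an implementation. It builds a matrix of co-occurrence counts from the input and then, for each threshold from the highest count down to 1, treats two numbers as adjacent when they appear together at least that many times and prints the cliques it finds: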

package main

import (
    "bufio"
    "fmt"
    "sort"
    "strconv"
    "strings"
)

var adjmatrix map[int]map[int]int = make(map[int]map[int]int)
var Cm []int = make([]int, 0)
var frequency int


//For filter
type resoult [][]int
var res resoult
var filter map[int]bool = make(map[int]bool)
var bf int
//For filter


//That's for sorting
func (r resoult) Less(i, j int) bool {
    return len(r[i]) > len(r[j])
}

func (r resoult) Swap(i, j int) {
    r[i], r[j] = r[j], r[i]
}

func (r resoult) Len() int {
    return len(r)
}
//That's for sorting


//Work done here
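// Clique extends the current clique C with candidates from P (see the
// pseudocode above). Two numbers are adjacent when they appeared together at
// least `frequency` times. Cm keeps the largest clique found so far.
func Clique(C []int, P map[int]bool) {
    if len(C) >= len(Cm) {
        Cm = make([]int, len(C))
        copy(Cm, C)
    }
    if len(C)+len(P) >= len(Cm) {
        for k := range P {
            delete(P, k) // P ← P \ {k}; deleting during range is safe in Go
            // Cp ← C ∪ {k}, copied into a fresh slice so branches never share memory
            Cp := append(append([]int(nil), C...), k)
            // Pp ← P ∩ N(k)
            Pp := make(map[int]bool)
            for n, m := range adjmatrix[k] {
                if P[n] && m >= frequency {
                    Pp[n] = true
                }
            }
            Clique(Cp, Pp)

            //Cleanup result: encode the set as a bitmask (assumes numbers 0..63
            //on a 64-bit platform) and append Cp only the first time it is seen.
            //filter is never reset, so a lower frequency pass only lists groups
            //that were not already printed for a higher frequency.
            bf := 0
            for _, v := range Cp {
                bf |= 1 << uint(v)
            }
            if !filter[bf] {
                filter[bf] = true
                res = append(res, Cp)
            }
            //Cleanup result
        }
    }
}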
//Work done here

func main() {
    var toks []string
    var numbers []int
    var number int


//Input parsing
    StrReader := strings.NewReader(`1,2,3
4,3,5
4,1,6
4,2,7
4,1,7
2,1,3
5,1,2
3,6`)
    scanner := bufio.NewScanner(StrReader)
    for scanner.Scan() {
        toks = strings.Split(scanner.Text(), ",")
        numbers = []int{}
        for _, v := range toks {
            number, _ = strconv.Atoi(v)
            numbers = append(numbers, number)

        }
        for k, v := range numbers {
            for _, m := range numbers[k:] {
                _, ok := adjmatrix[v]
                if !ok {
                    adjmatrix[v] = make(map[int]int)
                }
                _, ok = adjmatrix[m]
                if !ok {
                    adjmatrix[m] = make(map[int]int)
                }
                if m != v {
                    adjmatrix[v][m]++
                    adjmatrix[m][v]++
                    if adjmatrix[v][m] > frequency {
                        frequency = adjmatrix[v][m]
                    }
                }

            }
        }
    }
    //Input parsing

    P1 := make(map[int]bool)


    //Iterating for frequency of appearance in group
    for ; frequency > 0; frequency-- {
        for k, _ := range adjmatrix {
            P1[k] = true
        }
        Cm = make([]int, 0)
        res = make(resoult, 0)
        Clique(make([]int, 0), P1)
        sort.Sort(res)
        fmt.Printf("%dx-times %v\n", frequency, res)
    }
    //Iterating for frequency of appearing together
}

And here you can see it working: https://play.golang.org/p/ZiJfH4Q6GJ, where you can also play with the input data. But once more, this approach is only for an alphabet of reasonable size (the input data can be of any size).

This answer was accepted by the asker as the best answer.
