doujiao1984 2016-11-04 19:50
Views: 49
Answer accepted

Weighting a flat list to a normal distribution

I have a list of string items of arbitrary length. I need to "normalize" this list so that each item is weighted as part of a normal distribution, with the weight appended to the string.

Is there a more effective, mathematically or statistically sound way to go about this than what I have below?

func normalizeAppend(in []string, shuffle bool) []string {
    var ret []string

    if shuffle {
        shuffleStrings(in)
    }

    l := len(in)
    switch {
    case remain(l, 3) == 0:
        l3 := (l / 3)
        var low, mid, high []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l3:
                low = append(low, v)
            case o > l3 && o <= l3*2:
                mid = append(mid, v)
            case o >= l3*2:
                high = append(high, v)
            }
        }

        q1 := 1600 / len(low)
        q2 := 6800 / len(mid)
        q3 := 1600 / len(high)

        for _, v := range low {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }

        for _, v := range mid {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }

        for _, v := range high {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }
    case remain(l, 2) == 0 && l >= 4:
        l4 := (l / 4)
        var first, second, third, fourth []string
        for i, v := range in {
            o := i + 1
            switch {
            case o <= l4:
                first = append(first, v)
            case o > l4 && o <= l4*2:
                second = append(second, v)
            case o > l4*2 && o <= l4*3:
                third = append(third, v)
            case o > l4*3:
                fourth = append(fourth, v)
            }
        }
        q1 := 1600 / len(first)
        q2 := 3400 / len(second)
        q3 := 3400 / len(third)
        q4 := 1600 / len(fourth)

        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
        }

        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
        }

        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
        }

        for _, v := range fourth {
            ret = append(ret, fmt.Sprintf("%s_%d", v, q4))
        }
    default:
        var first, second, third []string
        q1 := (1 + math.Floor(float64(l)*.16))
        q3 := (float64(l) - math.Floor(float64(l)*.16))
        var o float64
        for i, v := range in {
            o = float64(i + 1)
            switch {
            case o <= q1:
                first = append(first, v)
            case o > q1 && o < q3:
                second = append(second, v)
            case o >= q3:
                third = append(third, v)
            }
        }
        lq1 := 1600 / len(first)
        lq2 := 6800 / len(second)
        lq3 := 1600 / len(third)
        for _, v := range first {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq1))
        }

        for _, v := range second {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq2))
        }

        for _, v := range third {
            ret = append(ret, fmt.Sprintf("%s_%d", v, lq3))
        }

    }

    return ret
}

Some requested clarification:

I have a list of items that will be chosen from many times, one at a time, by weighted selection. To start with, I have a list with (implied) weights of 1:

[a_1, b_1, c_1, d_1, e_1, f_1, g_1, h_1, i_1, j_1, k_1]

I'm looking for a better way to turn that list into one that produces a more 'normal' distribution of weights for selection:

[a_1, b_2, c_3, d_5, e_14, f_30, g_14, h_5, i_3, j_2, k_1]

Or perhaps I need to change my method to something more statistically grounded. The bottom line is that I want to control selection from a list of items in several ways, one of which is ensuring that items are returned in a way that approximates a normal curve.


1 answer

  • duanji6997 2016-11-05 13:30

    If you just want to calculate the weights for a given list, then you need the following things:

    • The mean of the normal distribution
    • The variance of the normal distribution
    • A discretizer for the values

    The first one is quite simple. You want the mean to be in the center of the list. Therefore (assuming zero-based indexing):

    mean = (list.size - 1) / 2
    

    The second is somewhat arbitrary and depends on how steeply you want the weights to fall off. Weights of the normal distribution are practically zero beyond a distance of 3 * standard_deviation from the mean. So a good standard deviation in most cases is probably somewhere between a quarter and a sixth of the list length:

    standard_deviation = (1/4 .. 1/6) * list.size
    variance = standard_deviation^2
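To get a feel for how this choice affects steepness, the following sketch computes the weight of the first (or last) element relative to the weight at the mean. The function name `edgeToPeakRatio` and the `sdFraction` parameter are illustrative assumptions, not part of the answer; the mean is taken as `(n - 1) / 2` as defined above.

```go
package main

import "math"

// edgeToPeakRatio (hypothetical helper) returns the unrounded weight of the
// first or last element divided by the weight at the mean, for a list of n
// items whose standard deviation is sdFraction * n. sdFraction must be > 0.
func edgeToPeakRatio(n int, sdFraction float64) float64 {
	mean := float64(n-1) / 2 // also the distance from index 0 to the mean
	sd := sdFraction * float64(n)
	return math.Exp(-(mean * mean) / (2 * sd * sd))
}
```

For an 11-item list, a fraction of 1/4 leaves the outermost items at roughly 0.19 of the peak weight, while 1/6 drops them to roughly 0.024, so the smaller fraction gives a much sharper peak.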
    

    Assuming that you want integer weights, you need to discretize the weights from the normal distribution. The easiest way to do this is by specifying the maximum weight (of the element at the mean position).

    That's it. The weight for an element at position i is then:

    weight[i] = round(max_weight * exp(-(i - mean)^2 / (2 * variance)))
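Put together, the formula might look like this in Go. This is a minimal sketch under stated assumptions: the names `normalWeights` and `appendWeights` and the `sdFraction` parameter (the standard deviation as a fraction of the list length, assumed to be greater than zero) are introduced here for illustration, and the `item_weight` output format follows the question.

```go
package main

import (
	"fmt"
	"math"
)

// normalWeights (hypothetical helper) returns one integer weight per position:
//   weight[i] = round(maxWeight * exp(-(i - mean)^2 / (2 * variance)))
// with mean = (n - 1) / 2 and standard deviation = sdFraction * n.
// sdFraction is assumed to be > 0 (for example between 1.0/6 and 1.0/4).
func normalWeights(n int, maxWeight, sdFraction float64) []int {
	weights := make([]int, n)
	if n == 0 {
		return weights
	}
	mean := float64(n-1) / 2
	sd := sdFraction * float64(n)
	variance := sd * sd
	for i := range weights {
		d := float64(i) - mean
		weights[i] = int(math.Round(maxWeight * math.Exp(-d*d/(2*variance))))
	}
	return weights
}

// appendWeights (hypothetical helper) appends "_<weight>" to each item,
// mirroring the string format used in the question.
func appendWeights(items []string, maxWeight, sdFraction float64) []string {
	out := make([]string, len(items))
	for i, w := range normalWeights(len(items), maxWeight, sdFraction) {
		out[i] = fmt.Sprintf("%s_%d", items[i], w)
	}
	return out
}
```

Calling `appendWeights(items, 30, 1.0/6)` on an odd-length list gives the middle item the weight 30 and symmetric, decreasing weights towards both ends. Items far from the mean can round down to 0, so a larger `maxWeight` or `sdFraction` may be needed if every item must remain selectable.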
    
    This answer was selected by the asker as the best answer.