douhui9192 2018-06-14 21:58

Weighted sampling without replacement using gonum

I have a big array of items and another array of weights of the same size. I would like to sample without replacement from the first array based on the weights from the second array. Is there a way to do this using gonum?

1 answer

  • dpndp64206 2018-06-14 22:54

    The Weighted type in gonum's sampleuv package, together with its Take method, looks like exactly what you want.

    From the doc:

    func NewWeighted(w []float64, src *rand.Rand) Weighted
    

    NewWeighted returns a Weighted for the weights w. If src is nil, rand.Rand is used as the random source. Note that sampling from weights with a high variance or overall low absolute value sum may result in problems with numerical stability.

    func (s Weighted) Take() (idx int, ok bool)
    

    Take returns an index from the Weighted with probability proportional to the weight of the item. The weight of the item is then set to zero. Take returns false if there are no items remaining.

    Therefore Take is indeed what you need for sampling without replacement.

    Create a Weighted from your weights with NewWeighted, call Take to draw an index with probability proportional to its weight, and use that index to pick the item from your array of samples. Because Take zeroes the weight of the index it returns, repeated calls never yield the same index twice.


    Working example:

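    package main

    import (
        "fmt"
        "time"

        "golang.org/x/exp/rand"

        "gonum.org/v1/gonum/stat/sampleuv"
    )

    func main() {
        samples := []string{"hello", "world", "what's", "going", "on?"}
        weights := []float64{1.0, 0.55, 1.23, 1, 0.002}

        // Seed the source from the clock so each run draws differently.
        // Note the trailing comma: Go requires it before a closing
        // parenthesis on its own line.
        w := sampleuv.NewWeighted(
            weights,
            rand.New(rand.NewSource(uint64(time.Now().UnixNano()))),
        )

        // Take reports ok == false once no items remain.
        i, ok := w.Take()
        if !ok {
            fmt.Println("no items left to sample")
            return
        }

        fmt.Println(samples[i])
    }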
    