douke8473 2017-10-09 07:18
Views: 42
Accepted

Ensuring my Go page view counter isn't abused

I believe I have found a good, fast solution for efficiently counting page views:

Working example in go playground here: https://play.golang.org/p/q_mYEYLa1h

My idea is to push this to the database every X minutes, and after pushing a key then delete it from the page map.

My question now is, what would be the optimal way to ensure that this isn't abused? Ideally, I would only want to increase page count from the same person if there was a time interval of 2 hours since last visiting the page. As far as I know, it would be ideal to store and compare both IP and user agent (I don't want to rely on cookie/localstorage), but I'm not quite sure how to efficiently store and compare this information.

I'd likely get both the IP (req.Header.Get("x-forwarded-for")) and UserAgent (req.UserAgent()) from http.Request.

I was thinking of making a visitor struct, similar to my page struct, that would look like this:

type visitor struct {
    mutex          sync.Mutex
    urlIPUAAndTime map[string]time.Time
}

This should make it possible to do something similar to before. However, imagine the website had so many requests that hundreds of millions of unique visitor entries were being stored, and each of these could only be deleted after 2 (or more) hours. I therefore don't think this is a good solution.

I guess it would be ideal/necessary to write to and read from some file, but I'm not sure how this could be done efficiently. Help would be greatly appreciated.


1 answer

  • duancong7358 2017-10-09 08:01

    One optimization is to add a Bloom filter in front of this map. A Bloom filter is a probabilistic structure that can tell you one of two things:

    • this user is definitely new

    • this user has possibly been here before

    This lets you cut off the computation at an early stage. If many of your users are new, you save the database requests needed to check all of them. What if the structure says "this user is possibly non-unique"? Then you go to the database and check. Here's one more optimization: if you don't need very accurate numbers and can tolerate an error of a few percent, you can use the Bloom filter alone. I guess many large sites use this technique for estimation.
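    The filter described above can be sketched in Go like this (the bit-array size, number of hashes, and double-hashing scheme are illustrative choices, not tuned values):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny Bloom filter: Add marks an item; MayContain reports
// false ("definitely new") or true ("possibly seen before").
type bloom struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of hash functions
}

func newBloom(m, k uint32) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit positions from one FNV-1a hash via
// double hashing: position_i = h1 + i*h2 (mod m).
func (b *bloom) positions(s string) []uint32 {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	pos := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		pos[i] = (h1 + i*h2) % b.m
	}
	return pos
}

func (b *bloom) Add(s string) {
	for _, p := range b.positions(s) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

func (b *bloom) MayContain(s string) bool {
	for _, p := range b.positions(s) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false // at least one bit unset: definitely new
		}
	}
	return true // all bits set: possibly seen before
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("1.2.3.4|Mozilla/5.0")
	fmt.Println(f.MayContain("1.2.3.4|Mozilla/5.0")) // true: added items are always found
	fmt.Println(f.MayContain("5.6.7.8|curl/7.0"))    // false, barring a (rare) false positive
}
```

    A Bloom filter never produces false negatives, only occasional false positives, which is exactly the "definitely new" / "possibly seen" split the answer relies on.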

    This answer was selected by the asker as the best answer.
