doutou3725 2011-04-16 11:47

Migrating from MySQL to Cassandra

Previously I used the class found here to convert a userID into a random-looking string.

From his blog:

Running:

alphaID(9007199254740989);

will return 'PpQXn7COf' and:

alphaID('PpQXn7COf', true);

will return '9007199254740989'
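
For reference, the reversible integer-to-string mapping described above can be sketched as a plain base-62 conversion. This is a minimal Python sketch, not the alphaID implementation from the blog: the alphabet order and the names `encode_base62` and `decode_base62` are assumptions, so it will not reproduce 'PpQXn7COf'. It only illustrates that a 16-digit integer fits in 9 characters and can be converted back.

```python
import string

# Assumed alphabet order (0-9, a-z, A-Z); alphaID may order its characters differently.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Convert a non-negative integer to a base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(text: str) -> int:
    """Convert a base-62 string back to the integer it encodes."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number
```

With this alphabet, `decode_base62(encode_base62(9007199254740989))` gives back `9007199254740989`, and the encoded form is 9 characters long.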

So the idea was that users could visit www.mysite.com/user/PpQXn7COf and I would convert that back to a normal integer, so that in MySQL I could do

"Select * from Users where userID=".alphaID('PpQXn7COf', true)

Now I have just started working with Cassandra and I'm looking for a replacement.

  1. I want URLs like www.mysite.com/user/PpQXn7COf, not like www.mysite.com/user/username1
  2. The "PpQXn7COf" uuid must be as short as possible.

In the Twissandra example explained here: http://www.rackspace.com/cloud/blog/2010/05/12/cassandra-by-example/

They create a long UUID (I guess it is that long so that it is almost 100 percent sure to be unique).

In MySQL I just had a userID column with auto-increment, so when I used the alphaID() function I always got a very short random-looking string.

Does anyone have an idea how to solve this as cleanly as possible?


Edit:

It is for a social media site, so the ID must be persistent. That is also why I don't want usernames or real names in URLs: users should be able to stay undetected by Google if they need to.

I just had a simple idea, but I don't know how well it scales:

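<?php
// createUUID() makes a roughly 14-char string of A-Z, a-z, 0-9 based on micro/milli/nanoseconds.
// get_count($uuid) returns how many users already have that uuid.
do {
    $uuid = createUUID();
} while (get_count($uuid) > 0); // retry until the uuid is unused

// insert username, pass, $uuid etc. into Cassandra; $result is "1" on success
if ($result == "1") {
    header('Location: http://www.mysite.com/usercenter');
    exit;
} else {
    echo "error";
}
?>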

If this grows to the size of, say, Twitter or Facebook:

  1. Will it execute in acceptable time?
  2. Will it still generate unique uuids fast enough that 10,000 registrations per second don't pile up?

1 answer

  • doukanzhuo4297 2011-04-16 14:03

    Auto-increments are not suitable for a robust distributed system: you can only assign an ID that is guaranteed unique if every node in your system is available to confirm it.

    You can of course, invent your own unique-id generator, but you must then ensure that it will generate unique IDs anywhere in your infrastructure.

    For example, each node can keep a file holding a counter which it increments (with suitable locking etc.), but you also need to ensure that nodes don't clash, for instance by including the server ID in the generation algorithm.

    This may be operationally nontrivial: your ops engineers will need to make sure every server in the infrastructure has its own ID generator configured so that no two servers generate the same ID. However, it's possible.

    UUIDs are the reasonable alternative, because for practical purposes they will be unique without any coordination between nodes.

    A UUID is 128 bits; if we store 6 bits per character (i.e. base64) then that takes 22 characters, which is quite a long URI. If you want it shorter, you will need to generate unique IDs a different way.
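
The 22-character figure can be checked with a short Python sketch. It uses the standard `uuid` and `base64` modules; the function names `encode_uuid` and `decode_uuid` are made up for illustration, and the URL-safe base64 alphabet is an assumption chosen so the result can sit in a path segment.

```python
import base64
import uuid


def encode_uuid(value: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a UUID as URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def decode_uuid(token: str) -> uuid.UUID:
    """Restore the padding that was stripped and rebuild the UUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

For any UUID, `encode_uuid` returns 22 characters (16 bytes become 24 base64 characters, two of which are padding), and `decode_uuid` returns the original value.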

    Plus it all depends on "how unique" you actually need your IDs to be. If your IDs can safely be reused after a few months, you can probably do it in < 60 bits (depending also on the number of servers in your infrastructure, and how frequently you need to generate them).
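
To see what fewer bits buy in URL length, here is a small Python sketch that counts how many characters an ID of a given bit width needs. The helper name `chars_needed` is hypothetical, and the 62-character alphabet (A-Z, a-z, 0-9) is an assumption based on the strings in the question.

```python
def chars_needed(bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size ** n can represent every value of `bits` bits."""
    count = 0
    capacity = 1
    target = 2 ** bits
    # Integer arithmetic avoids floating-point rounding near exact powers.
    while capacity < target:
        capacity *= alphabet_size
        count += 1
    return count
```

Under these assumptions a 128-bit UUID needs 22 characters in base 64, while a 60-bit ID needs 10 characters in base 64 or 11 in base 62.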

    We use

    • Server ID
    • Time (granularity = 2 seconds), but wraps after a few months
    • A per-server counter (which wraps frequently, but not within 2 seconds)

    And stick all the bits together. This generates an ID which is < 64 bits long, but is guaranteed to be unique for as long as it needs to be (which in our case is only a couple of months).
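
A minimal Python sketch of this kind of scheme follows. It is not the answerer's actual implementation: the bit widths (10 for the server ID, 23 for the 2-second time slot, 16 for the counter), the field order, and the class name `IdGenerator` are all assumptions picked for illustration. With 23 bits of 2-second slots the time field wraps after roughly 194 days.

```python
import threading
import time

# Assumed field widths; the answer does not state the real ones.
SERVER_BITS = 10
TIME_BITS = 23
COUNTER_BITS = 16
TICK_SECONDS = 2


class IdGenerator:
    """Packs server ID, a wrapping 2-second time slot, and a wrapping counter."""

    def __init__(self, server_id: int, clock=time.time):
        if not 0 <= server_id < (1 << SERVER_BITS):
            raise ValueError("server_id out of range")
        self._server_id = server_id
        self._clock = clock  # injectable so the time source can be replaced
        self._counter = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Time slot in 2-second units, wrapped to TIME_BITS bits.
            tick = (int(self._clock()) // TICK_SECONDS) % (1 << TIME_BITS)
            counter = self._counter
            self._counter = (self._counter + 1) % (1 << COUNTER_BITS)
        return (
            (self._server_id << (TIME_BITS + COUNTER_BITS))
            | (tick << COUNTER_BITS)
            | counter
        )
```

The result is 49 bits wide, so it fits in 9 base-62 characters. The sketch does nothing to stop the counter wrapping inside a single 2-second slot; it relies on the per-server rate staying below 65,536 IDs per slot, which is the same kind of assumption the answer makes.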


    Our algorithm will malfunction and generate a duplicate ID if:

    • The system clock on one of our nodes goes backwards by the same amount of time in which the counter wraps.
    • Our operations engineers make a mistake and assign the same server ID to two servers.
    • Eventually, after about 9 months.
    Accepted by the asker as the best answer.
