dougou8639
dougou8639
2014-10-09 13:29

Amazon ElastiCache Memcached / Redis:将IP范围映射到国家/地区

I have a MySQL database that has an ip range (start and end, so two columns) and a country code (1 column). The database is used to look up the country based upon an ip address. It works but I want to speed it up more. An idea is to store the data on Amazon ElastiCache using e.g. Redis or Memcache. The problem I have is how would one go with this approach? Redis as well as Memcache uses key - values making it, in my opinion, difficult to store an IP range as well as the country code. What approach would you suggest for using ElastiCache Memcache or Redis?

The country range would be something like:

  • 192.168.1.1 - 192.168.1.100 (Country A)
  • 192.168.2.1 - 192.168.2.50 (Country B)
  • 192.168.1.150 - 192.168.1.200 (Country A)

Now I get the IP address e.g. 192.168.1.160, I need to look this up as fast as possible and return in this case Country A.

Looking forward to your ideas.

Marc

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

2条回答

  • drxm5014 drxm5014 4年前

    Just saw your question, even it is long ago that you asked, I have a solution proposal using Redis.

    Let's first try modelling the problem with some basic numbers (instead of IPs) and see how it can be solved:

    Range to Country Lookup

       Lookup |   Range    |     Country
      --------|------------+------------------
              |     5      |  begin:Country A
          L1 >>>           |
              |     10     |  end:Country A
              |            |
          L2 >>>           |
              |            |
          L2.1>>>   15     |  begin:Country B
              |            |
              |     20     |  end:Country B
          L3 >>>           |
              |            |
    

    Lookup L1:

    Make a number lookup between [6,10] (here inclusive range). In that case result will be end:Country A => IP Address belongs to Country A. Why we start with 6 will be obvious in L2.

    Lookup L2:

    Lookup a number in range [11, 15] (here inclusive range) . Result will be begin:Country B =>

    • IF Lookup L2.1
      => Looked Up Number points to the range start, i.e. begin:Country B
      => OK: iff IP belongs to begin:Country B value directly

    • ELSE ERROR: IP does not belong to any known range

    Lookup L3:

    Result will be Empty List or Set => ERROR: IP does not belong to any known range

    Insertion is more tricky!

    Inserting ranges must be taken care, because a newly inserted range might break an existing range. Here are the cases of insertion:

       Insert |   Range    |     Country
      --------|------------+------------------
              |     5      |  begin:Country A
              |            |
          I1 >>>    8,9    |  !!! Country C !!!
              |            |
              |     10     |  end:Country A
              |            |
              |            |
          I2 >>>    12,14  |  Country E
              |            |
              |            |
              |     15     |  begin:Country B
              |            |
          I3 >>>    17,21  |  !!! Country D !!!
              |            |
              |     20     |  end:Country B
              |            |
          I4 >>>    22,27  |  Country F
              |            |
    

    Insertion I1:

    Renders Addresses with IPs 6 and 7 (between 5 and 8) to be invalid. => Effectively Country A range shrinks to single IP Address 10.

    Insertion I2:

    OK, no range intersections

    Insertion I3:

    Renders begin of Country B invalid + renders begin of Country D (17..20) invalid

    Insertion I4:

    OK

    Note: Probably you will need to introduce range splitting logic in some of the cases.

    Redis based solution

    I'd propose to use Redis ZSET for that purpose. Here are the observations:

    1. Every IPv4 address can be represented as a 32bit integer besides the dotted decimal string representation.

    2. Redis ZSET guarantees uniqueness of the stored members by additionally ordering them with scores

    3. We can search for ZSET members by using a score range, i.e. ZRANGEBYSCORE command.

    If we use numeric IP as a ZSET score, we are done. Uniqueness of a country is enforced by prepending begin: and end: prefixes for particular range. In real life one country might have multiple IP address ranges, thus you will probably end up encoding the range number into the Country name, like begin:r1:Country A and end:r1:Country A. You could normalize this and introduce indirection. But to keep number of lookups low you'd want to denormalize it and have as much info as possible within single DB access. This is because introducing a new range is much less frequent than doing a lookup so increasing number of lookups will decrease performance.

       Lookup |   Score    |     Country
      --------|------------+------------------
              |     5      |  begin:Country A
          L1 >>>           |
              |     10     |  end:Country A
              |            |
          L2 >>>           |
              |            |
          L2.1>>>   15     |  begin:Country B
              |            |
              |     20     |  end:Country B
          L3 >>>           |
              |            |
    

    What Redis commands to use

    Here are just plain commands without your logic to check for error cases during insertions etc.

    • Adding a new range

      > ZADD ip-to-country 3232235777 "begin:Country A" 3232235876 "end:Country A"
      

      Note: 3232235777 is IPv4 192.168.1.1 represented as unsigned int, same applies to 192.168.1.100.

    • Checking to which range a particular IP belongs

      > ZRANGEBYSCORE ip-to-country 3232235778 +inf WITHSCORES LIMIT 0 1
      

      Note: 3232235778 is IPv4 192.168.1.2 represented as unsigned int and we make a lookup of one element (i.e. LIMIT 0 1) from 192.168.1.8 onwards (i.e. +inf).

    • Checking for Lookup 2.1 looked up IP starts the new range

       > ZSCORE ip-to-country "begin:Country A"
      

      Note: result will be 3232235777

    Complexity Analysis

    Space Complexity: If in worst case we end up with every single IP representing start and end of a Range, we will need O(2*N) space, where N is 2^32. But in real life this number will be much smaller. In some algorithm books you will see that 2^32 is considered a constant factor and thus will be reduced to O(1).

    Runtime Complexity: Redis states that ZRANGEBYSCORE is a O(log(N)+M) operation, where M is the number of elements in LIMIT, i.e. here only 1. If we have at most 2*2^32 scores in worst case than log2(8billion) is around 33 comparisons within Redis set implementation. But in reality I think there would not be more than 2 or 3 thousand ranges, which is around 11 comparisons. Redis states for a KEYS command:

    Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds.

    All in all your lookup will be blazing fast!

    点赞 评论 复制链接分享
  • dongluo3962 dongluo3962 7年前

    If you have a key per start/end range (e.g "80-255") and value of country code you could use either Memcached or Redis.

    If you want less keys you could use a sorted set in Redis where the key is the start range, the score is the end range and value is country code (might save you on memory too as Redis is more efficient at storing this stuff).

    点赞 评论 复制链接分享