Just saw your question, even it is long ago that you asked, I have a solution proposal using Redis.
Let's first try modelling the problem with some basic numbers (instead of IPs) and see how it can be solved:
Range to Country Lookup
Lookup | Range | Country --------|------------+------------------ | 5 | begin:Country A L1 >>> | | 10 | end:Country A | | L2 >>> | | | L2.1>>> 15 | begin:Country B | | | 20 | end:Country B L3 >>> | | |
Make a number lookup between
[6,10] (here inclusive range). In that case result will be
end:Country A => IP Address belongs to Country A. Why we start with
6 will be obvious in
Lookup a number in range [11, 15] (here inclusive range) . Result will be
begin:Country B =>
=> Looked Up Number points to the range start, i.e.
iffIP belongs to begin:Country B value directly
ELSEERROR: IP does not belong to any known range
Result will be
Empty List or Set => ERROR: IP does not belong to any known range
Insertion is more tricky!
Inserting ranges must be taken care, because a newly inserted range might break an existing range. Here are the cases of insertion:
Insert | Range | Country --------|------------+------------------ | 5 | begin:Country A | | I1 >>> 8,9 | !!! Country C !!! | | | 10 | end:Country A | | | | I2 >>> 12,14 | Country E | | | | | 15 | begin:Country B | | I3 >>> 17,21 | !!! Country D !!! | | | 20 | end:Country B | | I4 >>> 22,27 | Country F | |
Renders Addresses with IPs
8) to be invalid. => Effectively
Country A range shrinks to single IP Address
OK, no range intersections
Renders begin of Country B invalid + renders begin of Country D (
Note: Probably you will need to introduce range splitting logic in some of the cases.
Redis based solution
I'd propose to use Redis ZSET for that purpose. Here are the observations:
Every IPv4 address can be represented as a 32bit integer besides the dotted decimal string representation.
Redis ZSET guarantees uniqueness of the stored members by additionally ordering them with scores
We can search for ZSET members by using a score range, i.e.
If we use numeric IP as a ZSET score, we are done. Uniqueness of a country is enforced by prepending
end: prefixes for particular range. In real life one country might have multiple IP address ranges, thus you will probably end up encoding the range number into the Country name, like
begin:r1:Country A and
end:r1:Country A. You could normalize this and introduce indirection. But to keep number of lookups low you'd want to denormalize it and have as much info as possible within single DB access. This is because introducing a new range is much less frequent than doing a lookup so increasing number of lookups will decrease performance.
Lookup | Score | Country --------|------------+------------------ | 5 | begin:Country A L1 >>> | | 10 | end:Country A | | L2 >>> | | | L2.1>>> 15 | begin:Country B | | | 20 | end:Country B L3 >>> | | |
What Redis commands to use
Here are just plain commands without your logic to check for error cases during insertions etc.
Adding a new range
> ZADD ip-to-country 3232235777 "begin:Country A" 3232235876 "end:Country A"
192.168.1.1represented as unsigned int, same applies to
Checking to which range a particular IP belongs
> ZRANGEBYSCORE ip-to-country 3232235778 +inf WITHSCORES LIMIT 0 1
192.168.1.2represented as unsigned int and we make a lookup of one element (i.e.
LIMIT 0 1) from
Lookup 2.1looked up IP starts the new range
> ZSCORE ip-to-country "begin:Country A"
Note: result will be
Space Complexity: If in worst case we end up with every single IP representing start and end of a Range, we will need
O(2*N) space, where N is
2^32. But in real life this number will be much smaller. In some algorithm books you will see that
2^32 is considered a constant factor and thus will be reduced to
Runtime Complexity: Redis states that
ZRANGEBYSCORE is a
O(log(N)+M) operation, where
M is the number of elements in
LIMIT, i.e. here only 1. If we have at most
2*2^32 scores in worst case than
log2(8billion) is around
33 comparisons within Redis set implementation. But in reality I think there would not be more than 2 or 3 thousand ranges, which is around
11 comparisons. Redis states for a
Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds.
All in all your lookup will be blazing fast!