Just saw your question, even it is long ago that you asked, I have a solution proposal using Redis.
Let's first try modelling the problem with some basic numbers (instead of IPs) and see how it can be solved:
Range to Country Lookup
Lookup | Range | Country
--------|------------+------------------
| 5 | begin:Country A
L1 >>> |
| 10 | end:Country A
| |
L2 >>> |
| |
L2.1>>> 15 | begin:Country B
| |
| 20 | end:Country B
L3 >>> |
| |
Lookup L1
:
Make a number lookup between [6,10]
(here inclusive range). In that case result will be end:Country A
=> IP Address belongs to Country A. Why we start with 6
will be obvious in L2
.
Lookup L2
:
Lookup a number in range [11, 15] (here inclusive range) . Result will be begin:Country B
=>
IF
Lookup L2.1
=> Looked Up Number points to the range start, i.e. begin:Country B
=> OK: iff
IP belongs to begin:Country B value directly
ELSE
ERROR: IP does not belong to any known range
Lookup L3
:
Result will be Empty List or Set
=> ERROR: IP does not belong to any known range
Insertion is more tricky!
Inserting ranges must be taken care, because a newly inserted range might break an existing range. Here are the cases of insertion:
Insert | Range | Country
--------|------------+------------------
| 5 | begin:Country A
| |
I1 >>> 8,9 | !!! Country C !!!
| |
| 10 | end:Country A
| |
| |
I2 >>> 12,14 | Country E
| |
| |
| 15 | begin:Country B
| |
I3 >>> 17,21 | !!! Country D !!!
| |
| 20 | end:Country B
| |
I4 >>> 22,27 | Country F
| |
Insertion I1
:
Renders Addresses with IPs 6
and 7
(between 5
and 8
) to be invalid. => Effectively Country A
range shrinks to single IP Address 10
.
Insertion I2
:
OK, no range intersections
Insertion I3
:
Renders begin of Country B invalid + renders begin of Country D (17
..20
) invalid
Insertion I4
:
OK
Note: Probably you will need to introduce range splitting logic in some of the cases.
Redis based solution
I'd propose to use Redis ZSET for that purpose. Here are the observations:
Every IPv4 address can be represented as a 32bit integer besides the dotted decimal string representation.
Redis ZSET guarantees uniqueness of the stored members by additionally ordering them with scores
We can search for ZSET members by using a score range, i.e. ZRANGEBYSCORE
command.
If we use numeric IP as a ZSET score, we are done. Uniqueness of a country is enforced by prepending begin:
and end:
prefixes for particular range. In real life one country might have multiple IP address ranges, thus you will probably end up encoding the range number into the Country name, like begin:r1:Country A
and end:r1:Country A
. You could normalize this and introduce indirection. But to keep number of lookups low you'd want to denormalize it and have as much info as possible within single DB access. This is because introducing a new range is much less frequent than doing a lookup so increasing number of lookups will decrease performance.
Lookup | Score | Country
--------|------------+------------------
| 5 | begin:Country A
L1 >>> |
| 10 | end:Country A
| |
L2 >>> |
| |
L2.1>>> 15 | begin:Country B
| |
| 20 | end:Country B
L3 >>> |
| |
What Redis commands to use
Here are just plain commands without your logic to check for error cases during insertions etc.
-
Adding a new range
> ZADD ip-to-country 3232235777 "begin:Country A" 3232235876 "end:Country A"
Note: 3232235777
is IPv4 192.168.1.1
represented as unsigned int, same applies to 192.168.1.100
.
-
Checking to which range a particular IP belongs
> ZRANGEBYSCORE ip-to-country 3232235778 +inf WITHSCORES LIMIT 0 1
Note: 3232235778
is IPv4 192.168.1.2
represented as unsigned int and we make a lookup of one element (i.e. LIMIT 0 1
) from 192.168.1.8
onwards (i.e. +inf
).
-
Checking for Lookup 2.1
looked up IP starts the new range
> ZSCORE ip-to-country "begin:Country A"
Note: result will be 3232235777
Complexity Analysis
Space Complexity: If in worst case we end up with every single IP representing start and end of a Range, we will need O(2*N)
space, where N is 2^32
. But in real life this number will be much smaller. In some algorithm books you will see that 2^32
is considered a constant factor and thus will be reduced to O(1)
.
Runtime Complexity: Redis states that ZRANGEBYSCORE
is a O(log(N)+M)
operation, where M
is the number of elements in LIMIT
, i.e. here only 1. If we have at most 2*2^32
scores in worst case than log2(8billion)
is around 33
comparisons within Redis set implementation. But in reality I think there would not be more than 2 or 3 thousand ranges, which is around 11
comparisons. Redis states for a KEYS
command:
Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds.
All in all your lookup will be blazing fast!