Monday, December 29, 2008

Publicly linkable resources without an integer id

For those of us writing Python web applications with RDBMS storage, it is very tempting to create URLs like http://hackingthought.com/foo/72, where the '72' is the id of a record in a table. This has never felt right to me because of a basic security risk: anyone could easily guess the ids of other records, for example 73 or 74, which I may not want publicly exposed. For many reasons this is undesirable. For a blog this behavior is acceptable, because we may not be concerned about exposing the other resources. For a video site like YouTube, I may not want just anyone downloading the entire collection of video content!
This is pretty easy to resolve by using Python's built-in uuid module. For a while I thought this was acceptable, but then I started to think that the URLs it would generate would be rather long. Example: hackingthought.com/foo/181067910632484385564896804811492956458! To me, using the integer that uuid generates is a bit excessive. To make my life a lot easier, the uuid has a built-in hex representation: hackingthought.com/foo/8838693e35534e86b442f9d8b8d6192a. Better, but it only saved 7 characters.
Even though 32 characters is not that painful, it is still a bit long for my taste. Hex is not ideal: it uses only 16 distinct characters, so each one carries just 4 bits. Base64 uses an alphabet of 64 characters (6 bits each), which reduces the length. This would produce: hackingthought.com/foo/iDhpPjVTToa0QvnYuNYZKg (after stripping the trailing == padding).
At this point I have gotten it down to 22 characters, versus 32 for hex or 39 for the integer. That is a 43% reduction compared with the integer form (about 31% compared with hex)!
The code:
import uuid
from base64 import urlsafe_b64encode

k = uuid.uuid4()

# Unique id as an int
print('char length: %s type:int value: %s'
      % (len(str(k.int)), k.int))

# Unique id as hex
print('char length: %s type:hex value: %s'
      % (len(k.hex), k.hex))

# Unique id as base64
# urlsafe_b64encode replaces + and / with - and _, which are safe in a URL
# ('#' and '$' are not: '#' starts the fragment and '$' is reserved)
b64k = urlsafe_b64encode(k.bytes).decode('ascii')

# Strip the '=' padding characters
b64k = b64k.rstrip('=')
print('char length: %s type:base64 value: %s'
      % (len(b64k), b64k))
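A short id in the URL is only useful if it can be turned back into the UUID stored in the database. Here is a minimal sketch of that round trip. The helper names uuid_to_slug and slug_to_uuid are my own invention for illustration, and I am assuming the URL-safe base64 alphabet (- and _ in place of + and /), which is a slightly different choice than passing custom replacement characters.

```python
import uuid
from base64 import urlsafe_b64decode, urlsafe_b64encode


def uuid_to_slug(u):
    """Encode a UUID as a 22-character URL-safe string (hypothetical helper)."""
    # 16 bytes encode to 24 base64 characters, the last two being '=' padding.
    return urlsafe_b64encode(u.bytes).decode('ascii').rstrip('=')


def slug_to_uuid(slug):
    """Restore the stripped padding and decode a slug back into a UUID."""
    # Base64 input length must be a multiple of 4, so add back the '=' signs.
    padded = slug + '=' * (-len(slug) % 4)
    return uuid.UUID(bytes=urlsafe_b64decode(padded))


original = uuid.uuid4()
slug = uuid_to_slug(original)
print('slug: %s (length %s)' % (slug, len(slug)))
print('round trip ok: %s' % (slug_to_uuid(slug) == original))
```

A view handling /foo/<slug> could call slug_to_uuid on the incoming value and look the record up by that UUID. A malformed slug would raise an exception here, so a real handler would probably want to catch that and return a 404.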
Performance? Well, my laptop can generate 100K ids in about 6 seconds using uuid1 or uuid4, so I don't think it is a bottleneck.
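If you want to check the timing on your own machine, something like the following rough sketch should do it. The function name time_generation is made up for this example, and the numbers will of course differ from laptop to laptop and between Python versions.

```python
import timeit
import uuid


def time_generation(func_name, number=100000):
    """Return the seconds taken to call uuid.<func_name> `number` times."""
    # timeit.timeit accepts a callable and runs it `number` times.
    return timeit.timeit(getattr(uuid, func_name), number=number)


for name in ('uuid1', 'uuid4'):
    seconds = time_generation(name)
    print('%s: 100000 ids in %.2f seconds' % (name, seconds))
```

This only measures generating the UUIDs, not the base64 step, which I would expect to be cheap by comparison but have not measured here.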
I wonder if it could be even shorter. I would love to know how to make them shorter without losing the ease of use of built-in Python modules or having to write my own algorithms. Please drop me a line if you find a better way to do this.