Cache DNS queries
John Arbash Meinel
john at arbash-meinel.com
Fri May 26 22:05:02 BST 2006
Hi everybody- (Hi Dr. Nick)
I was looking into the DNS querying issues because someone just
submitted a bug about it. I tracked down that each HTTP connection
calls socket.getaddrinfo, which does a full DNS query, and that turns
out to be a major bottleneck.
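To see this for yourself, here is a rough diagnostic sketch (mine, not
part of the plugin) that wraps socket.getaddrinfo with a counter and
opens two HTTP connections; each connection does its own lookup:

import socket
import httplib

_real_getaddrinfo = socket.getaddrinfo
lookups = []

def counting_getaddrinfo(*args, **kwargs):
    # Record the (host, port) of every lookup, then defer to the real thing
    lookups.append(args[:2])
    return _real_getaddrinfo(*args, **kwargs)

socket.getaddrinfo = counting_getaddrinfo
try:
    for i in range(2):
        conn = httplib.HTTPConnection('bazaar-vcs.org')
        conn.request('GET', '/')
        conn.getresponse().read()
        conn.close()
finally:
    socket.getaddrinfo = _real_getaddrinfo  # put the real one back

print '%d lookups for 2 requests: %r' % (len(lookups), lookups)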
So I wrote a plugin which just monkey patches socket.getaddrinfo so that
it caches the results in a dictionary.
The plugin is available here:
http://bzr.arbash-meinel.com/plugins/dns_cache/
The core of the code is just:
import socket

# Remember the real implementation, and answer repeat lookups from a
# module-level dictionary instead of hitting DNS again.
_getaddrinfo = socket.getaddrinfo
_host_to_addrinfo = {}

def getaddrinfo(host, port, *args, **kwargs):
    key = (host, port, args, tuple(sorted(kwargs.items())))
    if key not in _host_to_addrinfo:
        _host_to_addrinfo[key] = _getaddrinfo(host, port,
                                              *args, **kwargs)
    return _host_to_addrinfo[key]

socket.getaddrinfo = getaddrinfo
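Once the patch is installed, a quick sanity check (again my own sketch,
assuming the snippet above has been run) is to look up the same host
twice; the second call should come straight out of the dictionary:

import time

for attempt in (1, 2):
    start = time.time()
    socket.getaddrinfo('bazaar-vcs.org', 80)
    print 'lookup %d took %.4f seconds' % (attempt, time.time() - start)

# The second lookup should be near-instant, and the cache should now
# hold one entry for that (host, port) key.
print 'entries cached: %d' % len(_host_to_addrinfo)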
With my local named server turned off, running 'bzr log' on the last
10 revisions of http://bazaar-vcs.org/bzr/bzr.dev takes 5 minutes on my
machine. With the above cache, it takes 30 seconds.
I really think we should consider including something like this in bzr
core. I wrote it as a plugin because I don't know how official we want
it to be.
These are my timing tests:
pycurl:
real 5m11.349s
caching and pycurl:
real 4m53.515s
plain urllib:
real 4m48.735s
caching and urllib:
real 0m32.386s
I don't know what pycurl is doing that is taking so long, but since it
is a C extension, I don't know that we can monkey patch it. For now, my
dns_cache plugin just re-registers urllib so that it gets preference.
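The re-registration amounts to something along these lines; the module
path and class name of the urllib HTTP transport are from memory and
vary between bzr versions, so treat this as a sketch rather than the
plugin's literal code:

# Re-register the urllib-based HTTP transport so it takes precedence
# over pycurl.  'bzrlib.transport.http' / 'HttpTransport' are
# illustrative names -- check your bzrlib tree for the real ones.
from bzrlib.transport import register_lazy_transport

register_lazy_transport('http://', 'bzrlib.transport.http',
                        'HttpTransport')
register_lazy_transport('https://', 'bzrlib.transport.http',
                        'HttpTransport')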
John
=:->