Cache DNS queries

John Arbash Meinel john at arbash-meinel.com
Fri May 26 22:05:02 BST 2006


Hi everybody- (Hi Dr. Nick)

I was looking into the DNS querying issues because someone just
submitted a bug on it. I tracked it down to the fact that each HTTP
connection calls socket.getaddrinfo, which involves a full DNS query,
and that turns out to be a major bottleneck.
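Just to illustrate the cost (this isn't part of the plugin, and the
host name is only an example): with no caching name server in front of
you, a loop like this pays the full DNS round trip on every iteration.

import socket
import time

# Nothing in the standard library caches these answers for us, so
# every call goes all the way out to the resolver.
start = time.time()
for i in range(10):
    socket.getaddrinfo('bazaar-vcs.org', 80)
print('10 lookups took %.2fs' % (time.time() - start))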

So I wrote a plugin which just monkey patches socket.getaddrinfo so that
it caches the results in a dictionary.

The plugin is available here:
http://bzr.arbash-meinel.com/plugins/dns_cache/

The core of the code is just:

import socket

# Keep a reference to the real implementation before we patch it.
_getaddrinfo = socket.getaddrinfo

_host_to_addrinfo = {}

def getaddrinfo(host, port, *args, **kwargs):
    """Call the real getaddrinfo once per key, then serve from cache."""
    key = (host, port, args, tuple(sorted(kwargs.items())))
    if key not in _host_to_addrinfo:
        _host_to_addrinfo[key] = _getaddrinfo(host, port,
            *args, **kwargs)
    return _host_to_addrinfo[key]

# Install the caching wrapper in place of the original.
socket.getaddrinfo = getaddrinfo
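
A quick way to see it working (again, the host name is only an
example, and the exact numbers depend on your resolver):

import socket
import time

# With the wrapper installed, only the first lookup for a given
# (host, port, args) key should cost a DNS round trip; the second
# one comes straight out of the dictionary.
for attempt in (1, 2):
    start = time.time()
    socket.getaddrinfo('bazaar-vcs.org', 80)
    print('lookup %d took %.4fs' % (attempt, time.time() - start))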

With my local named server turned off, 'bzr log' of the last 10
revisions of http://bazaar-vcs.org/bzr/bzr.dev takes 5 minutes on my
machine. With the above cache, it takes 30 seconds.

I really think we should consider including something like this in bzr
core. I wrote it as a plugin because I don't know how official we want
it to be.

These are my timing tests:

pycurl:              real    5m11.349s
caching and pycurl:  real    4m53.515s
plain urllib:        real    4m48.735s
caching and urllib:  real    0m32.386s

I don't know what pycurl is doing that is taking so long, but since it
is a C extension, I don't know that we can monkey patch it. For now, my
dns_cache plugin just re-registers urllib so that it gets preference.
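
(For reference, the re-registration amounts to something like the
sketch below. I'm writing the module path and class name from memory,
so treat them as placeholders rather than the exact names in the tree.)

# Rough sketch only: register the urllib-based HTTP transport again so
# it gets preference over pycurl. The module path and class name are
# from memory and may not match the current tree.
from bzrlib.transport import register_lazy_transport

register_lazy_transport('http://', 'bzrlib.transport.http',
                        'HttpTransport')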

John
=:->
