[RFC] Strawman replacement local directory state

Robert Collins robertc at robertcollins.net
Tue Jun 13 17:36:32 BST 2006


On Wed, 2006-06-14 at 02:29 +1000, Michael Ellerman wrote:
> 
> On the wiki you quote hg, which reads a Dirstate for 19470 files in
> 220ms. If it scales linearly that'd mean ~23.6 seconds just to read
> the state for my ~2 million file repository. Am I reading that right? 

Right. Thus the emphasis in this format on being able to stream the
data: we can read the entire file, utf8 decode it and string split -
which should be extremely fast. (Or possibly, open it as utf8, then
start 'for line in file:'.)
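
To make the two read strategies concrete, here is a rough sketch. It
assumes a hypothetical layout of one record per '\n'-terminated line;
the function names are made up for illustration and are not part of any
existing bzr code.

```python
import io


def read_all_then_split(path):
    """Read the whole file, utf8 decode it once, then split on newlines."""
    with open(path, "rb") as f:
        data = f.read()
    lines = data.decode("utf-8").split("\n")
    # A file ending in '\n' leaves one trailing empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def iter_lines(path):
    """Open the file as utf8 text and yield one line at a time."""
    # newline="\n" means only '\n' ends a line (assumed record separator).
    with io.open(path, "r", encoding="utf-8", newline="\n") as f:
        for line in f:
            yield line.rstrip("\n")
```

The first form does one big decode and split; the second lets the
caller start work before the whole file has been read. Which is faster
would need to be measured.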

With that data flowing in we can start stating the files we need to stat
immediately. The sort-by-dirblock in the file means we can stat the
files in the order they are listed in that file, giving near-optimal
order for the stats (on many file systems), possibly concurrently with
doing readdir on the directories to find ignored and unknown files (i.e.
for status).
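
A minimal sketch of statting in listed order as the lines arrive,
assuming each incoming line is simply a path relative to the tree root
(the real records would carry more fields). The name
stat_in_listed_order is hypothetical.

```python
import os


def stat_in_listed_order(lines, root):
    """Yield (relpath, stat result or None) in the order paths are listed.

    'lines' can be any iterable, including a generator still reading the
    state file, so statting overlaps with parsing.
    """
    for relpath in lines:
        if not relpath:
            continue  # skip blank lines
        try:
            st = os.lstat(os.path.join(root, relpath))
        except OSError:
            st = None  # listed in the state file but missing on disk
        yield relpath, st
```

Because the input is consumed lazily and never re-sorted, paths that sit
in the same dirblock in the file are statted back to back.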

This means that even if it does take 23.6 seconds to parse the entire
file, the cost can be spread over the entire status operation.

That said, I know that hg is now trying some improvements that give it a
20%-odd improvement, and I have not yet benchmarked this format - it's
possible it will be faster. Even if it's not, if there is no faster way
in Python, we can always add C for the critical inner section.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.