[RFC] Strawman replacement local directory state
John Arbash Meinel
john at arbash-meinel.com
Fri Jun 16 16:22:06 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
...
> The optimal work to do status on one file with this format is:
> * read the dirstate header to determine the number of parents.
> * bisect search into the file to find the record for the file, or its
> absence.
> * if we did not get the asked for file, report it as deleted.
> * parse the record if we found one.
> * stat the file on disk
> * calculate the sha1 or symlink target if it exists and the stat cache
> failed to match.
> * print the output.
>
> Thats log n seeks into the file, and parsing of (say) 300 bytes at each
> seek to find the record we have landed on. Call it a read of 1K at each
> seek point for nicety. Its even possible to consider a C extension for
> this that operates on a mmapped file with the delimiters for data
> structured to make the strings usable in situ.
That is, in fact, one nice consequence of using '\0' as the field delimiter.
I can also argue that other than jumping to the middle, and seeking
until you find '\n', you only have to read a little bit at each point.
Naturally if the average record size is 300bytes, you have to seek about
150bytes until you hit '\n'. And then read until you get the path. (Or
if we implement caching the number of '/' characters, maybe not even
that much).
>
> That said, we'll probably start with read the whole file, split on '\0',
> then bisect search to find the start point, and parse from there on.
I did switch the dirstate so the path is the first thing on each line.
So it is also entries[i][0], so hopefully bisect in the list in memory
can also be fast. I really wish that 'bisect.bisect()' would let you
pass in a keys function like sort() does. [Or at least a cmp function].
We may want to cache the number of slashes in the path in the memory
structure as well, since that also improves bisect performance.
>
> we'll also want a multi-bisect to search for N supplied paths
> concurrently.
>
> -Rob
>
That would be nice.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEksyeJdeBCYSNAAMRAt9kAJ9O2iZ3PWlPA53V2mWH2Giu9FAjwACfX0rS
A9a1Vb60nMoinlUfI/6AK+E=
=ffRh
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list