dirstate _bisect_recursive and paths2ids

Robert Collins robertc at robertcollins.net
Sun Feb 25 20:47:05 GMT 2007


On Sun, 2007-02-25 at 10:25 -0600, John Arbash Meinel wrote:
> Over the weekend I finished implementing a recursive bisect function,
> which is capable of finding all records for a set of paths (just like
> paths2ids). It grabs all records underneath directories, and returns
> both sides of a rename.
> 
> I'm not convinced that the api is perfect, because the data requires a
> bit of massaging to work with the other apis. And while it returns the
> entries it reads, the apis would really expect the DirState object
> itself to hang on to the records. So I need to come up with a way to
> track what records have been read, and where they have been found, so
> that we can use them in future calls. It will also make the bisect
> recursive function a little faster since it won't bisect the same rows
> over and over again.
> 
> However, it is potentially a great win. Just as a point of comparison, I
> can use it to find all entries underneath the 'netwerk' directory, which
> is 699 records.
..
> 
> So while _bisect_recursive won't help yet (because everything else will
> want to load the whole dirstate first), it has the potential to save us
> a *lot* of time when handling small subsets of large trees.

Yup - it will be good once we are onto the true optimisation phase of
dirstate. I have /nearly/ finished the _iter_changes API update and
implementation for dirstate, which allows status to not generate an
inventory at all, and removes the use of paths2ids from the status code
path, so that we don't pass ids into any api during status and thus
don't need to read the entire dirstate.

> PS> I'm trying hard to stop working on bisect and just get DirState
> passing all tests, so I'm not planning on working on bisect for a while.
> But I thought I would get it to a basic functionality, which also gives
> us an idea of the performance we could get out of it.

I think the basic idea is that the various _get_index functions should
be the ones that either go to disk, or at some threshold load
everything. I think we need a new memory state: state._dirblock_state =
PARTIALLY_IN_MEMORY.
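
To make the idea above concrete, here is a rough sketch of a lookup that
either bisects the on-disk data or, past some threshold, loads everything.
It is illustrative only: apart from _dirblock_state and PARTIALLY_IN_MEMORY,
every name here (SketchState, get_row, load_threshold, the other state
constants' values) is made up for the example, and a sorted in-memory list
stands in for the file on disk.

```python
import bisect

# State values are local to this sketch, not taken from dirstate itself.
NOT_IN_MEMORY = 0
PARTIALLY_IN_MEMORY = 1
IN_MEMORY_UNMODIFIED = 2


class SketchState:
    """Hypothetical stand-in for a dirstate with a partial-read mode."""

    def __init__(self, on_disk_rows, load_threshold=3):
        # A sorted list of (path, data) pairs plays the role of the file.
        self._on_disk = sorted(on_disk_rows)
        self._on_disk_paths = [path for path, _ in self._on_disk]
        self._rows = {}  # rows read so far, keyed by path
        self._dirblock_state = NOT_IN_MEMORY
        self._load_threshold = load_threshold
        self._partial_reads = 0

    def _load_everything(self):
        self._rows = dict(self._on_disk)
        self._dirblock_state = IN_MEMORY_UNMODIFIED

    def _bisect_on_disk(self, path):
        # Binary search for one path; returns its data or None.
        index = bisect.bisect_left(self._on_disk_paths, path)
        if index < len(self._on_disk) and self._on_disk_paths[index] == path:
            return self._on_disk[index][1]
        return None

    def get_row(self, path):
        if self._dirblock_state == IN_MEMORY_UNMODIFIED:
            return self._rows.get(path)
        if path in self._rows:
            # Already read by an earlier bisect; no need to search again.
            return self._rows[path]
        if self._partial_reads >= self._load_threshold:
            # Too many partial reads: cheaper to load the whole thing.
            self._load_everything()
            return self._rows.get(path)
        self._partial_reads += 1
        self._dirblock_state = PARTIALLY_IN_MEMORY
        data = self._bisect_on_disk(path)
        if data is not None:
            self._rows[path] = data
        return data
```

Counting partial reads is just one possible trigger; the threshold could
equally be bytes read or the fraction of the file touched.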

It may be time soon to switch the actual management of blocks into a
separate class that is used by dirstate. It would clean it up a bit, I
think, and can probably be done tastefully without performance
implications.
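
As a very rough illustration of what such a separate class might look
like (BlockManager and all of its methods are hypothetical names, and
plain string ordering of directory names is an assumption of the sketch,
not necessarily the order dirstate uses):

```python
import bisect


class BlockManager:
    """Hypothetical container for directory blocks, sorted by dirname."""

    def __init__(self):
        self._dirnames = []  # sorted directory names
        self._blocks = []    # parallel list: sorted entries per directory

    def _find(self, dirname):
        # Return (index, present) for dirname using binary search.
        index = bisect.bisect_left(self._dirnames, dirname)
        present = (index < len(self._dirnames)
                   and self._dirnames[index] == dirname)
        return index, present

    def ensure_block(self, dirname):
        # Return the block for dirname, creating an empty one if needed.
        index, present = self._find(dirname)
        if not present:
            self._dirnames.insert(index, dirname)
            self._blocks.insert(index, [])
        return self._blocks[index]

    def add_entry(self, dirname, basename, data):
        # Keep each block sorted so lookups within it can also bisect.
        bisect.insort(self.ensure_block(dirname), (basename, data))

    def get_block(self, dirname):
        index, present = self._find(dirname)
        return self._blocks[index] if present else None
```

With something like this, dirstate would only ask the manager for blocks
and would not need to know how they are stored or located.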

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.