dirstate _bisect_recursive and paths2ids
John Arbash Meinel
john at arbash-meinel.com
Sun Feb 25 21:14:38 GMT 2007
Robert Collins wrote:
> On Sun, 2007-02-25 at 10:25 -0600, John Arbash Meinel wrote:
>> Over the weekend I finished implementing a recursive bisect function,
>> which is capable of finding all records for a set of paths (just like
>> paths2ids). It grabs all records underneath directories, and returns
>> both sides of a rename.
>>
>> I'm not convinced that the api is perfect, because the data requires a
>> bit of massaging to work with the other apis. And while it returns the
>> entries it reads, the apis would really expect the DirState object
>> itself to hang on to the records. So I need to come up with a way to
>> track what records have been read, and where they have been found, so
>> that we can use them in future calls. It will also make the bisect
>> recursive function a little faster since it won't bisect the same rows
>> over and over again.
>>
>> However, it is potentially a great win. Just as a point of comparison, I
>> can use it to find all entries underneath the 'netwerk' directory, which
>> is 699 records.
> ..
>> So while _bisect_recursive won't help yet (because everything else will
>> want to load the whole dirstate first), it has the potential to save us
>> a *lot* of time when handling small subsets of large trees.
>
> Yup - it will be good once we are onto the true optimisation phase of
> dirstate. I have /nearly/ finished the _iter_changes API update and
> implementation for dirstate, which allows status to not generate an
> inventory at all - and removes the use of paths2ids for the status code
> path : so that we dont take ids into any api during status, thus not
> needing to read the entire dirstate.
>
>> PS> I'm trying hard to stop working on bisect and just get DirState
>> passing all tests, so I'm not planning on working on bisect for a while.
>> But I thought I would get it to a basic functionality, which also gives
>> us an idea of the performance we could get out of it.
>
> I think the basic idea is that the various _get_index functions should
> be the ones that either go to disk, or at some threshold load
> everything. I think we need a new memory state: state._dirblock_state =
> PARTIALLY_IN_MEMORY.
>
> It may be time soon to switch the actual management of blocks into a
> separate class that is used by dirstate, it would clean it up a bit I
> think, and can probably be done tastefully without performance
> implications.
>
> -Rob
>
Yeah, I think something like that could be useful. Though it would also
be fairly easy to implement _iter_changes() in terms of
_bisect_recursive(), so it wouldn't have to load anything into memory
either. However, I can see how that might lead to 2 code paths, one
which has everything loaded and one which didn't. I suppose an easier
alternative could be to change _bisect_recursive to read from memory if
it is available. But I don't know that it is the "correct" method.
And as for "passing all tests" I've been working on move() and I did
find a big gaping hole. Which is moving a directory doesn't yet move all
the children of the directory. I also found a few bugs in move, but so
far the test suite is looking pretty good in that regard. I just need to
get this part working, and I think we'll have covered everything for
move(). (It is a pretty difficult portion, but I think I'm getting a
handle on it).
John
=:->
More information about the bazaar
mailing list