Smart revision fetching update
Andrew Bennetts
andrew at canonical.com
Thu Aug 9 07:27:57 BST 2007
Hi all,
Thanks to a dose of the flu, I didn't get support for transferring revision
data efficiently over the smart server protocol ready in time for 0.19 (0.90).
The good news is that it will have most of the 0.91 cycle to get settled and
nicely polished.
Of the code in my http://people.ubuntu.com/~andrew/bzr/repo-refactor branch,
about two-thirds has been extracted into independent branches and reviewed
already, so I'll land that as soon as 0.90 opens. The remaining third
basically adds a Repository.fetch_revisions smart method and teaches the
client side to use it, and I'll send that to the list within a day.
The work to be done from there:
* Add a specialised smart method for the initial pull case. At the moment
with my code the initial pull of a branch retrieves the branch's ancestry,
and then sends a big Repository.fetch_revisions call explicitly listing
every revision. Branching bzr that way sends over 12000 revision IDs
across the wire, the same ones the client just downloaded. So probably
I'll add a “Repository.fetch_all_revisions” smart request that takes just
the tip revision ID instead (first sketch below). This will make this code
a strict improvement over the current tarball hack.
* Write a smart method for pushing revisions. This is basically symmetrical
with the pull case: a “Repository.add_revisions” request whose body is a
revision data stream (second sketch below).
* Write fallbacks for transfers between repositories of different formats,
where the raw knit data can't just be blindly copied. Perhaps we need a
new set of parameterised tests for this (third sketch below)?
* Measure, and probably fix, memory consumption. I expect that at the
moment my code buffers the entire request/response bodies in memory, which
is Not Good. It would be nice to have something similar to the benchmark
test suite that measures the memory high-water mark of various operations.
Or perhaps just set a limit with ulimit/setrusage, and then feed in a data
set larger than the limit: if the operation trips the limit, the memory
consumption needs fixing (last sketch below). A quick and dirty hack would
be to limit requests to e.g. 100 revisions at a time, but I think we can
fix this properly.
And, of course, test this as much as possible in real usage! :)
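To make the first item concrete, here is a rough sketch of what the server
side of a “Repository.fetch_all_revisions” verb might look like, assuming
the SmartServerRepositoryRequest framework in bzrlib.smart.repository. The
class name and the serialise_revision_data helper are placeholders, not
code from the branch:

    from bzrlib.smart.repository import SmartServerRepositoryRequest
    from bzrlib.smart.request import SuccessfulSmartServerResponse

    class SmartServerRepositoryFetchAllRevisions(SmartServerRepositoryRequest):
        """Stream every revision in the ancestry of a single tip.

        The client sends only the tip revision ID; the server walks the
        ancestry itself, so thousands of revision IDs never have to cross
        the wire a second time.
        """

        def do_repository_request(self, repository, tip_revision_id):
            # get_ancestry returns a list whose first element is None.
            revision_ids = repository.get_ancestry(tip_revision_id)[1:]
            # serialise_revision_data stands in for whatever encoding the
            # revision data stream ends up using.
            body = serialise_revision_data(repository, revision_ids)
            return SuccessfulSmartServerResponse(('ok',), body)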
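The push direction would mirror that. In the smart request framework a
request that expects a body returns None from do() and receives the bytes
in do_body(); again, the names below are only illustrative:

    class SmartServerRepositoryAddRevisions(SmartServerRepositoryRequest):
        """Accept a revision data stream pushed by the client."""

        def do_repository_request(self, repository, *args):
            # The revision data arrives as the request body, so just
            # remember the repository and wait for do_body.
            self._repository = repository
            return None

        def do_body(self, body_bytes):
            # deserialise_and_add_revision_data is again a placeholder
            # for decoding the stream and inserting it into the repository.
            deserialise_and_add_revision_data(self._repository, body_bytes)
            return SuccessfulSmartServerResponse(('ok',))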
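For the cross-format fallbacks, the parameterised tests might look roughly
like the case below, multiplied over every (source, target) format pair
that can't share raw knit data. The scenario machinery is elided and the
format names are only examples:

    from bzrlib.tests import TestCaseWithTransport

    class TestCrossFormatFetch(TestCaseWithTransport):
        """One scenario of a parameterised cross-format fetch test."""

        # These would be filled in per scenario by the test multiplier.
        source_format = 'weave'
        target_format = 'knit'

        def test_fetch_converts_data(self):
            source = self.make_branch_and_tree('source',
                format=self.source_format)
            source.commit('a revision', rev_id='rev-1')
            target = self.make_repository('target',
                format=self.target_format)
            # Fetching across formats must convert the data, not blindly
            # copy the raw knits.
            target.fetch(source.branch.repository, revision_id='rev-1')
            self.assertTrue(target.has_revision('rev-1'))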
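And for the memory question, the ulimit/setrusage idea can be as small as
this helper, sketched with Python's resource module (the function name and
the 50MB cap are arbitrary):

    import resource

    def run_with_memory_cap(operation, limit_bytes=50 * 1024 * 1024):
        """Run operation() with the process address space capped.

        Feed the operation a data set larger than the cap: if it buffers
        entire request/response bodies in memory it will hit MemoryError,
        which is exactly the failure we want a test to surface.
        """
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
        try:
            return operation()
        finally:
            # Restore the original limit so later tests are unaffected.
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))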
After that, I expect we'll want to take a close look at logs of various
operations with the -Dhpss flag on, and see if there's other low-hanging fruit
to fix.
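For reference, collecting such a log is just (the URL is a stand-in):

    bzr -Dhpss pull bzr://example.com/some/branch
    less ~/.bzr.log    # each smart request/response is logged here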
People who are keen are welcome to check out the “repo-refactor” branch
above and start testing now. It should already make a noticeable
difference to pulling, although I haven't yet measured how much.
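Something along these lines should do; the pull URL is only an example,
and timing the same pull with a released bzr makes a useful comparison:

    bzr branch http://people.ubuntu.com/~andrew/bzr/repo-refactor
    cd repo-refactor
    time ./bzr pull bzr+ssh://example.com/some/branch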
-Andrew.