[RFC] Stop using hpss 'Repository.get_revision_graph()'

John Arbash Meinel john at arbash-meinel.com
Wed Aug 1 15:53:32 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Tue, 2007-07-31 at 16:14 -0500, John Arbash Meinel wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> In doing some of the debug logging changes for hpss, I came across an
>> interesting performance issue.
>>
>> Specifically, during 'bzr commit' with a bound branch, we read all of
>> revisions.kndx (1,121,084 bytes, 23ms) and inventory.kndx (1,175,920 bytes, 25ms).
>>
>> We then end up doing an RPC Repository.get_revision_graph (1,567,374 bytes,
>> 1000ms). We actually do this twice (the second time takes 904ms).
> 
> That sounds like a bug.
> 
>> Now, my server is a bit on the slower side (700MHz PIII), so these timings may be
>> slower than what most people would see.
>>
>> Anyway, if we just eliminate the RemoteRepo.get_revision_graph() specialized
>> function, we would eliminate 3MB of data transfer when committing on a bound
>> branch. (And for me, it would also eliminate 2s of commit time).
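The "3MB" and "2s" totals follow from the per-call figures quoted above: two RPCs of 1,567,374 bytes each, taking 1000ms and 904ms on this 700MHz PIII server. A quick check:

```python
# Figures from the message above: 'bzr commit' on a bound branch issues
# the hpss Repository.get_revision_graph RPC twice.
RPC_BYTES = 1_567_374        # bytes returned by each RPC
RPC_TIMES_MS = [1000, 904]   # first and second call, 700MHz PIII server

total_bytes = RPC_BYTES * len(RPC_TIMES_MS)
total_ms = sum(RPC_TIMES_MS)

print(f"{total_bytes:,} bytes (~{total_bytes / 1e6:.1f}MB), {total_ms}ms")
# prints: 3,134,748 bytes (~3.1MB), 1904ms
```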
> 
> OTOH you'll degrade things for Andrew's work when it lands, and for mine.
> 
> Mine because it won't be reading the full index to access revision
> texts, Andrew's because it won't be reading remote indexes at all to
> perform pushes.
> 
> -Rob
> 

Except get_revision_graph() is defined to read the entire revision graph, and
return the whole thing.

And it turns out that .kndx is a more compact format for transmitting a revision
graph than the get_revision_graph() response: roughly 30% smaller (.kndx is
~1.1MB, get_revision_graph ~1.6MB). If compression is on (bzr+ssh?, but not
bzr+http), the total data transferred may be larger or smaller, depending on how
well each format compresses.
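For reference, here is the relative size of the two encodings, computed from the byte counts quoted earlier in the thread (revisions.kndx at 1,121,084 bytes versus one get_revision_graph response at 1,567,374 bytes), ignoring any transport compression:

```python
KNDX_BYTES = 1_121_084   # revisions.kndx, read directly
RPC_BYTES = 1_567_374    # one Repository.get_revision_graph response

smaller = 1 - KNDX_BYTES / RPC_BYTES   # fraction saved by shipping the .kndx
larger = RPC_BYTES / KNDX_BYTES - 1    # extra cost of the RPC encoding

print(f".kndx is {smaller:.0%} smaller; the RPC response is {larger:.0%} larger")
# prints: .kndx is 28% smaller; the RPC response is 40% larger
```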

My thought was that by the time we stop using .kndx, we'll have switched to
get_graph() or whatever API lets us avoid reading the entire history.

The other big thing is that local Repository objects have been taught to cache
things like inventory.kndx, so they don't care if you call get_revision_graph()
2 times.

Unless you want RemoteRepository itself to start caching it.
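If RemoteRepository did cache the graph, a minimal sketch might look like the following. The class name and the `fetch_graph` callable (a stand-in for the hpss RPC) are hypothetical, not bzrlib's actual code:

```python
class CachingRevisionGraph:
    """Hypothetical wrapper that memoizes the full revision graph, so
    repeated get_revision_graph() calls cost a single round-trip.

    'fetch_graph' stands in for the Repository.get_revision_graph RPC;
    it is an assumption for illustration, not a real bzrlib API.
    """

    def __init__(self, fetch_graph):
        self._fetch_graph = fetch_graph
        self._graph = None  # cached {revision_id: [parent_ids]}

    def get_revision_graph(self):
        if self._graph is None:
            self._graph = self._fetch_graph()
        # Hand back a copy so callers cannot mutate the cached graph.
        return {rev: list(parents) for rev, parents in self._graph.items()}

    def invalidate(self):
        # Drop the cache when the remote repository may have changed.
        self._graph = None


calls = []

def fake_rpc():
    calls.append(1)
    return {"rev-2": ["rev-1"], "rev-1": []}

repo = CachingRevisionGraph(fake_rpc)
repo.get_revision_graph()
repo.get_revision_graph()
print(len(calls))  # prints 1
```

The hard part is deciding when to call invalidate(): a cached graph goes stale if another client commits to the same repository, so one option would be to tie the cache's lifetime to a lock, for example.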

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGsJ5sJdeBCYSNAAMRAh92AKCyXa+hCA5F28fHpm6hKgBMLQjHegCgleVE
90vcFVaKm5WilZGsUAbLkLU=
=fYFy
-----END PGP SIGNATURE-----
