[RFC] Removing the inventory concept from Bazaar.
Matthew D. Fuller
fullermd at over-yonder.net
Thu May 10 08:43:30 BST 2007
On Thu, May 10, 2007 at 08:48:30AM +0200 I heard the voice of
Dennis Benzinger, and lo! it spake thus:
>
> But compared to the size of a hundred thousand revisions of a
> hundred thousand files one gigabyte of noop records is probably not
> that much. And storage is cheap these days, time is not. So if using
> the noop records speeds things up I think the waste of space is
> acceptable.
Well, as I was corrected, it's more tera than giga, which kinda pulls
the rug out from under this :). But even so, let's assume it's a
gigabyte.
Mass storage may in some cases be cheap, but it's not always. And
even where it is, its accoutrements like backup space and I/O
bandwidth aren't. Network bandwidth certainly isn't cheap enough that
I want to be throwing around gigabytes I don't have to.
And it's hardly foreordained that it's minuscule next to the size of
the revisions. I've got a CVS repo here with over 150k revisions and
nearly a hundred thousand files (more if you count files that don't
exist in the head rev). The repo is 1.1 gig. bzr saves some space by
compressing stuff, and loses some by storing more fulltexts than RCS
files do. With to-come optimizations (including, amusingly, the
inventory rework/elimination that began this thread ;), we can assume
we'll store it in 2 gig or less. A gig of noops would mean that a
third of the total size is saying "Nothing happened". In big deep
trees like that, inventories are a problem because the knits get huge,
but that means of eliminating them doesn't make the problem much
better.
The every-commit-touches-every-file side effect would absolutely
slaughter you performance-wise on a tree that size. Degenerately,
consider that I make a 1-line change and push; if we need 2 round trips for
each of a hundred thousand files changed, and each round trip is
100ms, that's 200,000 round trips of .1s, for 20,000 seconds total,
which is about five and a half hours. Even doing the commit, if we assume it
takes 3ms a file, that's 5 minutes of slamming the disk to write out
all the noops.
(This time, I didn't double-check any of my math, so it's assured
correct :-)
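For anyone who does want to re-run the numbers, here's a rough sketch of
the back-of-the-envelope arithmetic. The inputs are the ones assumed above
(a hundred thousand files, 2 round trips per file at 100ms each, 3ms per
file at commit, a gig of noops next to a 2 gig repo); the function and
constant names are made up for illustration and have nothing to do with
bzr itself.

```python
# Back-of-the-envelope cost of "every commit touches every file".
# All inputs are the assumptions stated in the message; names are hypothetical.

FILES = 100_000              # files in the tree
ROUND_TRIPS_PER_FILE = 2     # assumed round trips needed per changed file
ROUND_TRIP_MS = 100          # assumed latency of one round trip
COMMIT_MS_PER_FILE = 3       # assumed time to write one noop record


def push_seconds(files=FILES, trips=ROUND_TRIPS_PER_FILE, rtt_ms=ROUND_TRIP_MS):
    """Total network wait, in seconds, to push one commit."""
    return files * trips * rtt_ms / 1000


def commit_seconds(files=FILES, per_file_ms=COMMIT_MS_PER_FILE):
    """Total local time, in seconds, to write out one commit's noops."""
    return files * per_file_ms / 1000


def noop_share(noop_gb=1.0, repo_gb=2.0):
    """Fraction of total storage that is noop records."""
    return noop_gb / (noop_gb + repo_gb)


if __name__ == "__main__":
    push = push_seconds()
    commit = commit_seconds()
    print(f"push:   {push:.0f} s = {push / 3600:.1f} hours")
    print(f"commit: {commit:.0f} s = {commit / 60:.1f} minutes")
    print(f"noops:  {noop_share():.0%} of total storage")
```

Swap in your own latency and file count to see how fast it gets ugly.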
--
Matthew Fuller (MF4839) | fullermd at over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.