please check out weave-format branch
Martin Pool
martinpool at gmail.com
Fri Sep 23 22:30:31 BST 2005
On 24/09/05, John A Meinel <john at arbash-meinel.com> wrote:
> Another small point as I get deeper into the code.
> Why do you write out the revision xml as uncompressed?
>
> I guess in my testing, you don't save a lot of space, since the files
> are small. --apparent-size is 660k vs 755k, but because of filesystem
> blocks, du -ksh reports 6.3M versus 6.2M.
For just this reason: the actual saving is small, and it seems like
Python's gzip is actually somewhat expensive to run. It's not a final
decision, and perhaps even if it saves little on disk it'd be
better to have it compressed to help with http.
(Actually I overestimated the cost of gzip because my upgrade code was
doing some redundant work, so perhaps it doesn't matter so much.)
If merging back to your code meant they had to be compressed I
wouldn't really mind.
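A rough way to check the tradeoff is to measure both the size saving
and the time per call on a representative file. This is only a sketch:
the sample XML and the measure_gzip helper are made up for
illustration, not taken from the bzr code.

```python
import gzip
import time


def measure_gzip(payload: bytes, repeats: int = 100):
    """Return (raw_size, gzip_size, seconds_per_call) for one payload."""
    packed = gzip.compress(payload)
    start = time.perf_counter()
    for _ in range(repeats):
        gzip.compress(payload)
    elapsed = time.perf_counter() - start
    return len(payload), len(packed), elapsed / max(repeats, 1)


# Hypothetical stand-in for a small revision XML file.
sample = (
    b'<revision committer="someone@example.com" revision_id="example-1">\n'
    b"<message>example commit message</message>\n"
    b"</revision>\n"
)

raw, packed, seconds = measure_gzip(sample)
print(f"raw={raw}B gzip={packed}B cost={seconds * 1e6:.1f}us per call")
```

On files this small the gzip header and trailer eat into the saving,
so it is worth running this against real revision files before
deciding either way.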
> But the above du does raise an interesting issue. That we are losing
> about 10x disk space because of a bunch of very small files. It isn't a
> lot of space, and I don't know if people will really care, but I thought
> I would mention it.
Yes, it is quite noticeable. On the other hand it's only an overhead
of 2k per revision, which, looked at that way, is not so bad.
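The gap between apparent size and what du reports comes from the
filesystem allocating whole blocks. A minimal sketch of the
arithmetic, assuming a 4096-byte block size (an assumption, the real
value depends on the filesystem) and a hypothetical allocated_size
helper:

```python
def allocated_size(apparent: int, block_size: int = 4096) -> int:
    """Bytes a file occupies when space is handed out in whole blocks."""
    if apparent <= 0:
        return 0
    blocks = -(-apparent // block_size)  # ceiling division
    return blocks * block_size


# Two hypothetical small files: one uncompressed, one gzipped.
# Both round up to the same single block, so compression saves nothing.
print(allocated_size(800), allocated_size(700))  # prints "4096 4096"
```

This is why compressing files that are already smaller than one block
barely changes the du figure.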
You can imagine designing an append-only file that stores them more
compactly but allows fast random access, but perhaps it's the
filesystem's job.
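To make the append-only idea concrete, here is a minimal sketch. The
AppendOnlyStore class and its methods are hypothetical, and the index
is kept only in memory; a real design would also need to persist the
index or rebuild it from the data file.

```python
import os


class AppendOnlyStore:
    """Records packed into one file; an index maps key -> (offset, length)."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}
        open(path, "ab").close()  # create the data file if it is missing

    def append(self, key: str, data: bytes) -> None:
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[key] = (offset, len(data))

    def read(self, key: str) -> bytes:
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)  # random access: jump straight to the record
            return f.read(length)
```

Records sit back to back, so the per-file block rounding is paid once
for the whole store rather than once per revision.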
If we do keep them as separate files it might be good to eventually
allow for hashed subdirectories to accommodate filesystems that can't
handle having many files in a directory.
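A hashed layout could look something like the sketch below. The
hashed_path helper, the choice of SHA-1, and the two-character prefix
are all assumptions for illustration, not an existing bzr layout. It
also assumes revision ids contain no path separators.

```python
import hashlib
import os


def hashed_path(root: str, revision_id: str, width: int = 2) -> str:
    """Place a file under a subdirectory named by a prefix of its id's hash."""
    digest = hashlib.sha1(revision_id.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:width], revision_id)


print(hashed_path("revision-store", "example-revision-id"))
```

With a two-hex-digit prefix the files spread over at most 256
subdirectories, which keeps any single directory small.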
--
Martin