[RFC] Removing hash prefix in storage vastly improves performance

John Arbash Meinel john at arbash-meinel.com
Fri Aug 18 15:31:34 BST 2006


Matthew D. Fuller wrote:
> On Fri, Aug 18, 2006 at 09:51:49AM +1000 I heard the voice of
> Robert Collins, and lo! it spake thus:
>> What are the performance changes on kernel sized trees for
>>  - BSD (perhaps)
> 
> I can't speak for trying it (my systems would revolt against me), but
> here's some stats size-wise for FreeBSD.
> 
> Fresh "cvs export -rHEAD":
> 
> src (kernel + userland):
>     % find src -type d -print | wc -l
>         3286
>     % find src -type f -print | wc -l
>        22600
> 
> ports tree:
>     % find ports -type d -print | wc -l
>        22969
>     % find ports -type f -print | wc -l
>        84257
> (obviously, this is the far side of the curve from "put everything in
> one directory" ;)

Well, there is also the fact that we create 2 files (index + knit) for
every versioned file, so all these numbers double. And we create these
files for directories as well, so for the ports tree you would end up with
(22969 + 84257) * 2 = 214,452, or >200k files in one directory, which is
a little bit much for any filesystem without a directory index.

With a perfect distribution over the 256 hash prefixes, you have only
about 840 files per dir (214,452 / 256, so under 1K), which is probably a
lot more reasonable.
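
To make the arithmetic concrete, here is a rough Python sketch of the
numbers above. The directory and file counts are Matthew's ports-tree
figures; the helper names, the made-up file ids, and the use of adler32
to pick a two-hex-digit prefix are assumptions for illustration only, not
necessarily what the store does.

```python
import zlib
from collections import Counter

# Ports-tree counts quoted earlier in this thread.
PORTS_DIRS = 22969
PORTS_FILES = 84257
FILES_PER_ENTRY = 2   # one index + one knit per versioned file or directory
NUM_PREFIXES = 256    # two hex digits: 00 .. ff


def total_store_files(dirs=PORTS_DIRS, files=PORTS_FILES,
                      per_entry=FILES_PER_ENTRY):
    """Files a flat store would hold for a tree of this size."""
    return (dirs + files) * per_entry


def hash_prefix(file_id):
    """Hypothetical prefix: low byte of adler32(file_id) as two hex digits."""
    return "%02x" % (zlib.adler32(file_id) & 0xFF)


def bucket_sizes(file_ids):
    """Count how many ids land under each prefix directory."""
    return Counter(hash_prefix(fid) for fid in file_ids)


total = total_store_files()
print("flat store:", total)  # 214452
print("ideal per-prefix average: %.0f" % (total / NUM_PREFIXES))

# Made-up file ids, only to see how uneven a real spread might be.
sizes = bucket_sizes(b"file-id-%d" % i for i in range(PORTS_DIRS + PORTS_FILES))
print("largest bucket (ids):", max(sizes.values()))
```

Real file ids will not spread perfectly, so the largest bucket will sit
somewhat above the ideal average; the point is only that it stays around
a thousand rather than a couple hundred thousand.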

John
=:->
