Introduction to history deltas

Jan Hudec bulb at ucw.cz
Wed Dec 7 15:05:40 GMT 2005


On Wed, Dec 07, 2005 at 13:08:06 +0100, David Allouche wrote:
> On Wed, 2005-12-07 at 17:14 +1100, Martin Pool wrote:
> > Operating on many small files is slower than getting all the data you
> > want from one large file[*].
> [...]
> > [*] I did a mini benchmark of writing one line to N files vs writing N
> > lines to one file; for small N the speed is comparable but for large N
> > it can be a hundred times slower.  This may be because opening each file
> > uses up a fixed amount of memory in both the kernel and in Python -- for
> > example we are using 4kB of disk cache for each line.
> 
> I am happy you bring that issue forward, I wanted to make a comment
> since I began reading that thread.
> 
> I expect that grouping hunks by commits would make building the weave
> for a text-revision marginally more expensive in the hot disk cache case
> (because of open overhead) and very significantly more expensive in the
> cold disk cache case, because it defeats read-ahead in the kernel.
> 
> Be nice to your system caches, they will pay you back for it.
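
For reference, the kind of mini benchmark Martin describes is easy to reproduce. A minimal sketch (file names and sizes are made up, not what bzr uses):

```python
import os
import tempfile
import time

def one_line_per_file(root, n):
    # Write one line to each of N separate files.
    for i in range(n):
        with open(os.path.join(root, "f%d" % i), "w") as f:
            f.write("line %d\n" % i)

def n_lines_one_file(root, n):
    # Write N lines to a single file.
    with open(os.path.join(root, "big"), "w") as f:
        for i in range(n):
            f.write("line %d\n" % i)

def timed(fn, n):
    # Time one run in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        start = time.perf_counter()
        fn(root, n)
        return time.perf_counter() - start

if __name__ == "__main__":
    n = 1000
    print("one line to each of N files: %.3fs" % timed(one_line_per_file, n))
    print("N lines to one file:         %.3fs" % timed(n_lines_one_file, n))
```

The per-file cost (open, close, inode and dentry allocation, a disk-cache page per file) is what makes the many-files case fall behind as N grows.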

Right. Another case is pull with knits: you usually want the tail of each
file (because the chunks are ordered by time of addition, and you want
everything added since the last pull). That tail can be fetched in a single
request over the network, and locally it is as sequential a read as we can
get.
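
That access pattern boils down to one seek plus one sequential read per file. A minimal sketch (read_tail and last_offset are illustrative names, not bzr API):

```python
import os

def read_tail(path, last_offset):
    # Return everything appended to `path` since `last_offset`
    # (e.g. the offset recorded at the previous pull): one stat,
    # one seek, one sequential read.
    size = os.path.getsize(path)
    if size <= last_offset:
        return b""  # nothing added since the last pull
    with open(path, "rb") as f:
        f.seek(last_offset)
        return f.read(size - last_offset)
```

Over a network transport the same shape maps onto a single ranged request rather than a seek.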

So in the end grouping by file seems better for most read operations. It
complicates writes somewhat (because the filesystem gives us an atomic
replace, but no atomic append), but reads are far more common than writes,
so they are the operations to optimize for.
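
Lacking an atomic append, a writer has to fake one with the atomic replace we do have: write the old content plus the new chunk to a temporary file, then rename it into place. A rough sketch of the idea (not bzr's actual code):

```python
import os
import tempfile

def atomic_append(path, data):
    # Copy the old content plus `data` into a temporary file in the
    # same directory, then rename it over the original.  rename() is
    # atomic on POSIX, so a concurrent reader sees either the old file
    # or the new one, never a half-written append.
    try:
        with open(path, "rb") as f:
            old = f.read()
    except FileNotFoundError:
        old = b""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(old)
            f.write(data)
        os.replace(tmp, path)  # the atomic step
    except BaseException:
        os.unlink(tmp)
        raise
```

The cost is rewriting the whole file on every append, which is exactly the write-side complication mentioned above.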

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>