data insertion and reads, with packs
Robert Collins
robertc at robertcollins.net
Thu Aug 2 06:49:39 BST 2007
I mentioned earlier that we have a disconnect between the streaming of
containers and the transport interface.
Where this starts to matter is in mapped knits. Basically, we currently
support reading data from a knit as soon as it has been added, with no
expectation of a 'finalisation' or other step. But without a good
incremental-write facility on transport we can't really offer that well:
to read data from a container that is still being written, we either
damage the file pointer during reads, or we end up requiring OS-level
features like 'dup2', which are less portable.
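To make the file-pointer problem concrete, here is a tiny sketch of the
naive approach, where one handle serves both appends and reads (not
bzrlib code; the file name and record bytes are made up):

import os
import tempfile

# One handle for both appends and reads, as a naive container
# implementation might do.
path = os.path.join(tempfile.mkdtemp(), 'demo.pack')
f = open(path, 'a+b')

f.write(b'record-one:0123456789\n')
f.flush()
print('append position:', f.tell())  # end of file

f.seek(0)
f.read(6)                            # a read elsewhere in the code...
print('pointer now at:', f.tell())   # ...no longer the append position
f.close()

Any code that used f.tell() after that interleaved read to record where
the next record lands is now wrong, and avoiding that means juggling
seeks, a second file handle, or the OS-level tricks mentioned above.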
The current 'append' interface, while it should work, is not up to the
performance constraints we have: pulling 15,000 objects into a pack
would result in 15K append calls, which is roughly 45K-60K syscalls,
some of which are not so cheap on some platforms.
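On the syscall side, the obvious mitigation is to coalesce many logical
appends into a few physical ones. A rough sketch, assuming a transport
object that exposes an append_bytes(relpath, bytes) call (the details of
the real Transport API are glossed over here):

from io import BytesIO

class BufferedPackWriter(object):
    """Accumulate records in memory and flush them in one append."""

    def __init__(self, transport, relpath, flush_threshold=4 * 1024 * 1024):
        self._transport = transport
        self._relpath = relpath
        self._flush_threshold = flush_threshold
        self._buffer = BytesIO()

    def add_record(self, record_bytes):
        self._buffer.write(record_bytes)
        if self._buffer.tell() >= self._flush_threshold:
            self.flush()

    def flush(self):
        data = self._buffer.getvalue()
        if data:
            # One append per ~4MB of records rather than one per record.
            self._transport.append_bytes(self._relpath, data)
            self._buffer = BytesIO()

With a buffer like that, pulling 15K objects costs a handful of appends
rather than 15K of them.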
Spiv and I were chatting, and one possibility came up: we probably
*don't need* to arbitrarily read data while we are inserting it. So if
we structure our operations like this:
* insert
* finish the insertion - output indices, etc.
* validate if needed (e.g. for fetch, check sha1s of texts now, if it
wasn't possible during insertion)
* commit the new data (commit_write_group)
Then we avoid this issue completely, and it will probably also help by
pushing us to think more carefully about arbitrary re-reading of data.
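To make the shape of that sequence concrete, here is a rough sketch
(everything except commit_write_group is a made-up name for
illustration, not bzrlib's actual API):

import hashlib

class PackInserter(object):
    """A write-only insertion session: no reads until commit."""

    def __init__(self, repo):
        self._repo = repo        # hypothetical repository-like object
        self._inserted = []      # (key, sha1) pairs kept for validation

    def insert(self, key, text):
        # 1. insert: write-only, remembering enough for later checks.
        self._inserted.append((key, hashlib.sha1(text).hexdigest()))
        self._repo.write_record(key, text)

    def finish(self):
        # 2. finish the insertion - output indices, etc.
        self._repo.write_indices([key for key, _ in self._inserted])

    def validate(self, expected_sha1s):
        # 3. validate if needed - e.g. for fetch, check sha1s of the
        #    texts now if it wasn't possible during insertion.
        for key, sha1 in self._inserted:
            if key in expected_sha1s and expected_sha1s[key] != sha1:
                raise ValueError('sha1 mismatch for %r' % (key,))

    def commit(self):
        # 4. commit the new data; only now do reads become legal.
        self._repo.commit_write_group()

The property that matters is that nothing ever reads from the pack while
it is open for writing: the indices emitted by finish() are only
consulted after commit_write_group.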
Thoughts?
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.