Container repository data ordering

Thu Jul 19 15:35:43 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Wed, 2007-07-18 at 10:06 -0400, Aaron Bentley wrote:
>> This sequence should work well for tree-wide operations like "merge",
>> "revert", "status", "diff" and "checkout".  It won't provide as much
>> locality-of-reference as the current format does for single-file
>> commands like "cat" and "annotate".  I think that's inevitable.
> 
> Hmmm, I think it should have no worse locality of reference, and
> possibly better than the current formats, once you consider disk block
> allocation.

I don't think that makes sense.  Say you have a one-byte file, and 20
DVD images in your current revision.  To cat the one-byte file, you'll
have to seek or read past 10 DVD images on average.

>> What are our requirements?
>> 1. Building recent versions should be fast
>> 2. We should not hold more than one fulltext in memory at a time
> 
> This is for tree-building?

No, this is to move in the direction of "stop requiring full memory
copies of files" from your "planned-performance-changes" document.  If
the text-extraction routines can't guarantee they're not holding more
than one fulltext in memory, there's no way we can achieve that.

To actually achieve "stop requiring full memory copies", the fulltext
would need to come after the deltas, whatever the build order, so that
the deltas could be applied to the fulltext while streaming it in.

> For merge I presume we want to get 2-3
> fulltexts out at once?

We may.  If we're serious about not holding fulltexts in memory, perhaps
we would dump them to disk.  I don't know.  It seems like a tough goal
to achieve.  And versioning .isos doesn't seem like it should be a big
concern of ours.  I would sacrifice better .iso handling for better
kernel-tree handling in a heartbeat.

> I think requirement 2 above could be explained a bit more; it doesn't
> seem obvious to me that that is a requirement that is a problem.

I'm just trying to comply with the roadmap.

>> However, the desire to avoid holding more than one fulltext in memory
>> means that once we have read a fulltext, we cannot have another fulltext
>> until all the relevant patches have been read.  So if the fulltext comes
>> first, this suggests that the relevant patches must come immediately
>> afterward.
> 
> Agreed.

I think there's an error in my logic, though.  Even if the we
backwards-patch, we are not obligated to put the fulltext before the
delta chain.  If we put it after the delta chain, then we can read all
relevant deltas into memory before we hit the fulltext.

If we have the relevant deltas in memory before we encounter the
fulltext, we can apply the delta to the fulltext in a streaming fashion.
 So this layer at least could meet the goal of "stop requiring full
memory copies".

>> This means that while reading recent revisions will have low CPU
>> requirements, it will have higher IO requirements, because it can only
>> stop reading the container once it has read (or seeked past) all the
>> fulltexts it needs, and (almost) all the deltas which are based on them.
> 
> I don't see how this increased IO load follows. Perhaps an example will
> help?
> given revisions A:[], B:[A], C:[B], D:[A], E:[D,C]
> In my head we have something like:
> record | Fileid | revision | compression-basis-list | DATA |
> 0      |   X    |   E      | []                     | ...  |
> 1      |   X    |   D      | [E]                    | ...  |
> 2      |   X    |   C      | [E]                    | ...  |
> 3      |   X    |   B      | [C]                    | ...  |
> 4      |   X    |   A      | [D]                    | ...  |
> 5      |   Y    |   E      | []                     | ...  |
> 6      |   Y    |   D      | [E]                    | ...  |
> ...
> 
> That is, to build the tree E, you read record 0 and record 5 to get the
> texts of file X and Y.
> 
> In terms of disk cache and head activity this is paging in 2 4K pages
> (or maybe only one 4K page).

I don't understand how you're getting any information about size here.
How do you know that record 3 is not 300GB long?

> To build the tree D, you read records 0 and 1, then 5 and 6, and so on.
> 
> On average, to access an arbitrary text in a file, whether you go
> new-old or old-new you page in 1 fulltext and 1/2 the delta chain length
> worth of patches.

Yes, but in your example, you have to read or seek past record 4 in
order to start reading records for Y.  If you're not interested in X,
then this is wasted effort.

There are many cases where we are interested in only a subset of files.
 But we still have to seek to get at the data we want, or for http, we
may simply have to read until we get it.

The issue is that http doesn't let us seek in mid-stream.  So once we've
started reading the container, we can't seek.  But we don't know what
records we want until we've processed inventory.  So we don't know what
records we want to skip until we've already started reading the container.

>>  Reducing the size of the delta chain will reduce the amount of data
>> that needs to be read for recent revisions, at the cost of increasing
>> the amount of data that needs to be read for older revisions.
> 
> Reducing the delta chain will *increase* the amount of data to be read
> to reconstruct recent revisions, AIUI.

It will be reduced for those revisions contained in the reduced delta
chain, because we'll reach the end of the delta chain sooner and can
start more productively reading the next set of file data.

>> By contrast, a forwards-patching format would allow deltas to be written
>> for new versions, minimizing the number of fulltexts written.
> 
> My proposed format above allows both forward and backwards patching.

Supporting only a single direction is more simpler and saves space.  Why
do you want this?

>>   Some
>> containers could contain only deltas, therefore reducing the need for
>> repository compaction.
> 
> Containers with only deltas drive repository compaction because they
> drive disk seeks, which are expensive.

If delta chains are kept together with the fulltexts, then the
compaction you describe will increase the distance between fulltexts of
different files, making overall access speed slower.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGn3a/0F+nu1YWqI0RAut3AJ9UpN81I37fHWZERdwHRT7++BgrOACfUwjZ
WftWx5qbtWaqEggw5cdmVlc=
=ffYF
-----END PGP SIGNATURE-----