[MERGE] repository docs (cleaner still) and GraphIndex

Sat Jul 14 06:12:22 BST 2007

On Fri, 2007-07-13 at 19:35 -0400, Aaron Bentley wrote:

> >> In fact, it seems like it would be trivial to convert this to use the
> >> container format.
> > 
> > Possibly. I could not see an advantage to using the container format for
> > this because:
> >  - the data is homogenous
> >  - the names field is useless for an index
> 
> Can't you store the key as the name?

Depending on how it's structured, maybe. Composite keys like the ones I'm
proposing in a separate thread would have many repeated byte references
across different records (and 1111111 is much more compact than robertc
@ ........................). The node reference data certainly won't
fit, so I'd still need to parse and manage that within the data block of
the container. I also suspect I'd only ever use the bytes field, because
I'm strongly considering adding direct graph iteration to the GraphIndex
API.

> >  - theres no need to address each index record by anything other than
> > bytes.
> 
> Well, from this implementation, it didn't look like you were trying to
> do that.

EPARSE. Too many negatives. Let me rephrase. Within the toy index
itself, the delta compression means we can talk most efficiently about
other records by their starting location.
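
To make that concrete, here is a rough sketch of records that refer to
each other by starting byte offset, with the offsets resolved back to
keys before anything is handed to a caller. It is an illustration only,
not the GraphIndex on-disk format: the record layout, the fixed WIDTH
and every function name are assumptions made up for the example, and it
assumes keys and values contain no NUL or newline characters.

```python
WIDTH = 10  # hypothetical fixed width, so record sizes are known up front


def _line(key, ref_offsets, value):
    """Encode one record as key NUL refs NUL value NEWLINE."""
    refs = ",".join(f"{o:0{WIDTH}d}" for o in ref_offsets)
    return f"{key}\0{refs}\0{value}\n".encode("utf-8")


def serialize(nodes):
    """nodes maps key -> (list of referenced keys, value)."""
    keys = sorted(nodes)
    offsets = {}
    pos = 0
    # First pass: fixed-width references let us compute where each
    # record starts before the real offsets are known.
    for key in keys:
        refs, value = nodes[key]
        offsets[key] = pos
        pos += len(_line(key, [0] * len(refs), value))
    # Second pass: write each record with the real starting offsets.
    return b"".join(
        _line(k, [offsets[r] for r in nodes[k][0]], nodes[k][1]) for k in keys
    )


def read_record(data, offset):
    """Parse the record starting at a byte offset."""
    end = data.index(b"\n", offset)
    key, refs, value = data[offset:end].decode("utf-8").split("\0")
    ref_offsets = [int(r) for r in refs.split(",")] if refs else []
    return key, ref_offsets, value


def lookup(data, offset):
    """Return (key, referenced keys, value); offsets never leak out."""
    key, ref_offsets, value = read_record(data, offset)
    ref_keys = [read_record(data, o)[0] for o in ref_offsets]
    return key, ref_keys, value
```

Internally a reference costs one seek to the start of the referenced
record; the caller only ever sees keys.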

> >>   Instead of byte offsets, you'd use record numbers,
> >> which can be precalculated.
> > 
> > Uhhh, I don't see that this is true unless you mean that you'd write the
> > record number in a header in the pack record; then look for that. If
> > thats what you mean, see above about sync-up of the record stream.
> 
> Well, it looked like you were going to read the whole file anyhow.
> 
> It's just that we've got so many file formats now, I'd hope we can use
> existing formats more and more.

I agree. I did consider it, but overall it seemed to me to be an
unreasonable fit for the pack layer.

> > references are always meaningful because they are returned to the caller
> > as keys, not as the internal byte offsets.
> 
> Okay.  Missed that one.

I'll make the docstring clearer.

> > 
> >>> +Revert              Revision graph access, Inventory extraction, file text
> >>> +                    access.
> >> What's the revision graph access for?  Revno to revision id translation?
> >> Is inventory extraction necessary?  With fast comparison of historical
> >> trees, we should not need it.
> > 
> > I thought revert was based around 'change to be like revision X', not a
> > delta ? What should Revert state here?
> 
> You're correct, but the implementation uses iter_changes.  We should be
> able to implement iter_changes against any arbitrary historical tree by
> comparing the working tree to the basis, and then comparing the
> historical tree to the basis.  I'll do anything I can to avoid O(tree)
> operations.

So it sounds like you want a direct iter_changes against the selected
revision? In that case the table entry would read:

Revert         wt-history-inventory-delta, arbitrary file text access
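
A small sketch of the composition described in the quoted text: given
the working tree's delta against the basis and the historical tree's
delta against the same basis, only paths touched by either delta can
differ between the two trees. This is not bzrlib's iter_changes; the
function name and the dict-based tree and delta representations are
assumptions for illustration.

```python
def changes_via_basis(basis, wt_delta, hist_delta):
    """Compute working-tree -> historical-tree changes via a shared basis.

    basis maps path -> content id. Each delta maps path -> new content
    id, with None meaning the path does not exist in that tree.
    (Hypothetical representation, chosen to keep the sketch short.)
    """
    changes = []
    # Paths in neither delta equal the basis in both trees, so they
    # cannot differ and are never visited.
    for path in sorted(set(wt_delta) | set(hist_delta)):
        wt_id = wt_delta.get(path, basis.get(path))
        hist_id = hist_delta.get(path, basis.get(path))
        if wt_id != hist_id:
            changes.append((path, wt_id, hist_id))
    return changes
```

The work done is proportional to the size of the two deltas rather than
to the size of the tree, assuming the deltas themselves are cheap to
obtain.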

> >>> +Ideally we can make our data access for commands such as branch to
> >>> +dovetail well with the native storage in the repository, in the common
> >>> +case. Doing this may require the commands to operate in predictable
> >>> +manners.
> >> Can you say more?
> > 
> > If we have two commands which could use the same access pattern, or
> > could use a different one, we may get a win by coercing them to the same
> > access pattern but optimising it heavily. Its a little speculative, thus
> > the 'may' in my prose.
> 
> That's much clearer.  Thanks.  Could you tweak the doc to talk about
> common access patterns instead of "predictable manners"?

Sure.

> >>> +Stream all data required to recreate revs   branch (lightweight)
> >>> +Stream file texts in topological order      bundle
> >> ^^^ how is this different from branching?
> > 
> > branching does not require topological order - generating mp_diffs does.
> 
> I was more confused about why "bundle" was "file texts" and "branch" was
> "all data required to recreate revs".

I think it depends on which layer we end up pushing the mpdiff stuff to.
E.g. if the repository satisfies 'smallest diff possible' requests, then
it will be 'stream smallest possible data to recreate file texts'; if
the repository satisfies full text requests only, then you need the
topological access to build up your mp diffs.
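
For reference, "topological order" here means every parent text is
emitted before any text built on it. A minimal sketch using Kahn's
algorithm follows; the function name and the parents-dict input are
assumptions for illustration, not an existing repository API, and
parents that are absent from the dict are simply ignored.

```python
from collections import deque


def topo_order(parents):
    """Order revisions so that parents come before children.

    parents maps revision id -> list of parent revision ids. Parents
    not present as keys are ignored (a simplifying assumption).
    """
    pending = {}
    children = {rev: [] for rev in parents}
    for rev, ps in parents.items():
        known = [p for p in ps if p in parents]
        pending[rev] = len(known)
        for p in known:
            children[p].append(rev)
    # Start with revisions that have no known parents; sorting keeps
    # the output deterministic.
    ready = deque(sorted(r for r, n in pending.items() if n == 0))
    order = []
    while ready:
        rev = ready.popleft()
        order.append(rev)
        for child in sorted(children[rev]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("cycle in revision graph")
    return order
```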

Do you want another pass before +1 (it'll take me a bit to clean up based
on your review; I'll probably do that on the plane), or do you think it's
in good enough shape to merge?

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.