[MERGE] Port across errors for shallow branch support.

Thu Feb 21 19:50:45 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| On Wed, 2008-02-20 at 00:05 -0500, Aaron Bentley wrote:
|> -----BEGIN PGP SIGNED MESSAGE-----
|> Hash: SHA1
|>
|> Robert Collins wrote:
|>> Ah. Well I've come to the conclusion that branch stacking is the most
|>> scalable because all the other options seem likely to access many
|>> spurious locations far too often.
|> I don't feel good about reviewing errors for branch stacking when I
|> think branch stacking doesn't make sense.
|>
|> If remote stacking locations are stored on a per-branch basis, then you
|> can easily wind up in situations where a branch can't access the
|> information it needs to construct the revision it wants.
|
| Can you enlarge on this; I don't see how it can happen more or less
| easily than with external references being stored in the repositories
| control dir.
|
|> Your post doesn't really clarify your reasoning to me, so I still think
|> branch stacking is a bad idea.
|
| First of all a lemma: Both approaches provide the same set of all
| external references. The reason: all branches of a repository are within
| the repository. Repositories have a find_branches method to find the
| branches using the repository. If each branch has an external location
| pointer, then in terms of getting access to all data:
|>>> set([branch.get_stacked_location() for branch in r.find_branches()])
| should return the same data (for a branch-orientated implementation) as
|>>> r.get_stacked_locations()
| for a repository-orientated implementation.
|
| Now, consider a repository with a number of branches, 5 or 6 of which
| have external references.
|
| In a repository centric implementation, all those external references
| will be active always. This means that even entirely local operations
| will access /all/ those external references whenever any missed key
| lookup occurs. It forces repository-wide scaling on branch-wide
| operations, which is fundamentally bad.
|
| Only repository wide operations should display repository-wide scaling.
|
| -Rob
|

I just wanted to mention one thing that occurred to me in all of this discussion.

When I first envisioned shallow branches, I thought of them as a form of
standalone branch, such that they wouldn't share a repository. I think the
implementation would be easier if we did it that way, but I won't say that it
must be done in that fashion.

I think Robert is mixing a little bit of "what must happen" with "what would
happen in the way he would implement it".

I can understand his point that if you don't provide hints as to which remote
repositories to use, then just opening the local one (which references 10
external ones) requires you to open 10 repositories. With the branch giving the
hint, you would only open one.
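
To make the difference concrete, here is a toy model of the two cases. It is
only a sketch: the function names and the example.com URLs are made up for
illustration and are not bzrlib API.

```python
# Toy model only: these names are made up and are not bzrlib API.

def open_repository(url, opened):
    """Pretend to open a repository; record the URL so we can count opens."""
    opened.append(url)
    return url


def open_all_references(external_urls, opened):
    """Repository-wide references: every external location gets opened."""
    return [open_repository(url, opened) for url in external_urls]


def open_branch_reference(branch_hint, opened):
    """Branch-level hint: only the location this branch names gets opened."""
    return [open_repository(branch_hint, opened)]


# Hypothetical external locations referenced by one local repository.
external = ["http://example.com/repo%d" % i for i in range(10)]

repo_wide_log = []
open_all_references(external, repo_wide_log)

branch_log = []
open_branch_reference(external[3], branch_log)

print(len(repo_wide_log), len(branch_log))
```

Without a hint all ten locations are opened; with the branch's hint only one is.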

You could mark the external references by what revisions you expect to be there.
And then when searching the ancestry graph you would only open them if you were
missing the revisions that were referenced.
However, this starts doing weird things if you start pulling some of those
revisions locally.

For example, in an initial shallow branch, you reference revision-foo and mark
the repository such that 'revision-foo can be found at http://blah'. Now when
you are going back in history, when you see "parents = ['revision-foo']" then
you connect to "http://blah" and grab it. So far so good.
However, if you ever get "revision-foo" into the local repository, then you need
to know what *other* revisions might be present at the remote site. Either that
or you have to have a lookaside for "is there any referenced repo that provides
this revision that I should be connecting to?".
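
Roughly, the problem looks like this. This is a sketch under my own
assumptions: plain dicts stand in for repositories, "revision-bar" and "tip"
are invented revision ids, and none of these names are bzrlib API.

```python
# Toy model only: dicts stand in for repositories, nothing here is bzrlib API.

def get_parents(rev_id, local, hints, remotes, opened):
    """Return the parents of rev_id, opening a remote only when a hint says so."""
    if rev_id in local:
        return local[rev_id]
    url = hints.get(rev_id)
    if url is None:
        # No hint for this revision, so we don't know where to look.
        raise KeyError(rev_id)
    opened.append(url)
    return remotes[url][rev_id]


# The remote holds revision-foo and its (hypothetical) parent revision-bar.
remotes = {"http://blah": {"revision-foo": ["revision-bar"], "revision-bar": []}}
local = {"tip": ["revision-foo"]}
hints = {"revision-foo": "http://blah"}
opened = []

# So far so good: the hint sends us to http://blah for revision-foo.
parents = get_parents("revision-foo", local, hints, remotes, opened)

# Now revision-foo gets pulled into the local repository.
local["revision-foo"] = parents

# Walking further back hits revision-bar, which has no hint of its own.
try:
    get_parents("revision-bar", local, hints, remotes, opened)
    found_bar = True
except KeyError:
    found_bar = False
```

Once revision-foo is local, nothing points at http://blah for revision-bar any
more, which is why you need either a record of what else lives at the remote or
a lookaside across all referenced repositories.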

I can see a nice property with branch stacking, in that doing "branch.repository"
can return a proxy that can connect to referenced repositories. However, there is
a complication: you really need to reference the other repository at the branch
level so that the references can be chained. So now "branch.repository" is saying
open the other repository *at this branch* in case that branch also wants to
reference another repository.

And I have to say, going back to "shallow branches are a form of standalone
branch" seems to simplify a lot of this. Then the shallow repository can
reference another repository (shallow/shared/etc doesn't matter). Which can
chain off to as many others as it wants.
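
As a sketch of that chaining, assuming each standalone shallow repository holds
at most one reference to another repository (the class and the revision ids
below are invented for illustration and are not bzrlib API):

```python
# Toy model only: ToyRepository is a made-up class, not bzrlib API.

class ToyRepository:
    """A repository that may fall back to one other repository."""

    def __init__(self, name, revisions, fallback=None):
        self.name = name
        self.revisions = revisions  # maps revision id -> list of parent ids
        self.fallback = fallback    # another ToyRepository, or None

    def get_parents(self, rev_id):
        """Look locally first, then walk down the chain of fallbacks."""
        repo = self
        while repo is not None:
            if rev_id in repo.revisions:
                return repo.revisions[rev_id]
            repo = repo.fallback
        raise KeyError(rev_id)


# A shared repository, a shallow one stacked on it, and another on top of that.
shared = ToyRepository("shared", {"rev-1": []})
shallow_a = ToyRepository("shallow-a", {"rev-2": ["rev-1"]}, fallback=shared)
shallow_b = ToyRepository("shallow-b", {"rev-3": ["rev-2"]}, fallback=shallow_a)

print(shallow_b.get_parents("rev-1"))
```

Each repository only needs to know about the next one in the chain; it doesn't
matter whether that one is shallow, shared, or anything else.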

The downside, of course, is now you lose the sharing between branches. And if
you have lots of long-lived initially shallow branches, that could start to grow
rather large.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHvdYVJdeBCYSNAAMRAo2kAKDF2zX/tnj2wC76TcxuiZRx7o2RPACeNNK0
W1rgHNR1Ll6x8txE386vLTc=
=9VnQ
-----END PGP SIGNATURE-----