bzr/LP issues from work discussed at UDS

John Arbash Meinel john.meinel at
Thu Dec 3 18:11:30 GMT 2009


>> http://host/path/to/X;branch=Y
>> As the preferred syntax. It requires quoting on the command line, and
>> using 'branch=' is a bit more verbose if you were typing it manually.
>> But it ends up being the "least evil". So I'm guessing we should JFDI
>> and get something.
> Sounds good to me.
> I'm sure the question has been asked, but does git have a syntax for
> doing this?

I'm pretty sure that 'git clone' copies all branches in the repo.
Looking at the syntax for that command, it seems to have a "--branch X"
flag, which would hint that you can't supply the branch as part of the URL.
Looking at:

It also doesn't give any hint as to a URL one could use to specify a single
branch.

I'm certainly not an expert, but the best I can come up with is: no, they
do not have a syntax for this.

>> I think we currently have 8k or so, with some fraction of that failing.
>> At least, I thought I remembered about 1-2k failures, and a 25% failure
>> rate. So 2/.25 = 8k.
> Thanks. That means we are looking at about doubling the number of imports
> this cycle.
>> I think you need to have some Launchpad interfaces, so that people can
>> garden their own branches. Gardening needs to be done from time to time.
>> If we aren't going to do it ourselves, then we need to expose a way for
>> others to do it.
> I agree. In this case however, we are taking the locations from Debian
> metadata, so we can at least semi-automatically do it.

Well, provided the people are properly maintaining that information. :)

>>> 5) API for requesting a code import be tried ASAP
>>> Do Branch.requestMirror() and Branch.last_mirror_attempt refer to
>>> the code import if the branch is a vcs-imports one?
>>> If not, can we get an API similar to the above for vcs-imports?
>>> We would want to say “try now,” and then spend a while waiting
>>> for an indication it tried to import, so that we could be reasonably
>>> sure the import was up to date.
>> It sounds like you want a synchronous API, but probably something like:
>> startMirror() # return when the attempt has started, or failed
>> waitForFinish() # wait until the current mirroring has finished, include
>>         # info about how much has been imported.
> I think synchronous is impossible in the LP API currently, if only for the
> simple reason that the request will time out after a short time, and a
> mirror is likely to take longer than that.
> The two things I highlighted at least allow us to approximate this with
> polling. I am told that polling is probably the best we can do for
> at least the medium term with the LP API.

Well, there are also callbacks, etc. The point is that your process
thinks of it synchronously, even if a wrapper is abstracting away the
polling underneath.
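To make that concrete, here is a minimal sketch of hiding the polling behind a blocking call, roughly the startMirror()/waitForFinish() pair above. It assumes only what you listed in 5): a way to request an attempt and a way to read the time of the last attempt. `request_mirror` and `get_last_attempt` are hypothetical parameters standing in for Branch.requestMirror() and Branch.last_mirror_attempt; I haven't checked how launchpadlib actually exposes them.

```python
import time
from datetime import datetime
from typing import Callable, Optional


def wait_for_import(
    request_mirror: Callable[[], None],
    get_last_attempt: Callable[[], Optional[datetime]],
    timeout: float = 3600.0,
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Request an import attempt, then poll until a new attempt is recorded.

    Returns True once the last-attempt timestamp moves past the value seen
    before the request, or False if that doesn't happen within `timeout`
    seconds (counted as the sum of the sleeps, not wall-clock time).
    """
    # Remember the previous attempt, so we only compare server timestamps
    # with each other and never against the local clock.
    before = get_last_attempt()
    request_mirror()
    waited = 0.0
    while waited <= timeout:
        last = get_last_attempt()
        if last is not None and (before is None or last > before):
            return True
        sleep(interval)
        waited += interval
    return False
```

Note this only tells you an attempt *happened*, not that it reached the revision you wanted, which is the convergence question below.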

>> Though I have to ask, how important is it to be at the current tip? What
>> do you want to do if the tip is 'active' and there is more that can be
>> pulled as soon as you finish the previous pull? Are you going to loop
>> until convergence?
>> If you aren't waiting for convergence, is there harm in having an import
>> be < 24 hours out of date?
> Consider this:
>   * Debian maintainer upgrades to a new upstream version in their VCS.
>   * They test and upload the package.
>   * They then commit/push as appropriate for their VCS.
>   * We see the upload on average 3 hours later.
>   * The probability of the import running in those 3 hours is small.
>   * Therefore we won't be able to see the revision corresponding to
>     the upload and so can't add it as a parent.
> Therefore when we see the upload, I would like to trigger the code import
> system to make a best effort to be up to date at that point. It's not
> perfect, but it will cover the common case. (If the maintainer forgets
> to push then the time until the revision can be mirrored may be
> unbounded.)

So we are watching the uploads to the Debian repo, and then trying to
replicate that in the Ubuntu data?

Regarding "if the maintainer forgets to push", I think that is going to
be a significant fraction of the time. I've had it happen several times
with Jelmer and bzr-svn, and he's pretty savvy. As for "unbounded", I
think in practice it is "by the time I poke him", or "the next chance is
the next time he packages an update".

>>> 6) Guessing parent relationships
>>> We currently infer parent relationships from debian/changelog, as
>>> if you include changelog entries of another upload then we presume
>>> you merged the changes.
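As I understand the changelog rule, it amounts to something like the sketch below. It's my guess at the logic, not your actual code: a hand-rolled regex reads the entry headers, any already-imported upload whose version appears below the top entry counts as merged, and merged uploads that are themselves ancestors of another merged upload are dropped so only the direct parents remain. The `known` mapping (version to that upload's own changelog text) is a hypothetical input.

```python
import re
from typing import Dict, List

# debian/changelog entry header, e.g. "bzr (2.0.2-1) unstable; urgency=low"
HEADER_RE = re.compile(r"^(?P<package>\S+) \((?P<version>[^)]+)\) (?P<dists>[^;]+);")


def changelog_versions(text: str) -> List[str]:
    """Versions listed in a debian/changelog, newest entry first."""
    return [m.group("version")
            for m in map(HEADER_RE.match, text.splitlines()) if m]


def guess_parents(changelog: str, known: Dict[str, str]) -> List[str]:
    """Guess the parent uploads of the upload whose changelog is given.

    `known` maps each already-imported upload's version to its own
    debian/changelog. An upload counts as merged if its version appears
    below the top entry; merged uploads that are already listed in another
    merged upload's changelog are implied, so they are dropped.
    """
    merged = [v for v in changelog_versions(changelog)[1:] if v in known]
    implied = set()
    for v in merged:
        implied.update(changelog_versions(known[v])[1:])
    return [v for v in merged if v not in implied]
```

For an Ubuntu upload 1.0-2ubuntu1 whose changelog includes the Debian 1.0-2 and 1.0-1 entries, this picks 1.0-2 as the parent, since 1.0-1 is already in 1.0-2's own history.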
>> What about the imports that are from upstream (and presumably don't have
>> a debian/ directory at all)?
> That is out of scope for this phase. We will have to solve this at some
> point, possibly soon for daily builds, as you suggest above.
>>> We will need to start inferring parent relationships in some cases
>>> though, as in some workflows the code that was uploaded is never
>>> committed exactly as a single revision. (Such as never
>>> committing the revision that changes the target from UNRELEASED
>>> to unstable, or files modified in the clean target.)
>>> The heuristics shouldn't have to be too fuzzy, but any fuzziness
>>> makes me a little nervous. Do the bzr developers agree? Do you
>>> have any advice on how to do it well, so that it doesn't cause
>>> mis-merges and the like?
>> Merging content is generally a bit fuzzy anyway, which is one reason why
>> we don't auto-commit it...
>> I don't have great answers here, but I'm guessing we'll have to be
>> satisfied with a 75% solution, because you really can't do much better
>> than that.
> That's how I feel too. I will code the heuristics conservatively, but
> there will be times when a mistake is made.
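One way to keep the UNRELEASED case conservative is to accept a committed revision as "the upload" only when the sole difference is the distribution field of the top debian/changelog entry, and only when the committed side still says UNRELEASED. A sketch, assuming trees are available as hypothetical path-to-text dicts; it deliberately doesn't try to handle the clean-target case and refuses any other difference.

```python
import re
from typing import Dict

# Hypothetical tree representation: path -> file contents.
Tree = Dict[str, str]

# Top changelog header split into (prefix, distribution, rest).
HEADER_RE = re.compile(r"^(\S+ \([^)]+\) )([^;]+)(;.*)$")


def _without_dist(changelog: str) -> str:
    """Blank out the distribution of the topmost changelog entry."""
    first, sep, rest = changelog.partition("\n")
    m = HEADER_RE.match(first)
    if m:
        first = m.group(1) + "<DIST>" + m.group(3)
    return first + sep + rest


def matches_modulo_release(uploaded: Tree, committed: Tree) -> bool:
    """True if `committed` differs from `uploaded` only by an UNRELEASED target."""
    if uploaded.keys() != committed.keys():
        return False
    for path, content in uploaded.items():
        other = committed[path]
        if content == other:
            continue
        if path != "debian/changelog":
            return False  # any other difference: refuse to guess
        m = HEADER_RE.match(other.partition("\n")[0])
        if not m or m.group(2).strip() != "UNRELEASED":
            return False
        if _without_dist(content) != _without_dist(other):
            return False
    return True
```

Erring towards False means a missed parent rather than a mis-merge, which seems like the right failure mode here.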
>>> 7) Migration over branch history rewrites
>>> In order to include new history into the branch we need to
>>> rewrite their history. This means changing revision ids.
>>> In order to make the new branches mergeable with other existing
>>> branches we need to change file ids.
>>> We can do this fine for all the branches we control, but it
>>> will instantly make developers' local branches unrelated.
>> You can always mark them as merges rather than throwing away that
>> history. But then you have to carry around all the extra history...
> That's true.
>> Well, you can store the maps in the revision graph, or you could weaken
>> it and just store it as a revision property (which is mostly what
>> bzr-rewrite and bzr-svn/git/hg are doing today, IIRC)
> Would putting the file-id maps there make sense too?

It sounds like a lot of data to carry around. And I wonder why you would
need them at all, rather than just looking up what the file-ids are in
the target repository. I don't know the internals of bzr-git, though.
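What I mean by "just looking" is something like the following: given the old and new trees, the file-id map can be recomputed on demand by matching paths, so it never needs to be stored. This is a sketch over a hypothetical path-to-file-id dict, not bzrlib's inventory API, and it only covers files that sit at the same path on both sides; renames would need more work.

```python
from typing import Dict

# Hypothetical inventory representation: path -> file id.
Inventory = Dict[str, str]


def file_id_map(old: Inventory, new: Inventory) -> Dict[str, str]:
    """Map old file ids to new ones for files present at the same path.

    Files whose id didn't change, or that only exist on one side, are
    left out of the map.
    """
    return {
        old_id: new[path]
        for path, old_id in old.items()
        if path in new and new[path] != old_id
    }
```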

>>> Also, we have a terrible user experience on the flag day:
>>>   # day before
>>>   $ bzr pull
>>>   ...
>>>   # flag day
>>>   $ bzr pull
>>>   bzr: ERROR: there is no common ancestor.
>>> Any suggestions on how to improve on that would be gratefully
>>> received.
>> If you put it in the rev graph, then pull 'just works', but if you are
>> changing file-ids, then they get 2x the history, their repo gets big,
>> and we touch every file on their filesystem.
>> Though if you don't include the revision graph, then pull fails, they
>> have to start a new fetch, and their repo doubles in size (or they just
>> start a new repo, but still...)
>> You could always have "bzr flag-day-pull" or some sort of command that
>> knows that the ancestry is being re-written, and to pull across based on
>> that.
> That's the kind of thing that I want, but nothing in the above example
> gives any clue that this command exists or that they should run it now.
> That's my concern.
> Thanks,
> James

Well, a plugin that provides "flag-day-pull" could also decorate the
"pull" command and either:

1) Do it directly
2) Decorate the errors so that it can give hints that you may want to
run a different command.
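Option 2 is basically a wrapper that catches the "no common ancestor" failure and re-raises it with a hint. A plain-Python sketch of the idea; the exception classes and `fake_pull` are stand-ins defined here, not bzrlib's real errors or its command-registration hooks, which a real plugin would use instead.

```python
import functools


class NoCommonAncestor(Exception):
    """Stand-in for bzr's 'no common ancestor' error (hypothetical here)."""


class FlagDayHint(Exception):
    """Raised instead of NoCommonAncestor, with advice attached."""


HINT = ("This branch's history may have been rewritten on a flag day; "
        "try 'bzr flag-day-pull' instead.")


def hint_on_rewrite(pull):
    """Wrap a pull function so history-rewrite failures carry a hint."""
    @functools.wraps(pull)
    def wrapper(*args, **kwargs):
        try:
            return pull(*args, **kwargs)
        except NoCommonAncestor as e:
            # Re-raise with the hint, keeping the original error as the cause.
            raise FlagDayHint(f"{e}\n{HINT}") from e
    return wrapper


@hint_on_rewrite
def fake_pull(source):
    """Hypothetical pull that fails the way the flag-day example does."""
    raise NoCommonAncestor(f"there is no common ancestor with {source}")
```

This answers the "nothing gives any clue that this command exists" concern, as long as the plugin is installed before the flag day.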

Note that if you go the revision-property route, you have something like:

Revision: new-rev-id
Converted-from: old-rev-id

And then 'pull' could:

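1) Try to pull normally
2) Trap the DivergedBranches exception, and probably NoCommonAncestor
3) Grab the 'converted-from' property of the remote tip:

  r = remote_branch.repository.get_revision(remote_branch.last_revision())
  if 'converted-from' in r.properties:
      # revision properties are unicode, revision ids are utf-8 strings
      old_rev = r.properties['converted-from'].encode('utf-8')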

I think the main problem is that doing this 'correctly' means inspecting
the whole history. Consider:
 bzr rewrite-all-revs
 bzr commit
 bzr commit

Now the tip revision is *not* a conversion, but you have to look back a
few revs to find one that was, and then merge/pull appropriately. I
don't think this is particularly hard from a coding perspective, but it
is fairly time-consuming from an "inspect all the remote ancestry"
perspective. So I would think you don't want to have it active on every
pull.
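The "look back a few revs" step could be bounded like this: walk the left-hand (mainline) ancestry from the remote tip until a revision with a 'converted-from' property turns up, and give up after `max_depth` revisions so the cost stays capped. `Rev` is a hypothetical stand-in for a bzr Revision (parent ids plus properties), and following only the first parent is a simplification; a merged-in converted revision would be missed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rev:
    """Hypothetical minimal revision: parent ids and revision properties."""
    parent_ids: List[str]
    properties: Dict[str, str] = field(default_factory=dict)


def find_conversion(get_revision: Callable[[str], Rev], tip: str,
                    max_depth: int = 100) -> Optional[Tuple[str, str]]:
    """Return (new_rev_id, old_rev_id) for the nearest converted revision.

    Only the left-hand parent is followed, and at most `max_depth`
    revisions are inspected; returns None if nothing is found.
    """
    rev_id: Optional[str] = tip
    for _ in range(max_depth):
        if rev_id is None:
            return None  # reached the start of history
        rev = get_revision(rev_id)
        old = rev.properties.get("converted-from")
        if old is not None:
            return rev_id, old
        rev_id = rev.parent_ids[0] if rev.parent_ids else None
    return None
```

For the rewrite-all-revs / commit / commit example, this finds the rewritten revision two steps back from the tip.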


More information about the ubuntu-distributed-devel mailing list