bzr/LP issues from work discussed at UDS

Thu Dec 3 18:11:30 GMT 2009

...

>> http://host/path/to/X;branch=Y
>>
>> As the preferred syntax. It requires quoting on the command line, and
>> using 'branch=' is a bit more verbose if you were typing it manually.
>> But it ends up being the "least evil". So I'm guessing we should JFDI
>> and get something.
> 
> Sounds good to me.
> 
> I'm sure the question has been asked, but does git have a syntax for
> doing this?

I'm pretty sure that 'git clone' copies all branches in the repo.
Looking at the syntax for that command, it seems to have a "--branch X"
flag, which would hint that you can't supply the branch as part of the URL.
The URL documentation at:
http://www.kernel.org/pub/software/scm/git/docs/git-clone.html#_git_urls_a_id_urls_a

also doesn't give any hint as to a URL one could use to specify a single
branch.

I'm certainly not an expert, but the best I can come up with is, no they
do not have a syntax for this.
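
For what it's worth, splitting the proposed ';branch=' form apart is not
much code. A rough sketch in plain Python (not bzrlib code; the function
name split_branch_url is made up for illustration, and I'm assuming the
parameters only ever hang off the last path segment):

```python
def split_branch_url(url):
    """Split 'http://host/path/to/X;branch=Y' into (base_url, branch).

    Returns (url, None) when no 'branch=' parameter is present.
    Assumption: parameters are only attached to the last path segment.
    Any other ';key=value' parameters are silently dropped here.
    """
    head, slash, last = url.rpartition('/')
    name, *params = last.split(';')
    branch = None
    for param in params:
        key, eq, value = param.partition('=')
        if key == 'branch' and eq:
            branch = value
    return head + slash + name, branch


print(split_branch_url('http://host/path/to/X;branch=Y'))
```

The quoting issue is just that an unquoted ';' ends the command in most
shells, so the URL would have to be typed inside quotes.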

> 
>> I think we currently have 8k or so, with some fraction of that failing.
>> At least, I thought I remembered about 1-2k failures, and a 25% failure
>> rate. So 2k/.25 = 8k.
> 
> Thanks. That means we are looking at about doubling the number of imports
> this cycle.
> 
>> I think you need to have some Launchpad interfaces, so that people can
>> garden their own branches. Gardening needs to be done from time to time.
>> If we aren't going to do it ourselves, then we need to expose a way for
>> others to do it.
> 
> I agree. In this case however, we are taking the locations from Debian
> metadata, so we can at least semi-automatically do it.

Well, provided the people are properly maintaining that information. :)

> 
>>> 5) API for requesting a code import be tried ASAP
>>>
>>> Do Branch.requestMirror() and Branch.last_mirror_attempt refer to
>>> importing the code if the branch is a vcs-imports one?
>>>
>>> If not, can we get an API similar to the above for vcs-imports?
>>>
>>> We would want to say “try now,” and then spend a while waiting
>>> for an indication it tried to import, so that we could be reasonably
>>> sure the import was up to date.
>> It sounds like you want a synchronous api, but probably something like:
>>
>> startMirror() # return when the attempt has started, or failed
>> waitForFinish() # wait until the current mirroring has finished, include
>>         # info about how much has been imported.
> 
> I think synchronous is impossible in the LP API currently, if only for the
> simple reason that the request will time out after a short time, and a
> mirror is likely to take longer than that.
> 
> The two things I highlighted at least allow us to approximate this with
> polling. I am told that polling is probably the best we can do for
> at least the medium term with the LP API.
> 

Well, there are also callbacks, etc. The point is that your process
thinks of it synchronously, even if it is abstracting away the internals.
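
To make that concrete, here is roughly what I mean by a synchronous
wrapper over polling. This is only a sketch: it assumes the branch object
exposes requestMirror() and last_mirror_attempt as named above, and the
'refresh' hook is a hypothetical stand-in for however the client
re-fetches the object from Launchpad. I haven't checked how either
behaves for vcs-imports branches.

```python
import time


def wait_for_import(branch, refresh=lambda b: None, timeout=600.0,
                    interval=10.0, sleep=time.sleep):
    """Request a mirror, then poll until a new attempt is recorded.

    Returns True if last_mirror_attempt changed before the timeout,
    False otherwise. The caller sees one blocking call; the polling
    is hidden inside.
    """
    before = branch.last_mirror_attempt
    branch.requestMirror()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        refresh(branch)  # hypothetical: re-read the branch from the server
        if branch.last_mirror_attempt != before:
            return True
        sleep(interval)
    return False
```

A False return only means "no attempt seen yet", so the caller still has
to decide whether to carry on with possibly stale data.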

>> Though I have to ask, how important is it to be at the current tip? What
>> do you want to do if the tip is 'active' and there is more that can be
>> pulled as soon as you finish the previous pull? Are you going to loop
>> until convergence?
>>
>> If you aren't waiting for convergence, is there harm in having an import
>> be < 24 hours out of date?
> 
> Consider this:
> 
>   * Debian maintainer upgrades to a new upstream version in their VCS.
>   * They test and upload the package.
>   * They then commit/push as appropriate for their VCS.
>   * We see the upload on average 3 hours later.
>   * The probability of the import running in those 3 hours is small.
>   * Therefore we won't be able to see the revision corresponding to
>     the upload and so can't add it as a parent.
> 
> Therefore when we see the upload, I would like to trigger the code import
> system to make a best effort to be up to date at that point. It's not
> perfect, but it will cover the common case. (If the maintainer forgets
> to push then the time until the revision can be mirrored may be
> unbounded.)

So we are watching the uploads to the debian repo, and then trying to
replicate that in the Ubuntu data?

As far as "if the maintainer forgets to push", I think that is going to
be a significant fraction of the time. I've had it happen several times
with Jelmer and bzr-svn, and he's pretty savvy. As for "unbounded", I
think on average it is "by the time I poke him", or "the next chance is
the next time he packages an update".

> 
>>> 6) Guessing parent relationships
>>>
>>> We currently infer parent relationships from debian/changelog, as
>>> if you include changelog entries of another upload then we presume
>>> you merged the changes.
>> What about the imports that are from upstream (and presumably don't have
>> a debian/ directory at all)?
> 
> That is out of scope for this phase. We will have to solve this at some
> point, possibly soon for daily builds, as you suggest above.
> 
>>> We will need to start inferring parent relationships in some cases
>>> though, as there are some uses that mean the code that was uploaded
>>> is never exactly committed as a single revision. (Such as never
>>> committing the revision that changes the target from UNRELEASED
>>> to unstable, or files modified in the clean target.)
>>>
>>> The heuristics shouldn't have to be too fuzzy, but any fuzziness
>>> makes me a little nervous, do the bzr developers agree? Do you
>>> have any advice on how to do it well, so that it doesn't cause
>>> mis-merges and the like?
>> Merging content is generally a bit fuzzy anyway. Which is one reason why
>> we don't auto-commit it...
>>
>> I don't have great answers here, but I'm guessing we'll have to be
>> satisfied with a 75% solution, because you really can't do much better
>> than that.
> 
> That's how I feel too. I will code the heuristics conservatively, but
> there will be times when a mistake is made.
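
On the changelog-based inference in 6) above (the non-fuzzy part): as I
understand it, the idea is that any earlier upload whose entry shows up
in debian/changelog is presumed merged. A rough sketch of that rule, not
the actual importer code. The function names and the shape of
'known_uploads' (a dict of version to revision id) are made up for
illustration.

```python
import re

# A changelog entry starts with: "package (version) distribution; urgency=..."
HEADER = re.compile(r'^(\S+) \(([^)]+)\) ', re.MULTILINE)


def changelog_versions(text):
    """Return every version in the changelog, newest first."""
    return [m.group(2) for m in HEADER.finditer(text)]


def infer_merged_uploads(changelog_text, known_uploads):
    """Return (version, revision_id) for each earlier upload we know about.

    The top entry is the upload itself, so it is skipped. Versions we
    have no revision for are ignored rather than guessed at.
    """
    versions = changelog_versions(changelog_text)
    return [(v, known_uploads[v]) for v in versions[1:] if v in known_uploads]
```

This deliberately does nothing clever when a version is missing from
known_uploads, which is the conservative direction.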
> 
>>> 7) Migration over branch history rewrites
>>>
>>> In order to include new history into the branch we need to
>>> rewrite their history. This means changing revision ids.
>>>
>>> In order to make the new branches mergeable with existing other
>>> branches we need to change file ids.
>>>
>>> We can do this fine for all the branches we control, but it
>>> will instantly make developers' local branches unrelated.
>> You can always mark them as merges rather than throwing away that
>> history. But then you have to carry around all the extra history...
> 
> That's true.
> 
>> Well, you can store the maps in the revision graph, or you could weaken
>> it and just store it as a revision property (which is mostly what
>> bzr-rewrite and bzr-svn/git/hg are doing today, IIRC)
> 
> Would putting the file-id maps there make sense too?
> 

It sounds like a lot of data to carry around. And I wonder why you need
them specifically outside of just looking at what the file-ids are in
the target repository. I don't know the internals of bzr-git, though.

>>> Also, we have a terrible user experience on the flag day:
>>>
>>>   # day before
>>>   $ bzr pull
>>>   ...
>>>   # flag day
>>>   $ bzr pull
>>>   bzr: ERROR: there is no common ancestor.
>>>
>>> any suggestions on how to improve on that would be gratefully
>>> received.
> 
>> If you put it in the rev graph, then pull 'just works', but if you are
>> changing file-ids, then they get 2x the history, their repo gets big,
>> and we touch every file on their filesystem.
>>
>> Though if you don't include the revision graph, then pull fails, they
>> have to start a new fetch, and their repo doubles in size (or they just
>> start a new repo, but still...)
>>
>> You could always have "bzr flag-day-pull" or some sort of command that
>> knows that the ancestry is being re-written, and to pull across based on
>> that.
> 
> That's the kind of thing that I want, but nothing in the above example
> gives any clue that this command exists or that they should run it now.
> That's my concern.
> 
> Thanks,
> 
> James
> 

Well, having a plugin that provides "flag-day-pull" could have it
decorate the "pull" command and either:

1) Do it directly
2) Decorate the errors so that it can give hints that you may want to
run a different command.

Note that if you go the revision-property route, you have something like:

Revision: new-rev-id
Converted-from: old-rev-id

And then 'pull' could:

1) Try to pull normally
2) Trap DivergedBranches exception, and probably NoCommonAncestor
3) Grab the tip revision and check it for the property:
   r = remote_branch.repository.get_revision(remote_branch.last_revision())
   if 'converted-from' in r.properties:
     old_rev = r.properties['converted-from'].encode('utf-8')
     ...

I think the main problem is that doing this 'correctly' means inspecting
the whole history. Consider:
 bzr rewrite-all-revs
 bzr commit
 bzr commit

Now the tip revision is *not* a conversion, but you have to look back a
few revs to find one that was, and then merge/pull appropriately. I
don't think this is particularly hard from a coding perspective, but it
is fairly time consuming from an "inspect all the remote ancestry"
perspective. So I would think you don't want to have it active on every
pull.
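
The "look back a few revs" part could be sketched like this. It uses a
plain dict in place of a real repository, so it is an illustration of the
walk and not bzrlib code: 'revisions' maps a revision id to (parent ids,
properties), and the function name and the 'limit' parameter are made up.
Only the 'converted-from' property name comes from the example above.

```python
def find_conversion_point(revisions, tip, limit=None):
    """Walk the left-hand ancestry from tip looking for a converted rev.

    Returns (new_rev_id, old_rev_id) for the nearest revision carrying a
    'converted-from' property, or None if none is found within 'limit'
    revisions (limit=None means walk all the way back).
    """
    rev_id = tip
    inspected = 0
    while rev_id is not None:
        if limit is not None and inspected >= limit:
            return None
        parents, props = revisions[rev_id]
        if 'converted-from' in props:
            return rev_id, props['converted-from']
        # Follow the first (left-hand) parent only.
        rev_id = parents[0] if parents else None
        inspected += 1
    return None
```

Against a remote branch every step of that loop is potentially a round
trip, which is why I'd cap it or keep it out of the default pull path.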

John
=:->