[rfc] Progress Bar reworking

Thu Dec 21 17:18:08 GMT 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> I am not convinced that it makes sense to talk to the user in terms of
> phases 

The goal of 'phases' was just to give an overall sense of how far along
we are. Right now we have a fairly deeply nested set of progress bars.

As a clear-cut example, let's look at 'bzr pull':

1) It has a top-level ProgressPhase with 2 phases: first fetching, and
then updating the working tree.

2) Fetching has another ProgressPhase with ~6 phases:

  1) Reading inventory.kndx to get the ancestry graph, and comparing it
     with the local graph to find what is missing
  2) Reading inventory.knit to find which file ids have been affected
  3) Copying file texts, which amounts to reading all of the per-file
     .kndx indexes, then doing a merge and reading the per-file .knit data
  4) Copying the actual inventory.knit data. This is mostly just writing
     what is already in memory to disk, since it should have been
     downloaded in step 2
  5) Copying signatures.knit
  6) Copying revisions.knit

  Note: without my patches, phases 2 and 3 are combined, which makes the
  progress *really* unrepresentative because we get no updates during
  step 2.

3) Updating the working tree is the second part of pull. This calls
'merge_inner', which for Merge3Merger has another ProgressPhase with 3 phases:

  1) Checking every file in the inventory ('Preparing file merge')
  2) Resolving conflicts
  3) Applying the changes

One way to clean this up is to avoid having multiple levels of progress,
which is what I was advocating in the past, though possibly not as
clearly.

Right now we have a ProgressPhase nested inside another, nested inside
another, and down at the bottom we have an actual progress bar. So all of
the higher-level ones have fewer than 10 steps, while the bottom level has
more than 100.
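To see how little the outer levels contribute on their own, here is a
minimal sketch that folds a stack of nested counters (outermost first)
into one overall fraction. `overall_fraction` is a hypothetical helper,
not bzrlib's ProgressPhase API, and it assumes each level reports how
many of its steps are already finished.

```python
from typing import Sequence, Tuple


def overall_fraction(levels: Sequence[Tuple[int, int]]) -> float:
    """Fold nested (done, total) counters, outermost first, into one fraction.

    ``done`` is how many steps are already finished at that level; the step
    in progress is refined by the next (inner) level.
    """
    fraction = 0.0
    # Walk from the innermost level outwards: each level contributes its
    # finished steps plus the partial progress of its current step.
    for done, total in reversed(levels):
        if total <= 0:
            raise ValueError("total must be positive")
        fraction = (done + fraction) / total
    return fraction


# 'bzr pull': still in top-level phase 1 of 2 (fetching, 0 done), in fetch
# phase 2 of 6 (1 done), with 400 of 8699 items of that phase processed.
print(round(overall_fraction([(0, 2), (1, 6), (400, 8699)]), 4))  # 0.0872
```

Applied to the branch-copying script quoted below (branch 10 of 20, so 9
finished; phase 1 of 6; revision 20 of 2000), the same fold gives about
0.45, because the outermost counter dominates.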

> 
> On 29 Nov 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> 
>> Because of UI issues, we really only ever want 2 levels of progress. The
>> current overall (for the whole command, IMO), and then a nested one for
>> the current step. It might be possible to have more nested ones, but the
>> basic premise is that you really should have a global view of what is
>> going on.
>>
>> As another example, I wrote a script which recursively copies a bunch of
>> branches. There I have an overall "I'm on branch X of Y", but then a
>> narrower "I'm on phase 1/6 for branch 10/20", and then "I'm on revision
>> 20/2000 for phase 1/6 for branch 10/20".
> 
> I would like to avoid ever saying "phase 1/5" to the user and instead
> just say what we're doing.  For example if this phase is about copying
> revisions or building a tree then just say that -- and this effectively
> eliminates one layer.  Also it will make us all aware of what most needs
> performance work, which is probably good.

As mentioned, there are multiple layers. What I switched to was:

| [======              ] 1/6 Finding involved files ( 400/8699)

So the 1/6 lets you know that there are other steps after this one, the
text tells you what we are actually doing, and the final numbers tell you
how far along you are in this specific phase.
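A minimal sketch of producing a line in that layout. `format_progress_line`
is hypothetical, and since the message does not say exactly what the bar
measures, this version assumes it tracks progress within the current
phase, so it will not reproduce the bar width shown above.

```python
def format_progress_line(phase: int, phases: int, message: str,
                         current: int, total: int, width: int = 20) -> str:
    """Render a '| [bar] phase/phases message (current/total)' line.

    Hypothetical formatter: here the bar tracks progress within the current
    phase only, and the current count is right-aligned to the width of the
    total so the line does not shift as the number grows.
    """
    filled = int(width * current / total) if total > 0 else 0
    filled = max(0, min(width, filled))  # clamp in case current > total
    bar = "=" * filled + " " * (width - filled)
    count = f"({current:>{len(str(total))}}/{total})"
    return f"| [{bar}] {phase}/{phases} {message} {count}"


print(format_progress_line(1, 6, "Finding involved files", 400, 8699))
```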

It turns out to work quite well for 'bzr branch' because 'branch'
doesn't have that extra top-level 0/2 phase.

At the same time, this is broken, because 'branch' *does* have step (3)
from above (at least, it has to build the working tree).

With my patches, that step gets a very uninformative progress bar that is
completely unrelated to the first one.

> 
> I can see that phases may still be useful so that there is one overall
> monotonic progress bar, but they could be used just to calculate
> fractions and not mentioned.
> 
> Having integer phase numbers in the code is a bit distasteful to me.
> 
> I actually think that as long as the message is reasonable and not at
> too low a level, one progress bar may be enough in many cases.
> 
> Also remember that not all communication with the user needs to be
> through progress bars (or at least not through a single bar).  In your
> case of copying multiple branches you could have
> 
>   copying branch apple
>   copying branch orange
>   copying branch pear
>   [123/1231] copying revisions
> 
> I guess it's not quite as pretty but I think it's perfectly ok.
> 

Sure. I'm just trying to make something that looks reasonable and
actually gives the user information.

If we want overall progress, I think we should switch to a 2-level
system, where the overall number of steps is defined by the command, so
we get a ui.ui_factory.overall_progress() or something like that.

Then underneath that we have the inner "I'm on file 500/1000".
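A minimal sketch of the 2-level idea, assuming the command creates the
outer object and hands each step an inner counter. `OverallProgress`,
`InnerProgress` and `copy_items` are hypothetical names, not bzrlib API
(`overall_progress()` above is itself only a proposal). The inner function
never learns why it is counting; the outer step supplies the message.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class OverallProgress:
    """Hypothetical outer level: the command declares its steps up front."""

    def __init__(self, steps: List[str], emit: Callable[[str], None] = print):
        self.steps = steps    # e.g. ["Fetching", "Updating working tree"]
        self.emit = emit      # where rendered lines go (terminal, test list, ...)
        self.index = 0        # zero-based index of the step in progress

    @contextmanager
    def step(self) -> Iterator["InnerProgress"]:
        """Run one outer step; advance to the next step even if it fails."""
        try:
            yield InnerProgress(self)
        finally:
            self.index += 1

    def render(self, current: int, total: int) -> None:
        name = self.steps[self.index]
        self.emit(f"{self.index + 1}/{len(self.steps)} {name} ({current}/{total})")


class InnerProgress:
    """Hypothetical inner level: knows counts, not the reason for the work."""

    def __init__(self, outer: OverallProgress):
        self.outer = outer

    def update(self, current: int, total: int) -> None:
        self.outer.render(current, total)


def copy_items(n: int, pb: InnerProgress) -> None:
    """An inner function that only reports 'item i of n'."""
    for i in range(1, n + 1):
        pb.update(i, n)


progress = OverallProgress(["Fetching", "Updating working tree"])
with progress.step() as pb:
    copy_items(3, pb)
with progress.step() as pb:
    pb.update(500, 1000)
```

Advancing the step index in a `finally` keeps the outer counter monotonic
even when a step raises.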

I think you really want both the outer and the inner, because the outer
tends to know "I'm looking for file ids affected by these revisions",
while the inner only knows "I'm returning line 100/1000". The outer
doesn't really know how far along the work has gone, and the inner
doesn't know the *reason* why it is doing the work.

We *could* solve this by passing a reason along with the progress bar,
but inner functions frequently create the progress bar themselves.

One nice possible refactoring is to get away from having a global 'ui'
state, and instead pass around the current ui. (This also has the benefit
of avoiding library-level state, which is nicer for multithreading, etc.)

But then all of your functions have a 'ui=X' wart on them, though we
already have a less useful 'pb=X' wart on a lot of functions.
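A minimal sketch of the explicit-ui alternative, with the global as a
fallback. `RecordingUI`, `default_ui` and `copy_revisions` are hypothetical
stand-ins, not bzrlib's ui_factory; the point is only that each caller (or
thread) can supply its own UI object, at the cost of the extra `ui=`
parameter.

```python
import threading
from typing import List, Optional, Sequence


class RecordingUI:
    """Hypothetical stand-in for a UI object; it just records notes."""

    def __init__(self) -> None:
        self.notes: List[str] = []

    def note(self, text: str) -> None:
        self.notes.append(text)


# Global-state style: every caller that passes nothing shares this object.
default_ui = RecordingUI()


def copy_revisions(revision_ids: Sequence[str],
                   ui: Optional[RecordingUI] = None) -> None:
    """The 'ui=X' wart: use the caller's UI if given, else the global one."""
    ui = ui if ui is not None else default_ui
    for n, rev_id in enumerate(revision_ids, 1):
        ui.note(f"copying revision {n}/{len(revision_ids)} ({rev_id})")


# With an explicit UI per thread, concurrent operations don't share state.
uis = [RecordingUI(), RecordingUI()]
threads = [threading.Thread(target=copy_revisions, args=(["r1", "r2"],),
                            kwargs={"ui": ui}) for ui in uis]
for t in threads:
    t.start()
for t in threads:
    t.join()
```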

The reason I was thinking of putting the overall progress at the Command
level is that Fetch certainly should have no need to know that
'build_tree' is going to happen afterwards, especially because sometimes
it is 'build_tree' with 2 phases, and sometimes it is 'merge' with 3
phases. (See the difference between 'bzr branch' and 'bzr pull'.)

The other problem, though, is that sometimes the number of phases
depends on the storage format. Fetching, for example, has 5 phases for
Weave and 6 phases for Knit (at least after my patch to properly split
out the knit phases).

And you don't *really* want the top-level command to have to know the
details of everything it is going to call.

But I don't really see a way around that if you really want a sense of
how long the overall request is going to take. At some point you need
something that knows what "overall" really means.
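To make the coupling concrete: if the command announces one flattened
phase count, it ends up encoding every callee's phase count, per command
and per storage format. The numbers below are the ones quoted in this
message (the knit figure assumes the split-out knit phases); the tables
and `total_phases` are hypothetical.

```python
# Phase counts as quoted in this message; the tables are a hypothetical
# illustration of what a command would have to know to show overall progress.
FETCH_PHASES = {"weave": 5, "knit": 6}   # fetch, per storage format
TREE_PHASES = {
    "branch": 2,  # build_tree
    "pull": 3,    # merge_inner with Merge3Merger
}


def total_phases(command: str, storage_format: str) -> int:
    """Flattened phase count the top-level command would announce up front."""
    return FETCH_PHASES[storage_format] + TREE_PHASES[command]


for command in ("branch", "pull"):
    for fmt in ("weave", "knit"):
        print(command, fmt, total_phases(command, fmt))
```

For 'bzr pull' on a knit repository that is 6 + 3 = 9 phases, and any
change to a callee's phase count would have to be mirrored in the command.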

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFisHQJdeBCYSNAAMRAjf/AJ95FkvqRr3LSNmUjzxHpP13D/8zPACeK8rO
7RhW+MTObanMzPJIbzJdNVw=
=ZedX
-----END PGP SIGNATURE-----