[MERGE] show_log with get_revisions

John Arbash Meinel john at arbash-meinel.com
Mon Jun 12 18:23:28 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Aaron Bentley wrote:
>>>> I have implemented get_revisions for all RevisionStore objects.  I think
>>>> it's simpler to have one codepath where possible, and get_revision is
>>>> easily implemented in terms of get_revisions.
>>>
>>> Other than breaking the api, I'm fine with this.
> 
> I don't think this breaks the API.  I haven't removed any methods.
> 
>>> I would guess that at a
>>> minimum we should deprecate get_revision(), and have it thunk over to
>>> get_revisions()
> 
> get_revision is very useful at the top layer.  Perhaps
> RevisionStore.get_revision() could be deprecated, though.
> 

The way I read the diff, you changed 'get_revision' to become
'get_revisions'. But maybe that was just diff playing tricks on me.

...

>>> But my feeling is that the higher level is going to know, "I want these
>>> revisions, and in this order", while the lower level is going to know
>>> "reading them in this order is the fastest".
>>>
>>> Since 'bzr log' thinks it is going to request all the revisions, but
>>> might actually never get to all of them 'bzr log | less; review, ^C'.
> 
> This is the part where we disagree.  I think bzr log should know "First,
> I want these 5 revisions.  Then the next 50 revisions.  Then the next
> 500 revisions."  That information can be encapsulated in a generator,
> defined in show_log.  The reason is that only bzr log can know how many
> revisions are needed to fill a screen (this varies according to log
> formatter and screen size).

I kind of agree with you. But at the same time, if you give the lower
levels more flexibility, they can help you a lot more than if you force
them to read exactly the way you request.
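As a rough sketch of the batching idea (hypothetical names; it assumes a
get_revisions() that takes a list of ids and returns the revisions in
that same order), show_log could wrap it in a generator like:

```python
def batched_revisions(revision_store, revision_ids, first_batch=5, factor=10):
    """Yield revisions in the requested order, fetching in growing batches.

    The first few revisions come back quickly (enough to fill a screen),
    while later batches grow exponentially so a full read stays cheap.
    """
    pos = 0
    batch_size = first_batch
    while pos < len(revision_ids):
        batch_ids = revision_ids[pos:pos + batch_size]
        # get_revisions() is free to *read* the batch in whatever order
        # is fastest on disk, as long as it returns this order.
        for revision in revision_store.get_revisions(batch_ids):
            yield revision
        pos += batch_size
        batch_size *= factor
```

That keeps the "how many do I need right now" knowledge in show_log,
while still handing the lower layers a big enough request to reorder.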

> 
>>>>>> I think we need to change this into a generator, rather than returning
>>>>>> the whole list.
>>>> I don't think so; I think show_logs should grab the first n revisions
>>>> explicitly, because it knows which revisions it wants first, and how
>>>> many it needs.  For example, your proposal would pessimize log --forward
>>>> and other operations that either:
>>>>
>>>> a) need all revisions in order to start work
>>>> b) need the revisions in forward order.
>>>
>>> For (b) you just request them in forward order, rather than requesting
>>> them in reverse order.
>>> And for (a) you can just assume that the lower level is already trying
>>> to read them as fast as it can, so it doesn't really matter.
> 
> For a), reading them in exponentially increasing sizes is slower than
> reading them all at once.

I would guess the difference is quite small. The real win is when
reading backwards; for reading forwards, I would imagine you save very
little. As long as you let the system read ahead, your performance
should be very close to the ideal.

> 
>>>> If we choose to implement a generator also, we can implement it on top
>>>> of get_revisions.  So we might define the generator in show_log.  That
>>>> would be pretty nice.
>>>
>>> I really don't think the top level has enough information to know what
>>> the optimum read ordering is. Which is why having a generator interface
>>> where you can make a big request, and have the lower levels optimize it.
> 
> I agree that the top level doesn't know the optimum ordering.  What it
> does know is what the minimum request size should be.
> 
> LocalTransport.readv will sort the requests into the optimum order, and
> I think that's the right layer to do it at.

readv doesn't have any idea about parallel building of texts.
If I say "give me texts 1-20", and they are all deltas based on the
first one, you can read 1 and yield it, apply the delta for 2 and yield
it, apply the delta for 3 and yield it, and so on.

If I request '20-1', you can do the same thing, but you have to buffer
the texts until you are ready to yield them. That has a memory
consumption issue, so the real win would be to convert the forward
deltas into reverse deltas as you go, build up #20, and then apply the
reverse deltas as you walk backwards.
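A toy sketch of the two read orders (this is not bzr's knit code;
apply_delta is a stand-in for whatever delta application we use):

```python
def read_forward(fulltext, deltas, apply_delta):
    """Yield each text as soon as its delta is applied (cheap, streaming)."""
    text = fulltext
    yield text
    for delta in deltas:
        text = apply_delta(text, delta)
        yield text


def read_backward(fulltext, deltas, apply_delta):
    """Yield texts newest-first by buffering the whole forward chain.

    Buffering everything is exactly the memory-consumption issue above;
    converting forward deltas to reverse deltas on the fly would avoid
    holding every intermediate text at once.
    """
    built = list(read_forward(fulltext, deltas, apply_delta))
    for text in reversed(built):
        yield text
```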

This doesn't matter for revisions, since they are all full-texts. But it
does matter for all the other file texts we might want to grab, and I
think it would be nice if we could make the changes reasonably generic.
In the future we might store delta-compressed revision entries; we can
actually get quite a bit of size savings if we format the entries
properly. I played around with that in my sax branches. Basically it
exploits the fact that the same committer usually commits several times
in a row, especially along left-hand-side parents (and the format number
doesn't change, etc.).
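For example (illustrative entries only, using difflib as a stand-in for
a real delta format), consecutive entries along the mainline differ in
just a couple of lines:

```python
import difflib

# Two made-up revision entries from the same committer in a row.
rev1 = ["format: 5",
        "committer: John Arbash Meinel",
        "timestamp: 1150132000",
        "message: fix foo"]
rev2 = ["format: 5",
        "committer: John Arbash Meinel",
        "timestamp: 1150133000",
        "message: fix bar"]

# A line-based delta only keeps the changed lines; the repeated
# committer and format lines compress away entirely.
delta = [line for line in difflib.ndiff(rev1, rev2) if line[0] in "+-"]
```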

> 
>>> Well, right now log -v has to get the delta between two revisions. I'm
>>> pretty sure it has to go through inventories to do so (though it might
>>> only be going into file_ids_affected_by). That still goes down to the
>>> delta inventory texts, though.
> 
> If you have an ancestry graph for files, and another for revisions, I
> don't think it should be necessary to read the inventories at all.
> Perhaps I should write that as part of my optimization.

That requires reading the indexes for every file that might be touched
by those revisions. It's hard to say what the win would be.

> 
> Did you mean to reply to the list?

Yes I did. I just sent it off.

> 
> Aaron

John
=:->
