[MERGE] show_log with get_revisions

John Arbash Meinel john at arbash-meinel.com
Mon Jun 12 18:13:27 BST 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:


...

> 
> I thought we were moving in that direction.  It seems Robert wants to
> address per-item locking by having a different method, so I can revert
> it.  There are lots of places where we're locking both branch and
> repository-- switching to branch.lock_ will make things cleaner.

For now, remove the repository.lock_read(), and then we can look into
doing it later.

> 
>>> === modified file bzrlib/store/revision/__init__.py
>>> --- bzrlib/store/revision/__init__.py
>>> +++ bzrlib/store/revision/__init__.py
>>> @@ -108,6 +108,10 @@
>>>
>>>      def get_revision(self, revision_id, transaction):
>>>          """Return the Revision object for a named revision."""
>>> +        return self.get_revisions([revision_id], transaction)[0]
>>> +
>>> +    def get_revisions(self, revision_ids, transaction):
>>> +        """Return the Revision objects for a list of named revisions."""
>>>          raise NotImplementedError(self.get_revision)
>>>
>>>      def get_signature_text(self, revision_id, transaction):
>>>
>>>
>>> Shouldn't we add a default implementation which just iterates over the
>>> list and gets them one at a time? That way all Store objects would have
>>> implemented this codepath.
> 
> I have implemented get_revisions for all RevisionStore objects.  I think
> it's simpler to have one codepath where possible, and get_revision is
> easily implemented in terms of get_revisions.

Other than breaking the API, I'm fine with this. At a minimum I would
guess we should deprecate get_revision(), and have it thunk over to
get_revisions().
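A minimal sketch of what I mean by the thunk (the class and method names
follow the diff above, but the warning plumbing here is just plain stdlib,
not the real bzrlib deprecation helpers):

```python
import warnings


class RevisionStore(object):

    def get_revision(self, revision_id, transaction):
        """Deprecated: use get_revisions() instead."""
        warnings.warn("get_revision is deprecated; use get_revisions",
                      DeprecationWarning, stacklevel=2)
        # Thunk over to the list-based API and unwrap the single result.
        return self.get_revisions([revision_id], transaction)[0]

    def get_revisions(self, revision_ids, transaction):
        """Return the Revision objects for a list of named revisions."""
        raise NotImplementedError(self.get_revisions)
```

That way subclasses only have to implement get_revisions(), and old
callers of get_revision() keep working while getting the warning.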

> 
>>> I also thought that part of the goal was to be able to get the first 5
>>> revisions quickly, so that 'bzr log' is instantaneous to start up, and
>>> then be able to get the rest of them later.
> 
> I'm not sure exactly where the sweet spot will be.  Right now, I'm
> focusing on general optimization of multi-revision retrieval.  Once
> there's no performance left to squeeze from that, I'll move on to
> optimizing log itself.
> 
> The next thing I expect to do is to implement vf.get_texts(), so that we
> can use readv effectively.  Each invocation of LocalTransport.readv
> opens a file, so changing the request order doesn't help.  Once all
> revisions are generating a combined request, we can sort them by offset,
> so that we don't seek backwards.  Doing that transparently in readv
> means that we don't need to predict the optimal request order higher up.
> 

I agree with both of these, but we still have some performance vs
abstraction issues.
The problem is that you don't know what parallel extractions you can do
until you get down to the knit. Now it happens that revisions are all
stored as full texts, so for them it doesn't matter.

But my feeling is that the higher level is going to know, "I want these
revisions, and in this order", while the lower level is going to know
"reading them in this order is the fastest".

And 'bzr log' thinks it is going to request all the revisions, but it
might never actually get to all of them ('bzr log | less', review, ^C).

Which is why I think something like 'get_texts()' should be a generator:
it can reorder the requests for optimum performance, and then yield the
results in the order that the caller requested them.
For the 'reorder' step, we can just put a limit on how far it is allowed
to reorder the request, possibly growing that limit exponentially.

I think that is really the optimum path: rather than making the top
level know about requesting chunks, you make the part that knows how to
optimize the reads do the optimization, and the top level just gives it
as much information as it can.
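Something like this toy sketch is what I have in mind (the 'read_at'
callable, the offsets mapping, and the window size are all made up for
illustration, not real bzrlib API):

```python
def get_texts(keys, offsets, read_at, window=32):
    """Yield (key, text) in the caller's requested order, while doing
    the actual reads in offset order within a bounded window.

    keys    -- ids in the order the caller wants the results back
    offsets -- mapping from key to its byte offset in the store
    read_at -- callable that reads and returns the text at an offset
    window  -- how far ahead we are willing to reorder reads
    """
    for start in range(0, len(keys), window):
        chunk = keys[start:start + window]
        # Read this window in offset order, so we never seek backwards...
        texts = {}
        for key in sorted(chunk, key=lambda k: offsets[k]):
            texts[key] = read_at(offsets[key])
        # ...but yield the results in the order the caller asked for.
        for key in chunk:
            yield key, texts[key]
```

Note that this handles both of Aaron's cases below for free: log
--forward just passes the keys in forward order, and a caller that needs
everything before starting simply drains the generator.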


>>> I think we need to change this into a generator, rather than returning
>>> the whole list.
> 
> I don't think so; I think show_logs should grab the first n revisions
> explicitly, because it knows which revisions it wants first, and how
> many it needs.  For example, your proposal would pessimize log --forward
> and other operations that either:
> 
> a) need all revisions in order to start work
> b) need the revisions in forward order.

For (b) you just request them in forward order, rather than requesting
them in reverse order.
And for (a) you can just assume that the lower level is already trying
to read them as fast as it can, so it doesn't really matter.

> 
> If we choose to implement a generator also, we can implement it on top
> of get_revisions.  So we might define the generator in show_log.  That
> would be pretty nice.

I really don't think the top level has enough information to know what
the optimum read ordering is. Which is why I want a generator interface
where you can make a big request and have the lower levels optimize it.

Kind of like read ordering in SCSI (or NCQ in SATA).

> 
>>> Also, what we really want is for this stuff to also be available as a
>>> per-file, and for inventories. So that we can do the 'bzr log -v' fast
>>> as well.
> 
> Well, small steps first, I think.  Are we using inventories for log -v
> now?  There was talk of using the file indices for that.
> 
> Aaron

Well, right now log -v has to get the delta between two revisions. I'm
pretty sure it has to go through inventories to do so (though it might
only be going into file_ids_affected_by). Either way, that still goes
down to the inventory delta texts.

John
=:->





