[MERGE] show_log with get_revisions

Mon Jun 12 18:57:25 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
>>This is the part where we disagree.  I think bzr log should know "First,
>>I want these 5 revisions.  Then the next 50 revisions.  Then the next
>>500 revisions."  That information can be encapsulated in a generator,
>>defined in show_log.  The reason is that only bzr log can know how many
>>revisions are needed to fill a screen (this varies according to log
>>formatter and screen size).
> 
> 
> I kind of agree with you. But at the same time, if you give the lower
> levels more flexibility, they can help you a lot more than if you force
> them to read exactly the way you request.

I'm not preventing the lower level from reading the requested revisions
whatever way is optimal.  It just needs to return them in the requested
order.  Returning them out-of-order isn't useful-- it just means you
have to fix the order at the top level.
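
A minimal sketch of the split I have in mind, in Python.  The names
batches, get_revisions and store are hypothetical and are not existing
bzrlib API.  The top level decides the batch sizes; the lower level
reads however it likes but hands results back in the requested order.

```python
from itertools import islice


def batches(revision_ids, sizes=(5, 50, 500)):
    """Yield lists of revision ids in growing batches.

    Once the sizes run out, the last size is reused for the remainder.
    """
    it = iter(revision_ids)
    size_iter = iter(sizes)
    size = next(size_iter)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
        size = next(size_iter, size)


def get_revisions(batch, store):
    """Hypothetical lower level: read in any order, return in request order."""
    # Sorting by id is only a stand-in for "whatever order is cheapest
    # to read"; a real store would pick its own order.
    loaded = {rev_id: store[rev_id] for rev_id in sorted(batch)}
    # The caller gets results back in exactly the order it asked for.
    return [loaded[rev_id] for rev_id in batch]
```

Only the log code would own the sizes tuple, since only it knows how
many revisions fill a screen.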

>>I agree that the top level doesn't know the optimum ordering.  What it
>>does know is what the minimum request size should be.
>>
>>LocalTransport.readv will sort the requests into the optimum order, and
>>I think that's the right layer to do it at.
> 
> 
> readv doesn't have any idea about parallel building of texts.

In order to avoid opening and closing the file repeatedly, we must teach
KnitVersionedFile to issue a single readv to generate multiple texts.
So while readv doesn't know about building texts in parallel, it will
know about reordering the requests by offset.
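
Roughly what I mean by a single readv, as a standalone sketch.  This is
not the actual LocalTransport.readv code; readv_sorted and its
signature are assumptions made for illustration.  The file is opened
once, the ranges are read in offset order, and the results come back in
the order they were requested.

```python
def readv_sorted(path, requests):
    """Read several (offset, length) ranges from one file.

    requests is a list of (offset, length) tuples.  Returns a list of
    bytes objects in the same order as requests.
    """
    results = [None] * len(requests)
    # Visit the requests in ascending offset order so the seeks move
    # forward through the file.
    order = sorted(range(len(requests)), key=lambda i: requests[i][0])
    with open(path, "rb") as f:  # one open/close for all ranges
        for i in order:
            offset, length = requests[i]
            f.seek(offset)
            results[i] = f.read(length)
    return results
```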

> If I say give me the texts 1-20, and they are all deltas based on the
> first one, you can read 1, yield it. apply the delta for 2, yield it.
> apply the delta for 3, yield it, etc.

That is not performant, because it requires issuing multiple readv
requests.  But it's also more complicated than this, because 3 may be a
direct descendant of 1.

> If I request '20-1', then you can do the same thing, but buffer the
> texts until you are ready to yield them.

It's not much of a win, because issuing multiple readv requests means
opening a file, seeking to the offset, and closing it 20 times.  I've
measured generating revisions in forward order, and it has no
performance advantage.  I assume this was because of the multiple readvs.

> This has a memory consumption
> issue, so the real win would be to convert the forward deltas into
> reverse deltas as you go, build up #20, and then apply the reverse
> deltas as you go backwards.

Knit texts are already cached.  As I understand it, if building 2
requires building 1, then 1 will be cached as a result of requesting 2.
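
To illustrate the caching (and the earlier point that 3 may descend
directly from 1), here is a toy model.  It is not the real knit code:
the hunk format, apply_delta and build_text are all made up for this
example.  A delta is a list of (start, end, new_lines) hunks against
its parent's lines.

```python
def apply_delta(base_lines, delta):
    """Apply (start, end, new_lines) hunks to a copy of base_lines."""
    lines = list(base_lines)
    # Apply from the end of the text backwards so earlier indices stay valid.
    for start, end, new_lines in sorted(delta, key=lambda h: h[0], reverse=True):
        lines[start:end] = new_lines
    return lines


def build_text(version, fulltexts, deltas, cache):
    """Build the lines for version, caching every text built on the way.

    fulltexts maps version -> lines; deltas maps version -> (parent, hunks).
    """
    if version in cache:
        return cache[version]
    if version in fulltexts:
        lines = list(fulltexts[version])
    else:
        parent, hunks = deltas[version]
        lines = apply_delta(build_text(parent, fulltexts, deltas, cache), hunks)
    cache[version] = lines
    return lines


fulltexts = {1: ["a", "b"]}
deltas = {
    2: (1, [(1, 2, ["B"])]),  # 2 replaces line 1 of version 1
    3: (1, [(2, 2, ["c"])]),  # 3 is based on 1, not on 2
}
cache = {}
text2 = build_text(2, fulltexts, deltas, cache)  # also caches version 1
text3 = build_text(3, fulltexts, deltas, cache)  # reuses cached version 1
```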

>>>>Well, right now log -v has to get the delta between two revisions.

>>If you have an ancestry graph for files, and another for revisions, I
>>don't think it should be necessary to read the inventories at all.
>>Perhaps I should write that as part of my optimization.
> 
> 
> That requires reading all of the indexes for all files, which might be
> touched by those revisions. Hard to say what would be the win.

True dat.  I was thinking of the log FILENAME case.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEjasF0F+nu1YWqI0RAkdXAJ9aB8vOZ3Ttm81EAcKu7KU3SwYXkQCggP/0
XCr9prI2UIZ/kMSaKhWoTII=
=OA5d
-----END PGP SIGNATURE-----