Bazaar NG performance on large repositories

John Arbash Meinel john at arbash-meinel.com
Mon Oct 30 18:32:21 GMT 2006


Nicholas Allen wrote:
> John Arbash Meinel wrote:
>> Nicholas Allen wrote:
>>  
>>>>> bzr log some-file
>>>>> This command took a very long time. In fact, I gave up waiting for
>>>>> it to
>>>>> complete. No output was seen on the terminal at all - even after 5
>>>>> minutes. I think in its current state this would be completely
>>>>> unusable
>>>>> for us. I hope that bzr will see some performance improvements here.
>>>>>             
>>>> log some-file reads the inventory of each commit, so it scales with
>>>> tree
>>>> size.  That's a limit of the current implementation, though.  We can
>>>> look at each file to see what revisions modified it, though I'm not
>>>> sure
>>>> whether that will detect all kinds of changes.
>>>>         
>>> Is that something that is hard to change? Would it not be possible to
>>> have a file somewhere (in .bzr) for each file under version control that
>>> lists the revisions that modified it? These files could be calculated
>>> once after an initial pull and then updated on each commit. Then a log
>>> for a single command would be very fast. The files could be updated the
>>> first time a log on a single file is done or always after pulling (but
>>> this would increase the pull time of course).
>>>     
>>
>> As I commented, most of this is already done. We don't record deletes in
>> that index, and we really should.
>>   
> Great! So it sounds pretty easy to make log use this file then. Perhaps
> I can take a look at implementing this - but it would probably take me a
> while to get familiar with the code base first and I am quite busy so
> not sure how long it would take me...

It shouldn't be too difficult. The code is in:

bzrlib/log.py

The specific function is "def _show_log()" which can take a
'specific_fileid' parameter.

If it is not given, we want to show the log for the whole project.

If you do supply it, we *could* change it so that instead of iterating
over all revisions and determining which ones are affected, we just
iterate over the ones in the file's index.

It wouldn't be quite as simple as just doing:

if specific_fileid is None:
  which_revs = _enumerate_history(branch)
else:
  which_revs = []

  # Right now, we are missing an API on Repository to get the graph for
  # a single file (or list of files). So we have to go through a private
  # API. This really should be promoted to a public API on Repository.
  file_weave = branch.repository._revision_store.get_weave(specific_fileid)
  revisions = file_weave.get_ancestry()

  # Revnos start at 1, lists start at 0
  revno_map = dict((rev, offset+1) for offset, rev in
                   enumerate(branch.revision_history()))
  for rev in revisions:
    if rev in revno_map:
      which_revs.append((revno_map[rev], rev))
    else:
      which_revs.append((None, rev))


But that might be a good place to get started.
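
To see the revno mapping on its own, here is a rough standalone sketch
that runs without bzrlib. Plain lists stand in for
branch.revision_history() and for the per-file ancestry, and the function
name number_file_revisions is made up for illustration; it is not part of
bzrlib.

```python
def number_file_revisions(mainline, file_revisions):
    """Pair each revision that touched a file with its mainline revno.

    mainline: revision ids, oldest first (a stand-in for
        branch.revision_history()).
    file_revisions: revision ids taken from the per-file index.

    Revisions that are not on the mainline (for example, ones that
    only came in through a merge) get a revno of None.
    """
    # Revnos start at 1, list offsets start at 0.
    revno_map = dict((rev, offset + 1)
                     for offset, rev in enumerate(mainline))
    # dict.get returns None for revisions missing from the mainline.
    return [(revno_map.get(rev), rev) for rev in file_revisions]


# Hypothetical revision ids, purely for illustration.
mainline = ["rev-a", "rev-b", "rev-c", "rev-d"]
file_revisions = ["rev-a", "rev-x", "rev-c"]
for revno, rev in number_file_revisions(mainline, file_revisions):
    print(revno, rev)
```

The cost here depends on the length of the mainline plus the number of
revisions in the file's index, rather than on reading an inventory for
every commit.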

...

>> It would be nice to figure this out. If it isn't a public project, I
>> would be happy to investigate it with you on IRC at chat.freenode.net
>> #bzr. We can investigate over email, but it is usually much
>> easier/faster on IRC.
> Ok - unfortunately, it is a closed source project so I would not be able
> to share the source code with you. I can't do it today but would be
> happy to offer any assistance to solve the problem. I was using the
> 0.12rc release but through a cygwin terminal so not sure if that had
> something to do with it. I'll send you an email to arrange a time on IRC
> then (I will probably have some time tomorrow)...
> 
> Thanks,
> 
> Nick

Sounds good. We can work that out off list.

John
=:->
