[MERGE] optimize annotate

John Arbash Meinel john at arbash-meinel.com
Wed Nov 29 13:26:55 GMT 2006


Aaron Bentley wrote:
> Nicholas Allen wrote:
>>>> As you suspect, it would fill 1000MB of memory.
>>> Did I understand correctly? I'm worried because we have files with
>>> 40,000 revisions and typical file size might be 50 - 100kb. So in this
>>> case would annotate take up 40,000 * 50 Kb (i.e. 2 - 4 Gb) of memory? If
>>> so I would rather annotate take a second longer ;-)
> 
> No, you don't.  He's talking about files.  He's right that we shouldn't
> keep large numbers of file versions in memory.  But that's beside the
> point, because it's not what my change does.
> 
> I'm talking about revisions, not file versions.  Revisions store the
> commit log, the committer name, the time-of-commit, and that sort of
> thing.  They are ~1K in size, no matter how big your files are.  So
> keeping lots and lots of revisions in memory is completely fine.
> 
> Aaron

One more clarification: Aaron's patch only loads the revisions that
touch the file, not every possible revision.

For a file with 1000 lines, annotate can't need more than 1000 revisions
(the worst case being that every line was introduced by a different
revision).

In practice, the number is usually much, much lower. Most files are only
modified a few times (<100), regardless of their size. Further, we only
load the revisions that introduced the current lines, not all revisions
that ever touched the file.
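
As a rough sketch of why the count is bounded by the number of lines
(this is not the actual bzrlib code; the function name and the sample
data are made up for illustration), the revisions annotate needs are
just the distinct origins of the current lines:

```python
def revisions_to_load(annotations):
    """Return the distinct revision ids that introduced a current line.

    `annotations` is assumed to be a sequence of (origin, text) pairs,
    one per line of the current file text.
    """
    return set(origin for origin, _text in annotations)


# Hypothetical annotation data: four lines owned by three revisions.
annotations = [
    ("rev-a", "line one\n"),
    ("rev-b", "line two\n"),
    ("rev-a", "line three\n"),
    ("rev-c", "line four\n"),
]

needed = revisions_to_load(annotations)
# The set can never be larger than the number of lines.
assert len(needed) <= len(annotations)
print("%d lines, %d revisions to load" % (len(annotations), len(needed)))
```

A revision that only changed lines which were later rewritten never
shows up as an origin, so it is never loaded.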

There are pathological cases like Changelog, or in our case NEWS.

For example, NEWS has 2235 lines and a total of 1086 revisions that
modified it, but 'bzr annotate' loads only 577 of them (at maybe 1 KB
each, so about 0.5 MB).

Our largest and most active file is builtins.py, with 3022 lines, 113 KB,
and 1125 total revisions changing it, and annotate has to load only 365
revisions.
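
To put rough numbers on that, here is a back-of-the-envelope estimate.
It assumes Aaron's figure of about 1 KB per revision; the helper name
is made up, and real revision sizes will vary with the length of the
commit message:

```python
# Assumption: ~1 KB per revision object, per the estimate quoted above.
APPROX_REVISION_BYTES = 1024


def estimate_revision_memory_mb(revisions_loaded,
                                bytes_per_revision=APPROX_REVISION_BYTES):
    """Approximate memory (in MB) held by the loaded revision objects."""
    return revisions_loaded * bytes_per_revision / (1024.0 * 1024.0)


# Revision counts taken from the NEWS and builtins.py figures above.
for name, loaded in [("NEWS", 577), ("builtins.py", 365)]:
    print("%s: %d revisions, roughly %.2f MB"
          % (name, loaded, estimate_revision_memory_mb(loaded)))
```

By the same estimate, even loading all 40,000 revisions of the file
Nicholas describes would come to roughly 39 MB, not gigabytes, because
the cost depends on the number of revisions and not on the file size.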

The patch below will let you investigate how many revisions annotate has
to load, if you are really curious.

=== modified file 'bzrlib/annotate.py'
--- bzrlib/annotate.py  2006-11-28 21:56:47 +0000
+++ bzrlib/annotate.py  2006-11-29 13:21:05 +0000
@@ -65,6 +65,8 @@
                     branch.repository.has_revision(o)]
     revisions = dict((r.revision_id, r) for r in
                      branch.repository.get_revisions(revision_ids))
+    from bzrlib.trace import mutter
+    mutter('loaded %d revisions for file_id: %s', len(revisions), file_id)
     for origin, text in annotations:
         text = text.rstrip('\r\n')
         if origin == last_origin:

John
=:->
