[MERGE] Refactor diffing
Aaron Bentley
aaron.bentley at utoronto.ca
Sat Nov 24 17:05:12 GMT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> On Fri, 2007-11-23 at 18:19 -0500, Aaron Bentley wrote:
>> Unfortunately, Dirstate.inventory is also used by DirState.kind and
>> DirState.get_file_lines.
>
> def kind(self, file_id,
> _kind_map=dirstate.DirState._minikind_to_kind):
Thanks again.
Will you work on getting these DirstateRevisionTree improvements merged
or should I?
With these applied, I do show some improvement in the new code. Best
times out of 10:
Old:
real 0m0.794s
user 0m0.644s
sys 0m0.116s
New:
real 0m0.477s
user 0m0.368s
sys 0m0.100s
Or 1.7x as fast. (most times for the new code were more like .6,
though. It varied a lot.)
For this workload (1 modified file, Bazaar-sized tree), an even bigger
win would be to accelerate get_file_mtime on DirStateRevisionTree (i.e.
by storing or caching the mtime in the dirstate so we don't have to read
a revision per file). That revision access is 35% of runtime according
to lsprof; _iter_changes itself only takes 23%. unified_diff is 0.69%
I also ran tests on a tree with 249 modified files (best times out of 3):
Old:
real 0m2.804s
user 0m2.552s
sys 0m0.140s
New:
real 0m2.652s
user 0m2.420s
sys 0m0.172s
Or 1.05x as fast.
For this workload, 60.0% of time is spent retrieving files from knits.
This suggests that we need
1. to get an accelerated implementation of iter_files_bytes for packs
2. to use such implementation to retrieve the texts and diff them in the
order they arrive.
For this workload, get_file_mtime cost 12.1%, due to retrieving 250
revisions.
_iter_changes is 2.17%. unified_diff is 4.07%. Those constitute the
"real work" that diff does, but they aren't even close to being limiting
factors.
Also, I noticed that dirstate uses "xxxxxxxx..." for the last-revision
of the working tree. Something ending with a colon would probably be
better, because afaik, "xxxxx..." is actually a valid revision-id. I
think a great choice would be "current:", which is a reserved id and
actually means "the current state of the working tree".
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHSFnI0F+nu1YWqI0RArBxAJ4gA5J1MScOvttNHtfoK1Z5Z36SjACePqc+
+gfcBgA5aYlqFvYPApW6ARQ=
=CzVE
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list