[RFC] Custom diff algorithm for Inventories
Aaron Bentley
aaron.bentley at utoronto.ca
Sun Dec 3 21:34:52 GMT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> Right now, the bottlenecks are the time inserting into knits, and
> handling the inventory. (Serializing it to XML, diffing it against the
> previous XML, and inserting it into the inventory.knit).
>
> My feeling is that the best way to handle the inventory is to actually
> split it up, which is what we've discussed for a while, and is what I'll
> be working on. But I was thinking there might be a short term improvement.
At first I thought you were looking at ways to optimize tree comparison.
This wouldn't really help there, except maybe for comparing
RevisionTrees. You're actually trying to optimize knit creation, right?
I agree with your notion that diffing sorted data should be cheaper than
diffing unsorted data. I think it's O(n) complexity.
The algorithm I would use is
1. read a line from text A into current-a
2. read a line from text B into current-b
3. if current-a == current-b:
current-a is common. Goto 1
5. compare the lines
6. if current-a > current-b:
current-b is unique to text B. read a line from text B into
current-b. goto 3.
7. if current-b > current-a:
current-a is unique to text A. read a line from text A into
current-a. goto 3.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFc0L70F+nu1YWqI0RAg/oAJ9deLXEw+o9L9n1TbCUPtrFPZhI8QCfcsCb
mqHms5gvNSk95gqDrA1GSbA=
=cC2x
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list