Storage requirements of tree transforms

Aaron Bentley aaron.bentley at utoronto.ca
Tue Jan 17 04:11:32 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I'm trying to satisfy three conditions, and I think I can only manage two:
1. tree transforms never take up extra filesystem space while they are
pending.
2. tree transform results are correct when the results of creating a
file depend on the contents of the working tree.
3. when the caller is a merge, it doesn't have to stash away extra
copies of the working tree file.

My original proposal satisfied 1.  I'm now proposing to satisfy 2 and 3
instead.  The extra space used by a tree undergoing a transform will be
the size of all pending new contents.  This will usually be a fraction
of the size of the working tree, but could, in unusual circumstances,
exceed it.

Explaining 2
============
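A good example of 2 would be the current revert* scheme, in which we do
a three-way merge with THIS = working_tree, BASE = working_tree, and
OTHER = target_revision_tree.  By three-way logic, that collapses into
'turn THIS into OTHER'.  Because THIS and BASE are read from the working
tree, the new contents depend on the very files the transform is about
to replace.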

But there are more complex examples in which it is not apparent that two
pathnames refer to the same file.  For example, when a tree is visible
both on the local filesystem and over NFS.

* The tree transform revert does not use merge, and thus does not delete
newly added files.
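To make condition 2 concrete, here is a minimal Python sketch (not bzrlib code; `lazy_contents`, `eager_contents` and `demo` are hypothetical helpers) of what goes wrong when new contents come from an iterator that reads the working tree but is only consumed after the transform has started moving files: by the time the iterator runs, the file it depends on has already been moved into limbo.

```python
import os
import tempfile


def lazy_contents(path):
    """Yield the lines of `path`; the file is opened only on first iteration."""
    with open(path) as f:
        for line in f:
            yield line


def eager_contents(path):
    """Read the lines of `path` immediately and return them as a list."""
    with open(path) as f:
        return f.readlines()


def demo():
    with tempfile.TemporaryDirectory() as tree:
        path = os.path.join(tree, "foo")
        limbo = os.path.join(tree, "limbo-foo")  # stand-in for .bzr/limbo
        with open(path, "w") as f:
            f.write("working tree text\n")

        # Nothing is read yet: a generator's body runs only when iterated.
        pending_lazy = lazy_contents(path)
        # The contents are captured right now, while the file is intact.
        pending_eager = eager_contents(path)

        # The transform starts applying and moves the old file aside.
        os.replace(path, limbo)

        try:
            lazy_result = list(pending_lazy)
        except FileNotFoundError:
            lazy_result = None  # the input the iterator needed is gone
        return lazy_result, pending_eager


lazy, eager = demo()
```

Here `lazy` comes back as `None` while `eager` holds the original line; with two pathnames for one file (the NFS case), the lazy read could instead silently pick up the wrong data.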

Explaining 3
============
I had originally planned that the iterators for new file contents would
be held in memory until halfway through the transform application, and
then applied.

In the case of merging, then, it would not be known whether the merge
produced textual conflicts until after the transform had been applied,
because the merge iterator only discovers conflicts as it is consumed.
By then, the original contents of the file, needed to produce foo.THIS,
would be gone.  So in order to be able to produce foo.THIS, the merge
code would need to stash a copy of the unaltered file somewhere.
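The timing problem can be illustrated with a toy stand-in for a merge iterator. `LazyMerge` below is hypothetical (it is not bzrlib's merge code) and, as a simplifying assumption, merges line by line and only handles inputs of equal length. Its `conflicts` flag is meaningful only once the lines have been consumed, so code that defers consumption until the transform is applied learns about conflicts too late.

```python
class LazyMerge:
    """Toy line-wise three-way merge; `conflicts` is set during iteration.

    Assumes THIS, BASE and OTHER have the same number of lines, which a
    real merge does not require.
    """

    def __init__(self, this, base, other):
        self.this, self.base, self.other = this, base, other
        self.conflicts = False

    def __iter__(self):
        for t, b, o in zip(self.this, self.base, self.other):
            if t == b or t == o:   # THIS unchanged, or both sides agree
                yield o
            elif o == b:           # only THIS changed
                yield t
            else:                  # both changed differently: conflict
                self.conflicts = True
                yield "<<<<<<< THIS\n"
                yield t
                yield "=======\n"
                yield o
                yield ">>>>>>> OTHER\n"


merged = LazyMerge(["a\n", "x\n"], ["a\n", "b\n"], ["a\n", "y\n"])
before = merged.conflicts  # False: no lines have been merged yet
lines = list(merged)
after = merged.conflicts   # True: line 2 changed differently on each side
```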

The new plan
============
Transform application already uses a temporary directory, ".bzr/limbo",
to hold files undergoing moves/renames.  I propose to put the new
contents there as well.  Further, I plan to evaluate the iterators as
soon as the new contents are added, rather than halfway through
application.

That way, we can do:
    ...
    merged = Merge3Iter(this, base, other)
    tree_transform.create_file(merged, trans_id)
    if merged.conflicts:
        tree_transform.new_file(name+".THIS", parent, this,
                                executable=executable)
    ...
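A runnable sketch of the proposed flow, under stated assumptions: `ToyTransform`, `Merge3Lines`, `read_lines` and `merge_file` are all hypothetical stand-ins, not the bzrlib `TreeTransform` or `Merge3Iter` API, and the merge only handles equal-length inputs. `create_file` consumes its iterator immediately and writes the result into a limbo directory, so `conflicts` is known before `apply()` runs, and foo.THIS can be produced by re-reading the still-untouched working tree file instead of stashing a copy. Until `apply()`, the extra disk usage is exactly the size of the files staged in limbo.

```python
import os
import tempfile


class Merge3Lines:
    """Hypothetical line-wise three-way merge (equal-length inputs only)."""

    def __init__(self, this, base, other):
        self.this, self.base, self.other = this, base, other
        self.conflicts = False  # set only once a conflict is iterated over

    def __iter__(self):
        for t, b, o in zip(self.this, self.base, self.other):
            if t == b or t == o:
                yield o
            elif o == b:
                yield t
            else:
                self.conflicts = True
                yield from ("<<<<<<< THIS\n", t, "=======\n", o, ">>>>>>> OTHER\n")


class ToyTransform:
    """Hypothetical transform that stages new contents in .bzr/limbo."""

    def __init__(self, tree_root):
        self.root = tree_root
        self.limbo = os.path.join(tree_root, ".bzr", "limbo")
        os.makedirs(self.limbo, exist_ok=True)
        self.pending = []  # (limbo path, final path) pairs

    def create_file(self, lines, name):
        # Consume the iterator now, writing straight into limbo, so any
        # side effects (such as detecting conflicts) happen immediately.
        limbo_path = os.path.join(self.limbo, str(len(self.pending)))
        with open(limbo_path, "w") as f:
            f.writelines(lines)
        self.pending.append((limbo_path, os.path.join(self.root, name)))

    def apply(self):
        # Move every staged file into its final place in the tree.
        for limbo_path, final_path in self.pending:
            os.replace(limbo_path, final_path)
        self.pending = []


def read_lines(path):
    with open(path) as f:
        return f.readlines()


def merge_file(transform, tree_root, name, base, other):
    this_path = os.path.join(tree_root, name)
    merged = Merge3Lines(read_lines(this_path), base, other)
    transform.create_file(merged, name)  # iterator consumed here...
    if merged.conflicts:                 # ...so this is already known
        # The working tree file is untouched until apply(), so it can be
        # read again here; no extra copy has to be stashed.
        transform.create_file(read_lines(this_path), name + ".THIS")


def demo():
    with tempfile.TemporaryDirectory() as tree:
        with open(os.path.join(tree, "foo"), "w") as f:
            f.write("a\nx\n")
        tt = ToyTransform(tree)
        merge_file(tt, tree, "foo", base=["a\n", "b\n"], other=["a\n", "y\n"])
        tt.apply()
        with open(os.path.join(tree, "foo")) as f:
            merged_text = f.read()
        with open(os.path.join(tree, "foo.THIS")) as f:
            this_text = f.read()
        return merged_text, this_text


merged_text, this_text = demo()
```

After `apply()`, foo holds the conflicted merge and foo.THIS holds the original working tree text.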

Does this sound like a reasonable trade-off?  Does anyone actually care
about a small increase in disk usage during transform creation/application?

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDzG500F+nu1YWqI0RAuMlAJ48bKGIqJ8tzWP6avuuu/OS2ILsnQCffsRN
WGuo8RkdeCV0yYfGIQv7Jdk=
=UagU
-----END PGP SIGNATURE-----




More information about the bazaar mailing list