Storage requirements of tree transforms

John A Meinel john at arbash-meinel.com
Tue Jan 17 04:25:51 GMT 2006


Aaron Bentley wrote:
> Hi all,
> 
> I'm trying to satisfy three conditions, and I think I can only manage two:
> 1. tree transforms never take up extra filesystem space while they are
> pending.
> 2. tree transform results are correct when the results of creating a
> file depend on the contents of the working tree.
> 3. when the caller is a merge, it doesn't have to stash away extra
> copies of the working tree file.
> 
> My original proposal satisfied 1.  I'm now proposing to satisfy 2 and 3.
> ~ The extra size of a tree undergoing a transform will be the size of all
> new contents that are pending.  This will usually be a fraction of the
> size of the working tree, but could, in unusual circumstances, exceed it.
> 
> Explaining 2
> ============
> A good example of 2 would be the current revert* scheme, in which we do
> a three way merge with THIS = working_tree, BASE = working_tree, and
> OTHER = target_revision_tree.  By three-way logic, that collapses into
> 'turn THIS into OTHER'.
> 
> But there are more complex examples in which it is not apparent that two
> pathnames refer to the same file.  For example, when a tree is visible
> on the local FS and also on nfs.
> 
> * The tree transform revert does not use merge, and thus does not delete
> newly-added files.
> 
> Explaining 3
> ============
> I had originally planned that the iterators for new file contents would
> be held in memory until halfway through the transform application, and
> then applied.
> 
> In the case of merging, then, it would not be known whether the
> merge produced textual conflicts until after the transform had been
> applied.  After the transform had been applied, the original contents of
> the file, needed to produce foo.THIS, would be gone.  So in order to be
> able to produce foo.THIS, the merge code would need to stash a copy of
> the unaltered file somewhere.
> 
> The new plan
> ============
> The transform application uses a temp directory to hold files
> undergoing moves/renames, called ".bzr/limbo".  I propose to stick the
> new contents there.  Further, I plan to evaluate the iterators when the
> new contents are added.
> 
> That way, we can do:
> ~    ...
> ~    merged = Merge3Iter(this, base, other)
> ~    tree_transform.create_file(merged, trans_id)
> ~    if merged.conflicts:
> ~        tree_transform.new_file(name+".THIS", parent, this,
> ~                                executable=executable)
> ~    ...
> 
> Does this sound like a reasonable trade off?  Does anyone actually care
> about a small size increase during transform creation/application?
> 
> Aaron

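To make the quoted plan concrete, here is a minimal Python sketch of evaluating the contents iterator at the moment the new file is added, so that the conflict flag is known while the THIS text is still available. Every name in it (`MergeLines`, `ToyTransform`, `create_file`, `pending_size`, the trans ids) is a hypothetical stand-in and not the actual bzrlib API, and the merge rule is a toy whole-file comparison rather than a real three-way line merge.

```python
import os
import tempfile


class MergeLines:
    """Hypothetical merge iterator; .conflicts is only valid once consumed."""

    def __init__(self, this, base, other):
        self.this, self.base, self.other = this, base, other
        self.conflicts = False

    def __iter__(self):
        # Toy whole-file three-way rule, standing in for a real line merge.
        if self.this == self.base:
            yield from self.other
        elif self.other == self.base or self.this == self.other:
            yield from self.this
        else:
            self.conflicts = True
            yield "<<<<<<< THIS\n"
            yield from self.this
            yield "=======\n"
            yield from self.other
            yield ">>>>>>> OTHER\n"


class ToyTransform:
    """Hypothetical transform that stages new contents in a limbo directory."""

    def __init__(self, limbo_dir):
        self.limbo_dir = limbo_dir
        os.makedirs(limbo_dir, exist_ok=True)

    def create_file(self, lines, trans_id):
        # Consume the iterator immediately and write the result to limbo,
        # instead of holding the iterator until the transform is applied.
        path = os.path.join(self.limbo_dir, trans_id)
        with open(path, "w") as f:
            f.writelines(lines)
        return path

    def pending_size(self):
        # Extra disk space used by contents that are still pending.
        return sum(
            os.path.getsize(os.path.join(self.limbo_dir, name))
            for name in os.listdir(self.limbo_dir)
        )


def demo():
    this = ["a\n", "this change\n"]
    base = ["a\n", "b\n"]
    other = ["a\n", "other change\n"]
    with tempfile.TemporaryDirectory() as root:
        tt = ToyTransform(os.path.join(root, "limbo"))
        merged = MergeLines(this, base, other)
        tt.create_file(merged, "new-1")
        # The iterator has been consumed, so the flag is already meaningful.
        if merged.conflicts:
            tt.create_file(this, "new-1.THIS")
        return merged.conflicts, sorted(os.listdir(tt.limbo_dir)), tt.pending_size()
```

In this sketch the cost of the trade-off is whatever `pending_size()` reports: the staged contents occupy disk space until the transform is applied, in exchange for not having to keep a separate copy of the unaltered working file.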

We've lived with the current code, which I don't believe tries to
minimize disk space usage.

I think the biggest thing that people care about is speed. Taking space
does take up some time (since you are constrained by the write/read speed
of a hard drive).

While it would be nice to be size stingy, it is better to be faster.

John
=:->
