Rev 4502: Add documentation describing how and why we use inventory deltas, and what can go wrong with them. in http://people.ubuntu.com/~robertc/baz2.0/pending/apply-inventory-delta

Robert Collins robertc at robertcollins.net
Thu Jul 2 08:22:43 BST 2009


At http://people.ubuntu.com/~robertc/baz2.0/pending/apply-inventory-delta

------------------------------------------------------------
revno: 4502
revision-id: robertc at robertcollins.net-20090702072227-a2yzortdrjcnls5c
parent: pqm at pqm.ubuntu.com-20090701190614-ypbryxq02zxbf5kv
committer: Robert Collins <robertc at robertcollins.net>
branch nick: apply-inventory-delta
timestamp: Thu 2009-07-02 17:22:27 +1000
message:
  Add documentation describing how and why we use inventory deltas, and what can go wrong with them.
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py	2009-06-18 18:18:36 +0000
+++ b/bzrlib/inventory.py	2009-07-02 07:22:27 +0000
@@ -1090,6 +1090,9 @@
     def apply_delta(self, delta):
         """Apply a delta to this inventory.
 
+        See the inventory developers documentation for the theory behind
+        inventory deltas.
+
         :param delta: A list of changes to apply. After all the changes are
             applied the final inventory must be internally consistent, but it
             is ok to supply changes which, if only half-applied would have an
@@ -1578,6 +1581,9 @@
         propagate_caches=False):
         """Create a new CHKInventory by applying inventory_delta to this one.
 
+        See the inventory developers documentation for the theory behind
+        inventory deltas.
+
         :param inventory_delta: The inventory delta to apply. See
             Inventory.apply_delta for details.
         :param new_revision_id: The revision id of the resulting CHKInventory.

=== modified file 'bzrlib/mutabletree.py'
--- a/bzrlib/mutabletree.py	2009-06-10 03:56:49 +0000
+++ b/bzrlib/mutabletree.py	2009-07-02 07:22:27 +0000
@@ -523,6 +523,9 @@
         for commit which is not required to handle situations that do not arise
         outside of commit.
 
+        See the inventory developers documentation for the theory behind
+        inventory deltas.
+
         :param new_revid: The new revision id for the trees parent.
         :param delta: An inventory delta (see apply_inventory_delta) describing
             the changes from the current left most parent revision to new_revid.

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2009-06-26 09:24:34 +0000
+++ b/bzrlib/repository.py	2009-07-02 07:22:27 +0000
@@ -1018,6 +1018,9 @@
                                parents, basis_inv=None, propagate_caches=False):
         """Add a new inventory expressed as a delta against another revision.
 
+        See the inventory developers documentation for the theory behind
+        inventory deltas.
+
         :param basis_revision_id: The inventory id the delta was created
             against. (This does not have to be a direct parent.)
         :param delta: The inventory delta (see Inventory.apply_delta for

=== modified file 'doc/developers/inventory.txt'
--- a/doc/developers/inventory.txt	2009-04-04 02:50:01 +0000
+++ b/doc/developers/inventory.txt	2009-07-02 07:22:27 +0000
@@ -498,3 +498,90 @@
 delta, the only other valid fields are OLDPATH and 'file-id'. PARENT_ID is ''
 when a delete has been recorded or when recording a new root entry.
 
+
+Delta consistency
+=================
+
+Inventory deltas and more broadly changes between trees are a significant part
+of bzr's core operations: they are key components in status, diff, commit,
+and merge (although merge uses tree transform, deltas contain the changes that
+are applied to the transform). Our ability to perform a given operation depends
+on us creating consistent deltas between trees. Inconsistent deltas lead to
+errors and bugs, or even just unexpected conflicts.
+
+An inventory delta is a transform to change an inventory A into another
+inventory B (in patch terms its a perfect patch). Sometimes, for instance in a
+regular commit, inventory B is known at the time we create the delta. Other
+times, B is not known because the user is requesting that some parts of the
+second inventory they have are masked out from consideration. When this happens
+we create a delta that when applied to A creates a B we haven't seen in total
+before. In this situation we need to ensure that B will be internally
+consistent. Deltas are unidirectional, a delta(A, B) creates B from A, but
+cannot be used to create A from B.
+
+Deltas are expressed as a list of (oldpath, newpath, fileid, entry) tuples. The
+fileid, entry elements are normative; the old and new paths are strong hints
+but not currently guaranteed to be accurate. (This is a shame and something we
+should tighten up). Deltas are required to list all removals explicitly -
+removing the parent of an entry doesn't remove the entry.
+
+Applying a delta to an inventory consists of:
+ - removing all fileids for which entry is None
+ - adding or replacing all other fileids
+ - detecting consistency errors
+
+An interesting aspect of delta inconsistencies is when we notice them:
+ - Silent errors which our application logic misses
+ - Visible errors we catch during application, so bad data isn't stored in
+   the system.
+
+The minimum safe level for our application logic would be to catch all errors
+during application. Making generation never generate inconsistent deltas is
+a seperate but necessary condition for robust code.
+
+An inconsistent delta is one which:
+ - after application to an inventory the inventory is an impossible state.
+ - has the same fileid, or oldpath(not-None), or newpath(not-None) multiple
+   times.
+ - has a fileid field different to the entry.fileid in the same item in the
+   delta.
+
+Forms of inventory inconsistency deltas can carry/cause:
+ - An entry newly introduced to a path without also removing or relocating any
+   existing entry at that path. (Duplicate paths)
+ - An entry whose parent id isn't present in the tree. (Missing parent).
+ - Having oldpath or newpath not be actual original path or resulting path.
+   (Wrong path)
+ - An entry whose parent is not a directory. (Under non-directory).
+ - An entry that is internally inconsistent.
+
+Known causes of inconsistency:
+ - A 'new' entry which the inventory already has - when this is a directory
+   even arbitrary file ids under the 'new' entry are more likely to collide on
+   paths.
+ - Removing a directory without recursively removing its children - causes
+   Missing parent.
+ - Recording a change to an entry without including all changed entries found
+   following its parents up to and includin the root - can cause duplicate
+   paths, missing parents, wrong path, under non-directory.
+
+Avoiding inconsistent deltas
+----------------------------
+
+The simplest thing is to never create partial deltas, as it is trivial to
+be consistent when all data is examined every time. However users sometimes
+want to specify a subset of the changes in their tree when they do an operation
+which needs to create a delta - such as commit.
+
+We have a choice about handling user requests that can generate inconsistent
+deltas. We can alter or interpret the request in such a way that the delta will
+be consistent, but perhaps larger than the user had intended. Or we can
+identify problematic situations and abort, specifying to the user why we have
+aborted and likely things they can do to make their request generate a
+consistent delta.
+
+Currently we attempt to expand/interpret the request so that the user is not
+required to understand all the internal constraints of the system: if they
+request 'foo/bar' we automatically include foo. This works to an extent but on
+review we are creating inconsistent deltas by the way we do this. We need to
+avoid all the known causes of inconsistency in our delta creation logic.




More information about the bazaar-commits mailing list