Rev 4505: (robertc) Document how, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Fri Jul 3 00:10:55 BST 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4505 [merge]
revision-id: pqm at pqm.ubuntu.com-20090702231053-955txia3151h0z2o
parent: pqm at pqm.ubuntu.com-20090702113738-5qda6d3y80z4l3o5
parent: robertc at robertcollins.net-20090702072227-a2yzortdrjcnls5c
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2009-07-03 00:10:53 +0100
message:
(robertc) Document how,
why and issues with inventory deltas. (Robert Collins)
modified:
bzrlib/inventory.py inventory.py-20050309040759-6648b84ca2005b37
bzrlib/mutabletree.py mutabletree.py-20060906023413-4wlkalbdpsxi2r4y-2
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
doc/developers/inventory.txt inventory.txt-20080103013957-opkrhxy6lmywmx4i-1
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py 2009-07-01 10:50:24 +0000
+++ b/bzrlib/inventory.py 2009-07-02 23:10:53 +0000
@@ -1088,6 +1088,9 @@
def apply_delta(self, delta):
"""Apply a delta to this inventory.
+ See the inventory developers documentation for the theory behind
+ inventory deltas.
+
:param delta: A list of changes to apply. After all the changes are
applied the final inventory must be internally consistent, but it
is ok to supply changes which, if only half-applied would have an
@@ -1576,6 +1579,9 @@
propagate_caches=False):
"""Create a new CHKInventory by applying inventory_delta to this one.
+ See the inventory developers documentation for the theory behind
+ inventory deltas.
+
:param inventory_delta: The inventory delta to apply. See
Inventory.apply_delta for details.
:param new_revision_id: The revision id of the resulting CHKInventory.
=== modified file 'bzrlib/mutabletree.py'
--- a/bzrlib/mutabletree.py 2009-06-10 03:56:49 +0000
+++ b/bzrlib/mutabletree.py 2009-07-02 07:22:27 +0000
@@ -523,6 +523,9 @@
for commit which is not required to handle situations that do not arise
outside of commit.
+ See the inventory developers documentation for the theory behind
+ inventory deltas.
+
:param new_revid: The new revision id for the trees parent.
:param delta: An inventory delta (see apply_inventory_delta) describing
the changes from the current left most parent revision to new_revid.
=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py 2009-07-01 10:28:44 +0000
+++ b/bzrlib/repository.py 2009-07-02 23:10:53 +0000
@@ -1015,6 +1015,9 @@
parents, basis_inv=None, propagate_caches=False):
"""Add a new inventory expressed as a delta against another revision.
+ See the inventory developers documentation for the theory behind
+ inventory deltas.
+
:param basis_revision_id: The inventory id the delta was created
against. (This does not have to be a direct parent.)
:param delta: The inventory delta (see Inventory.apply_delta for
=== modified file 'doc/developers/inventory.txt'
--- a/doc/developers/inventory.txt 2009-04-04 02:50:01 +0000
+++ b/doc/developers/inventory.txt 2009-07-02 07:22:27 +0000
@@ -498,3 +498,90 @@
delta, the only other valid fields are OLDPATH and 'file-id'. PARENT_ID is ''
when a delete has been recorded or when recording a new root entry.
+
+Delta consistency
+=================
+
+Inventory deltas and more broadly changes between trees are a significant part
+of bzr's core operations: they are key components in status, diff, commit,
+and merge (although merge uses tree transform, deltas contain the changes that
+are applied to the transform). Our ability to perform a given operation depends
+on us creating consistent deltas between trees. Inconsistent deltas lead to
+errors and bugs, or even just unexpected conflicts.
+
+An inventory delta is a transform to change an inventory A into another
+inventory B (in patch terms its a perfect patch). Sometimes, for instance in a
+regular commit, inventory B is known at the time we create the delta. Other
+times, B is not known because the user is requesting that some parts of the
+second inventory they have are masked out from consideration. When this happens
+we create a delta that when applied to A creates a B we haven't seen in total
+before. In this situation we need to ensure that B will be internally
+consistent. Deltas are unidirectional, a delta(A, B) creates B from A, but
+cannot be used to create A from B.
+
+Deltas are expressed as a list of (oldpath, newpath, fileid, entry) tuples. The
+fileid, entry elements are normative; the old and new paths are strong hints
+but not currently guaranteed to be accurate. (This is a shame and something we
+should tighten up). Deltas are required to list all removals explicitly -
+removing the parent of an entry doesn't remove the entry.
+
+Applying a delta to an inventory consists of:
+ - removing all fileids for which entry is None
+ - adding or replacing all other fileids
+ - detecting consistency errors
+
+An interesting aspect of delta inconsistencies is when we notice them:
+ - Silent errors which our application logic misses
+ - Visible errors we catch during application, so bad data isn't stored in
+ the system.
+
+The minimum safe level for our application logic would be to catch all errors
+during application. Making generation never generate inconsistent deltas is
+a seperate but necessary condition for robust code.
+
+An inconsistent delta is one which:
+ - after application to an inventory the inventory is an impossible state.
+ - has the same fileid, or oldpath(not-None), or newpath(not-None) multiple
+ times.
+ - has a fileid field different to the entry.fileid in the same item in the
+ delta.
+
+Forms of inventory inconsistency deltas can carry/cause:
+ - An entry newly introduced to a path without also removing or relocating any
+ existing entry at that path. (Duplicate paths)
+ - An entry whose parent id isn't present in the tree. (Missing parent).
+ - Having oldpath or newpath not be actual original path or resulting path.
+ (Wrong path)
+ - An entry whose parent is not a directory. (Under non-directory).
+ - An entry that is internally inconsistent.
+
+Known causes of inconsistency:
+ - A 'new' entry which the inventory already has - when this is a directory
+ even arbitrary file ids under the 'new' entry are more likely to collide on
+ paths.
+ - Removing a directory without recursively removing its children - causes
+ Missing parent.
+ - Recording a change to an entry without including all changed entries found
+ following its parents up to and includin the root - can cause duplicate
+ paths, missing parents, wrong path, under non-directory.
+
+Avoiding inconsistent deltas
+----------------------------
+
+The simplest thing is to never create partial deltas, as it is trivial to
+be consistent when all data is examined every time. However users sometimes
+want to specify a subset of the changes in their tree when they do an operation
+which needs to create a delta - such as commit.
+
+We have a choice about handling user requests that can generate inconsistent
+deltas. We can alter or interpret the request in such a way that the delta will
+be consistent, but perhaps larger than the user had intended. Or we can
+identify problematic situations and abort, specifying to the user why we have
+aborted and likely things they can do to make their request generate a
+consistent delta.
+
+Currently we attempt to expand/interpret the request so that the user is not
+required to understand all the internal constraints of the system: if they
+request 'foo/bar' we automatically include foo. This works to an extent but on
+review we are creating inconsistent deltas by the way we do this. We need to
+avoid all the known causes of inconsistency in our delta creation logic.
More information about the bazaar-commits
mailing list