Rev 2728: (mbp, jameinel) notes on directory last-modified in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Mon Aug 20 07:10:41 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2728
revision-id: pqm at pqm.ubuntu.com-20070820061034-4y2ywj61q0on3mcj
parent: pqm at pqm.ubuntu.com-20070820045741-zojy0q9vgi0d860r
parent: mbp at sourcefrog.net-20070806222258-d999nnmt5fl74o0t
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2007-08-20 07:10:34 +0100
message:
(mbp,jameinel) notes on directory last-modified
added:
doc/developers/last-modified.txt lastmodified.txt-20070806222243-df50y5fi7n85vnob-1
------------------------------------------------------------
revno: 2664.9.1
merged: mbp at sourcefrog.net-20070806222258-d999nnmt5fl74o0t
parent: pqm at pqm.ubuntu.com-20070731122244-f1jemfecukeevugw
committer: Martin Pool <mbp at sourcefrog.net>
branch nick: doc-last-modified
timestamp: Tue 2007-08-07 08:22:58 +1000
message:
Add notes on file and directory last modified (with John)
=== added file 'doc/developers/last-modified.txt'
--- a/doc/developers/last-modified.txt 1970-01-01 00:00:00 +0000
+++ b/doc/developers/last-modified.txt 2007-08-06 22:22:58 +0000
@@ -0,0 +1,176 @@
+==============================
+Computing last_modified values
+==============================
+
+Introduction
+------------
+
+Bazaar (through at least 0.19) computes a ``last_modified``
+attribute for all inventory entries and stores it at commit time.
+This is the ``revision_id`` that last changed or merged the file. It is
+used in knit and weave repositories to look up the file text, and to index
+into the file graph. It's also used to determine which revisions of the
+file text to pull during ``fetch``.
+
+This data is not natively stored by most other systems so we need to
+synthesize it during conversion.
+
+This is a case of non-core data that we might wish to treat as cached,
+rather than always stored.
+
+Definition
+----------
+
+Take the set of all "heads": all the versions of these files in parent
+trees.
+
+Reduce the heads by eliminating any whose last_modified is an ancestor of
+the last_modified of any other head.
+
+If there is still more than one head, a new last_modified is assigned.
+This points to the merge point in the file graph.
+
+If the file text and properties are the same as the sole remaining head,
+its last_modified is inherited. Property changes include executable bit,
+filename, and containing directory.
+
+Otherwise, a new last_modified is used.
+
+(This is meant to be the simplest statement, but it may not be the most
+efficient algorithm; anything that gives equivalent results can be used.)
+
+
+Generation in commit
+--------------------
+
+Commit and converters both need this when writing into Bazaar native
+formats.
+
+This is an O(tree) operation because it needs to check for files with
+multiple heads. It could be reduced to O(changed_or_merged_files) if that
+was faster to determine. So it needs to be fast.
+
+For the single-parent commit case, we just need to determine which files have
+changed compared to the parent. If the file was changed, it gets the
+revision id of the new revision; otherwise it inherits the value from the
+parent tree.
+
+In the multi-parent commit case (commit of a merge), it can take the value
+from any of the parent trees, or of the new revision.
+
+Commit in a dirstate tree should be able to do this more easily by looking
+at a row of the dirstate to get the per-file parents. It still needs to
+look at the revision or file graph information to work out whether heads can be
+eliminated as previously merged. At the moment ``find_previous_heads`` works on
+inventories, so needs to spend considerable effort building whole
+inventories, including files that are not modified or merged. (Called
+from ``record_entry_contents``.) It might be better to have the commit
+builder pass in the per-entry parents so that dirstate can generate just
+those that are necessary. (See also the spec for
+``iter_changes_multiple_parents``.)
+
+If merge used a per-file graph then it would know when one version fully
+supersedes another, and it could emit only a single parent. Merge could
+in fact do this even when not using per-file graphs. In the current
+dirstate format we need to store the full data for all trees because they
+can be extracted from the dirstate, but it could mark some parents as
+already merged.
+
+Alternatively, we could change the dirstate to include
+only the base and current trees, and cache the merged-in parents
+elsewhere.
+
+(Offtopic other dirstate changes: we could also omit the working-copy
+hash, and just have a stat-fingerprint of when it was last known equal to
+the basis revision. That reduces the amount of data stored and possibly
+makes it simpler to update, and shouldn't penalize common cases.)
+
+
+Generation during conversion
+----------------------------
+
+Accessing a foreign branch requires synthesizing this information.
+If last_modified is removed from a future bzr version, we will also need
+to synthesize it to pull back to earlier formats.
+
+Because last_modified is not natively stored in the foreign branches, we
+want to take advantage of any conversion we've already done, so that we
+don't need to recursively generate them on every access. We'd
+prefer to find a revision that's already converted to a Bazaar inventory
+within another related repository, such as the target of a conversion.
+
+
+Avoiding last_modified
+----------------------
+
+last_modified is potentially expensive to determine and we may not want to
+store it in inventories in future. Therefore we should use it only when
+necessary:
+
+* When writing out an inventory format that includes it.
+
+* In Bazaar formats that use it as a key for the file text or file
+ ancestry. This should be hidden behind the Repository/RevisionTree
+ interface.
+
+* When a user operation specifically requires the last_modified (e.g.
+ hypothetical annotate directory).
+
+We already do this in most cases.
+
+
+Compared to annotate
+--------------------
+
+
+Use cases
+---------
+
+Cases to test
+-------------
+
+1. Single parent, unmodified file
+2. Single parent, modified file
+3. Two parents, one descended from the other, modified in one parent only
+4. Two parents, one descended from the other, modified in one parent only,
+ but also modified locally.
+5. Two parents, not descended from each other, modified in one parent only.
+6. Two parents, not descended from each other, modified in one parent only,
+ but also modified locally.
+7. Two parents, modified in both to different values.
+8. Two parents, modified in both to the same value.
+9. Two parents, modified in both, and reverted in both back to the
+ original text.
+10. Three parents, modified in only one
+11. Three parents, modified in only one, also modified locally.
+12. Three parents, modified in 2
+13. Three parents, modified in 2, and locally.
+14. Three parents, modified in 2, but one is a descendant of the other.
+
+
+
+Performance considerations
+--------------------------
+
+Often we'll want the last_modified information for multiple files, perhaps
+everything in a directory or in a whole tree. It may be more efficient
+for the api to accommodate this. Often the last_modified will be similar
+for multiple files, and if we process them all at once we can avoid some
+repeated work in calculating their heads.
+
+
+Open questions
+--------------
+
+* How does caching ``find_heads`` interact with cherry-picks?
+
+
+
+Possible structure
+==================
+
+For a single file, if I am different from all parents, 'new'. (Do not need
+to evaluate last modified).
+
+..
+ vim: ft=rst tw=74
More information about the bazaar-commits
mailing list