Rev 2583: (robertc) status analysis documentation. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Wed Jul 4 10:04:30 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2583
revision-id: pqm at pqm.ubuntu.com-20070704090428-7up5xgpxdtt234u3
parent: pqm at pqm.ubuntu.com-20070704083613-v2o3pj6chp4hiqky
parent: robertc at robertcollins.net-20070702071125-cfv9utb636hn3kg3
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2007-07-04 10:04:28 +0100
message:
  (robertc) status analysis documentation.
added:
  doc/developers/status.txt      status.txt-20070702023117-6xss29lx170qndwr-1
modified:
  doc/developers/performance-roadmap.txt performanceroadmap.t-20070507174912-mwv3xv517cs4sisd-2
  doc/developers/performance.dot performance.dot-20070527173558-rqaqxn1al7vzgcto-3
    ------------------------------------------------------------
    revno: 2566.2.3
    merged: robertc at robertcollins.net-20070702071125-cfv9utb636hn3kg3
    parent: robertc at robertcollins.net-20070702051812-9ztlxv5nw1w3v10m
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: repository
    timestamp: Mon 2007-07-02 17:11:25 +1000
    message:
      Clearer description of locality of reference tuning.
    ------------------------------------------------------------
    revno: 2566.2.2
    merged: robertc at robertcollins.net-20070702051812-9ztlxv5nw1w3v10m
    parent: robertc at robertcollins.net-20070702023128-5j3rtf8bx5v49tbi
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: repository
    timestamp: Mon 2007-07-02 15:18:12 +1000
    message:
      Review feedback.
    ------------------------------------------------------------
    revno: 2566.2.1
    merged: robertc at robertcollins.net-20070702023128-5j3rtf8bx5v49tbi
    parent: pqm at pqm.ubuntu.com-20070629150144-xoeghcfb52pit8tv
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: repository
    timestamp: Mon 2007-07-02 12:31:28 +1000
    message:
      Status analysis.
=== added file 'doc/developers/status.txt'
--- a/doc/developers/status.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/status.txt	2007-07-02 07:11:25 +0000
@@ -0,0 +1,100 @@
+The status command
+==================
+
+The status command is used to provide a pithy listing of the changes between
+two trees. Its common case is between the working tree and the basis tree, but
+it can be used between any two arbitrary trees.
+
+.. contents:: :local:
+
+UI Overview
+-----------
+
+Status shows several things in parallel (for the paths the user supplied mapped
+across the from and to tree, and any pending merges in the to tree).
+
+1. Single line summary of all new revisions - the pending merges and their
+   parents recursively.
+2. Changes to the tree shape - adds/deletes/renames.
+3. Changes to versioned content - kind changes and content changes.
+4. Unknown files in the to tree.
+5. Files with conflicts in the to tree.
+
+
+Ideal work for working tree to historical status
+------------------------------------------------
+
+We need to do the following things at a minimum:
+
+1. Determine new revisions - the pending merges and history.
+
+1. Retrieve the first line of the commit message for the new revisions.
+
+1. Determine the tree differences between the two trees using the user's paths
+   to limit the scope, and resolving paths in the trees for any pending merges.
+   We arguably don't care about tracking metadata for this - only the value of
+   the tree the user committed.
+
+1. The entire contents of directories which are versioned when showing
+   unknowns.
+
+1. Whether a given unversioned path is unknown or ignored.
+
+1. The list of conflicted paths in the tree (which match the user's path
+   selection?)
+
+
+Expanding on the tree difference case we will need to:
+
+1. Stat every path in working trees which is included by the user's path
+   selection to ascertain kind and execute bit.
+
+1. For paths which have the same kind in both trees and have content, read
+   that content or otherwise determine whether the content has changed. Using
+   our hash cache from the dirstate allows us to avoid reading the file in the
+   common case. There are alternative ways to achieve this - we could record
+   a pointer to a revision which contained this fileid with the current content
+   rather than storing the content's hash; but this seems to be a pointless 
+   double-indirection unless we save enough storage in the working tree. A
+   variation of this is to not record an explicit pointer but instead
+   define an implicit pointer as being to the left-hand-parent tree.
+
+
+Locality of reference
+---------------------
+
+- We should stat files in the same directory without reading or statting
+  files in other directories. That is we should do all the statting we
+  intend to do within a given directory without doing any other IO, to
+  minimise pressure on the drive heads to seek.
+
+- We should read files in the same directory without reading or writing
+  files in other directories - and note this is separate to statting (file
+  data is usually physically disjoint to metadata).
+
+
+Scaling observations
+--------------------
+
+- The stat operation clearly involves every versioned path in the common case.
+- Expanding out the user's path selection in a naive manner involves reading the
+  entire tree shape information for both trees and for all pending-merge trees.
+  (Dirstate makes this tolerably cheap for now, but we're still scaling
+  extra-linearly.)
+- The amount of effort required to generate tree differences between the
+  working tree and the basis tree is interesting: with a tree-like structure
+  and some generatable name for child nodes we can use the working tree data to
+  eliminate accessing or considering subtrees regardless of historical
+  age. However, if we have had to access the historical tree shape to
+  perform path selection this rather reduces the win we can obtain here.
+  If we can cause path expansion to not require historical shape access
+  (perhaps by performing the expansion after calculating the tree
+  difference for the top level of the selected path) then we can gain a
+  larger win. This strongly suggests that path expansion and tree
+  difference generation should be linked in terms of API.
+ 
+
+
+..
+   vim: ft=rst tw=74 ai
+
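A rough sketch of the hash cache idea from the tree difference section of status.txt above, not the dirstate implementation. It assumes a plain dict as the cache and a (size, mtime) pair as the "has this file changed" fingerprint; the function name and cache layout are hypothetical.

```python
import hashlib
import os


def content_sha1(path, cache):
    """Return the SHA-1 of path, reading the file only if its stat changed.

    cache maps path -> ((size, mtime_ns), sha1_hex). This layout is an
    assumption made for illustration, not the dirstate format.
    """
    st = os.lstat(path)
    fingerprint = (st.st_size, st.st_mtime_ns)
    cached = cache.get(path)
    if cached is not None and cached[0] == fingerprint:
        # Common case: stat data is unchanged, so the file is not read.
        return cached[1]
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    cache[path] = (fingerprint, digest)
    return digest
```

A real cache would also need to cope with files modified within the timestamp resolution of the filesystem, which this sketch ignores.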

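The locality of reference point in status.txt above (do all the statting for one directory before any other IO) could look roughly like the following. This is a minimal sketch using only the standard library; the function name and the kind strings are illustrative assumptions, not bzrlib API.

```python
import os
import stat


def stat_by_directory(root):
    """Yield (relpath, kind, executable) for every path under root.

    Every entry of a directory is statted before any subdirectory is
    entered, so stat calls for one directory are not interleaved with
    IO on other directories.
    """
    pending = [""]
    while pending:
        reldir = pending.pop()
        subdirs = []
        for name in sorted(os.listdir(os.path.join(root, reldir))):
            relpath = os.path.join(reldir, name)
            st = os.lstat(os.path.join(root, relpath))
            if stat.S_ISDIR(st.st_mode):
                kind = "directory"
                subdirs.append(relpath)
            elif stat.S_ISLNK(st.st_mode):
                kind = "symlink"
            elif stat.S_ISREG(st.st_mode):
                kind = "file"
            else:
                kind = "other"
            executable = kind == "file" and bool(st.st_mode & stat.S_IXUSR)
            yield relpath, kind, executable
        # Descend only after the whole directory has been statted.
        pending.extend(reversed(subdirs))
```

Reading file content would then be a separate pass, again grouped per directory, since file data is usually stored apart from the metadata.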
=== modified file 'doc/developers/performance-roadmap.txt'
--- a/doc/developers/performance-roadmap.txt	2007-06-28 02:48:12 +0000
+++ b/doc/developers/performance-roadmap.txt	2007-07-02 02:31:28 +0000
@@ -42,6 +42,8 @@
 
 .. include:: revert.txt
 
+.. include:: status.txt
+
 .. include:: annotate.txt
 
 .. include:: merge-scaling.txt

=== modified file 'doc/developers/performance.dot'
--- a/doc/developers/performance.dot	2007-06-28 02:53:51 +0000
+++ b/doc/developers/performance.dot	2007-07-02 05:18:12 +0000
@@ -12,17 +12,17 @@
   gc_analysis[label="Work required analysis for gc"];
   revert_analysis[label="Work required analysis for revert"];
   revert_path_analysis[label="Work required analysis for revert of selected paths"];
+  status_analysis[label="Work required analysis for status"];
+  uncommit_analysis[label="Work required analysis for uncommit"];
   wt_disk_order[label="Working Tree disk ordering\n6-8 weeks"];
 
   /* uncompleted node list - add new tasks here */
   node[color="blue"];
-  status_analysis[label="Work required analysis for status"];
   log_analysis[label="Work required analysis for log"];
   log_path_analysis[label="Work required analysis for log of selected paths."];
   diff_analysis[label="Work required analysis for diff"];
   diff_path_analysis[label="Work required analysis for diff of selected paths"];
   merge_analysis[label="Work required analysis for merge"];
-  uncommit_analysis[label="Work required analysis for uncommit"];
   missing_analysis[label="Work required analysis for missing"];
   update_analysis[label="Work required analysis for update"];
   cbranch_analysis[label="Work required analysis for cbranch"];



