Rev 2245: Adding a simple doc describing what I want to do in http://bzr.arbash-meinel.com/branches/bzr/0.15-dev/list_patch

John Arbash Meinel john at arbash-meinel.com
Sat Jan 27 13:42:53 GMT 2007


------------------------------------------------------------
revno: 2245
revision-id: john at arbash-meinel.com-20070127134240-nae88rt1lsm658qn
parent: pqm at pqm.ubuntu.com-20070125051817-53a80525dbdf87b4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: list_patch
timestamp: Sat 2007-01-27 07:42:40 -0600
message:
  Adding a simple doc describing what I want to do
added:
  design_ideas.txt               design_ideas.txt-20070127134227-65ul2uetpxrem4v8-1
-------------- next part --------------
=== added file 'design_ideas.txt'
--- a/design_ideas.txt	1970-01-01 00:00:00 +0000
+++ b/design_ideas.txt	2007-01-27 13:42:40 +0000
@@ -0,0 +1,101 @@
+
+# Outline of workflow:
+
+1) Figure out all pieces of the .knit that need to be read
+2) Sort by position in the file
+3) read data in-order, and build up content information for the requested
+   records
+4) build up content 'lazily' to avoid lots of manipulation on large lists
+5) don't cache every full-text along the way; there are only a few 'important'
+   entries.
+
+Example:
+
+  A' - B - C - D - E - F*
+
+  F was requested; A is a fulltext, and F is all we need. So we can collapse
+  the deltas for B,C,D,E,F into one and apply it a single time to A
+
+  A' - B - C - D - E* - F*
+
+  We only need E and F; we don't need to cache A-D
+
+  Build up the delta A-E, apply, save, apply E -> F
+
+  A' - B - C - D - E - F*
+             \
+               G - H - I*
+
+  It is most efficient to build up A-C, cache C, and then build up C-F, and
+  C-I.
+
+  One way to look at it is that A is a basis for C, and C is a basis for F
+  and I.
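The chain walks in these examples can be modelled with a single working copy instead of caching every intermediate fulltext (steps 3-5 of the outline). A minimal sketch, assuming a hypothetical (start, end, new_lines) line-delta format rather than the real knit encoding:

```python
def apply_delta(lines, delta):
    """Apply one line delta to a parent fulltext.

    A delta here is a list of (start, end, new_lines) ops, sorted and
    non-overlapping, meaning: replace lines[start:end] with new_lines.
    This shape is illustrative, not the actual knit format.
    """
    out = []
    pos = 0
    for start, end, new_lines in delta:
        out.extend(lines[pos:start])
        out.extend(new_lines)
        pos = end
    out.extend(lines[pos:])
    return out


def build_fulltext(basis_lines, delta_chain):
    """Walk a delta chain keeping only one working list of lines,
    never materializing a cache of each intermediate fulltext."""
    lines = list(basis_lines)
    for delta in delta_chain:
        lines = apply_delta(lines, delta)
    return lines
```

Only the basis and the final result ever exist as full line lists; the intermediate versions B..E are passed through without being kept.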
+
+
+Data structure:
+
+1) Target revision id (E,F,I)
+2) History to get there
+
+
+Build(F):
+
+ # Map from each revision to the set of final targets that depend on it
+ used_for = {}
+ keep = set(requested_versions)
+ basis = {}
+
+ for version in requested_versions:
+    cursor = version
+
+    while cursor is not None:
+      parent = parent_of(cursor)
+      if cursor in used_for or parent is None:
+        # a fulltext root may not have a used_for entry yet, so setdefault
+        used_for.setdefault(cursor, set()).add(version)
+        keep.add(cursor)
+        break
+      else:
+        used_for[cursor] = set([version])
+
+      basis[cursor] = parent
+      cursor = parent
+
+ # Now 'used_for' has a mapping from every needed entry towards whatever
+ # target revisions depend on that data.
+ # And 'keep' contains all of the junction points (the entries worth caching
+ # as fulltexts).
+ # keep should hold the requested versions, any fulltext roots reached, and
+ # every 'used_for' entry with more than one dependency.
+
+ # Now build up the list of important nodes, and how to get to an important
+ # node from another important node
+
+ histories = {}
+
+ for version in keep:
+   history = []
+
+   cursor = version
+   while cursor is not None:
+     parent = basis.get(cursor)  # basis has no entry for fulltext roots
+     if parent is None:
+       break
+     history.append(parent)
+     if parent in keep:
+       break
+     cursor = parent
+
+   histories[version] = history
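The two passes above can be rendered as runnable code and exercised on the branching example. This is an illustrative rendition, not bzr code; `parent_of` is any callable mapping a revision to its delta parent (or None for a fulltext):

```python
def plan_extraction(requested, parent_of):
    """Find the junction points ('keep') and, for each, the delta chain
    back to the previous junction or fulltext root."""
    used_for = {}           # revision -> set of requested targets needing it
    keep = set(requested)   # junction points worth caching as fulltexts
    basis = {}              # revision -> delta parent, for traversed nodes

    for version in requested:
        cursor = version
        while cursor is not None:
            parent = parent_of(cursor)
            if cursor in used_for or parent is None:
                used_for.setdefault(cursor, set()).add(version)
                keep.add(cursor)
                break
            used_for[cursor] = {version}
            basis[cursor] = parent
            cursor = parent

    histories = {}
    for version in keep:
        history = []
        cursor = version
        while True:
            parent = basis.get(cursor)  # no entry for fulltext roots
            if parent is None:
                break
            history.append(parent)
            if parent in keep:
                break
            cursor = parent
        histories[version] = history
    return used_for, keep, histories


# The branching example from above: A' - B - C - D - E - F*, with
# G - H - I* hanging off C; F and I are the requested versions.
parents = {'A': None, 'B': 'A', 'C': 'B', 'D': 'C', 'E': 'D', 'F': 'E',
           'G': 'C', 'H': 'G', 'I': 'H'}
used_for, keep, histories = plan_extraction(['F', 'I'], parents.get)
# keep == {'A', 'C', 'F', 'I'}: C is the junction cached once for both chains
```

On this graph `used_for['C']` is `{'F', 'I'}` and `histories['F']` walks back only as far as C, which matches the "cache C, then build C-F and C-I" strategy in the example above.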
+
+ # Extract all of the deltas and fulltexts for all the important pieces in
+ # sorted order
+ # Alternatively, we could do this per history, but we would really like to
+ # start with the first history
+ ordered = sorted_by_position_in_file(keep)
+
+ # Custom reader class that will read chunks in the order they appear on
+ # disk, but return them in the requested order.
+ # However, since we are returning a dictionary, we are free to make the
+ # read order match the on-disk order. (It is not perfect because of
+ # potential interleaving when requesting multiple versions)
+
+ for version in ordered:
+   contents = self.get_all_contents(histories[version])
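The reader described above could be sketched as follows; the (offset, length) index shape and the function name are assumptions for illustration, not bzr's actual knit API:

```python
def read_records(path, positions):
    """Read named (offset, length) records from one file, seeking in
    on-disk order, but returning a dict keyed by record name so the
    caller is insulated from the read order.

    `positions` maps record name -> (offset, length); this index shape
    is illustrative, not the real knit index format.
    """
    results = {}
    # Sort requests by file offset so we sweep the file front to back once.
    ordered = sorted(positions.items(), key=lambda item: item[1][0])
    with open(path, 'rb') as f:
        for name, (offset, length) in ordered:
            f.seek(offset)
            results[name] = f.read(length)
    return results
```

Because the result is a dict, callers can request chunks in any order and still get them back by name, while the disk sees a single forward sweep.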


