Rev 2245: Adding a simple doc describing what I want to do in http://bzr.arbash-meinel.com/branches/bzr/0.15-dev/list_patch
John Arbash Meinel
john at arbash-meinel.com
Sat Jan 27 13:42:53 GMT 2007
------------------------------------------------------------
revno: 2245
revision-id: john at arbash-meinel.com-20070127134240-nae88rt1lsm658qn
parent: pqm at pqm.ubuntu.com-20070125051817-53a80525dbdf87b4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: list_patch
timestamp: Sat 2007-01-27 07:42:40 -0600
message:
Adding a simple doc describing what I want to do
added:
design_ideas.txt design_ideas.txt-20070127134227-65ul2uetpxrem4v8-1
-------------- next part --------------
=== added file 'design_ideas.txt'
--- a/design_ideas.txt 1970-01-01 00:00:00 +0000
+++ b/design_ideas.txt 2007-01-27 13:42:40 +0000
@@ -0,0 +1,101 @@
+
+# Outline of workflow:
+
+1) Figure out all pieces of the .knit that need to be read
+2) Sort by position in the file
+3) read data in-order, and build up content information for the requested
+   records
+4) build up content 'lazily' to avoid lots of manipulation on large lists
+5) don't cache every full-text along the way; only a few 'important'
+   entries matter
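The sort-then-sweep idea in steps 2 and 3 can be sketched in a few lines. The index below is hypothetical (the real knit index stores positions differently); it only illustrates ordering the reads so the pass through the file moves strictly forward:

```python
# Hypothetical index: version -> (byte offset, length) in the .knit file.
positions = {'F': (900, 120), 'A': (0, 400), 'E': (700, 150)}

# Step 2: sort the needed records by their position in the file, so the
# read pass in step 3 is a single forward sweep with no seeking back.
ordered = sorted(positions, key=lambda v: positions[v][0])
# ordered == ['A', 'E', 'F']
```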
+
+Example:
+
+ A' - B - C - D - E - F*
+
+ F was requested, A is a fulltext, and we only need F. So we can collapse
+ the deltas for B,C,D,E,F and apply the combined delta to A just once.
+
+ A' - B - C - D - E* - F*
+
+ We only need E and F; we don't need to cache A-D.
+
+ Build up the delta A-E, apply it, save E, then apply the E -> F delta.
+
+ A' - B - C - D - E - F*
+ \
+ G - H - I*
+
+ It is most efficient to build up A-C, cache C, and then build up C-F, and
+ C-I.
+
+ One way to look at it is that A is a basis for C, and C is a basis for F
+ and I.
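The "junction" intuition can be checked with a small sketch. The parent map below is just the branching example above (None marks the fulltext root A); the helper is made up for illustration:

```python
from collections import Counter

# Parent map for the example graph above (None marks the fulltext root A).
parents = {'A': None, 'B': 'A', 'C': 'B', 'D': 'C', 'E': 'D', 'F': 'E',
           'G': 'C', 'H': 'G', 'I': 'H'}

def ancestors(version):
    """Return version plus every ancestor back to the fulltext root."""
    chain = []
    while version is not None:
        chain.append(version)
        version = parents[version]
    return chain

# Count how many requested targets pass through each node.
counts = Counter()
for target in ('F', 'I'):
    counts.update(ancestors(target))

shared = sorted(v for v, n in counts.items() if n > 1)
# shared == ['A', 'B', 'C']: C is the last node both chains pass through,
# so caching C lets the F chain and the I chain both build from it.
```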
+
+
+Data structure:
+
+1) Target revision id (E,F,I)
+2) History to get there
+
+
+Build(requested_versions):
+
+    # Map from a revision to the set of final targets that depend on it
+    used_for = {}
+    keep = set(requested_versions)
+    basis = {}
+
+    for version in requested_versions:
+        cursor = version
+
+        while cursor is not None:
+            parent = parent_of(cursor)
+            if cursor in used_for or parent is None:
+                # Hit a junction or the fulltext root: mark it and stop
+                used_for.setdefault(cursor, set()).add(version)
+                keep.add(cursor)
+                basis[cursor] = parent
+                break
+            else:
+                used_for[cursor] = set([version])
+
+            basis[cursor] = parent
+            cursor = parent
+
+    # Now 'used_for' has a mapping from every needed entry towards whatever
+    # target revisions depend on that data.
+    # And 'keep' contains all of the junction points/points of last importance.
+    # keep should be equivalent to all 'used_for' entries that have more than
+    # one dependency.
+
+    # Now build up the list of important nodes, and how to get to an important
+    # node from another important node
+
+    histories = {}
+
+    for version in keep:
+        history = []
+
+        cursor = version
+        while cursor is not None:
+            parent = basis[cursor]
+            if parent is None:
+                break
+            history.append(parent)
+            if parent in keep:
+                break
+            cursor = parent
+
+        histories[version] = history
+
+    # Extract all of the deltas and fulltexts for all the important pieces in
+    # sorted order
+    # Alternatively, we could do this per history, but we would really like to
+    # start with the first history
+    ordered = sorted_by_position_in_file(keep)
+
+    # Custom reader class that will read in the order the chunks are on disk,
+    # but will return them in requested order.
+    # However, since we are returning a dictionary, we can do our best to make
+    # sure requested order is also disk order. (It is not perfect because of
+    # potential interleaving when requesting multiple versions)
+
+    for version in ordered:
+        contents = self.get_all_contents(histories[version])
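The two phases of the pseudocode above can be exercised end-to-end with a small runnable sketch. The parent map stands in for the real knit index, and every name here is hypothetical; it runs the branching example with F and I requested:

```python
# Parent map for the example graph (None marks the fulltext root A).
PARENTS = {'A': None, 'B': 'A', 'C': 'B', 'D': 'C', 'E': 'D', 'F': 'E',
           'G': 'C', 'H': 'G', 'I': 'H'}

def parent_of(version):
    return PARENTS[version]

def build_plan(requested_versions):
    used_for = {}                   # revision -> targets that depend on it
    keep = set(requested_versions)  # final targets plus junction points
    basis = {}                      # revision -> its delta parent

    for version in requested_versions:
        cursor = version
        while cursor is not None:
            parent = parent_of(cursor)
            if cursor in used_for or parent is None:
                # Hit a junction or the fulltext root: mark it and stop.
                used_for.setdefault(cursor, set()).add(version)
                keep.add(cursor)
                basis[cursor] = parent
                break
            used_for[cursor] = set([version])
            basis[cursor] = parent
            cursor = parent

    # For each kept node, the delta chain back to the previous kept node.
    histories = {}
    for version in keep:
        history = []
        cursor = version
        while cursor is not None:
            parent = basis[cursor]
            if parent is None:
                break
            history.append(parent)
            if parent in keep:
                break
            cursor = parent
        histories[version] = history
    return keep, histories

keep, histories = build_plan(['F', 'I'])
# keep == {'A', 'C', 'F', 'I'}: the targets plus the junction C and root A.
# histories['F'] == ['E', 'D', 'C']: F's chain stops at the cached junction C.
```

Note that each history runs child-to-ancestor, ending at the previous kept node, so the caller applies the deltas in reverse when rebuilding a text.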