[MERGE] knit._PackAccess

Fri Aug 3 15:47:04 BST 2007

John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
I was looking through the code earlier, and I never saw a good 
definition of
what a 'node' contains. It seemed a little random. And took me a while 
to track
down what was actually being done.

Also, this should have been caught in an earlier review but in the test 
suite
you did:

class TestGraphIndexKnit(KnitTests):
     """Tests for knits using a GraphIndex rather than a KnitIndex."""

     def make_g_index(self, name, ref_lists=0, nodes=[]):
         builder = GraphIndexBuilder(ref_lists)

v- Here you do "(node, references, value)"

         for node, references, value in nodes:
             builder.add_node(node, references, value)
         stream = builder.finish()
         trans = self.get_transport()
         trans.put_file(name, stream)
         return GraphIndex(trans, name)

     def two_graph_index(self, deltas=False, catch_adds=False):
         """Build a two-graph index.

         :param deltas: If true, use underlying indices with two node-ref
             lists and 'parent' set to a delta-compressed against tail.
         """
         # build a complex graph across several indices.
         if deltas:
             # delta compression inn the index
             index1 = self.make_g_index('1', 2, [
                 (('tip', ), 'N0 100', ([('parent', )], [], )),
                 (('tail', ), '', ([], []))])
             index2 = self.make_g_index('2', 2, [
                 (('parent', ), ' 100 78', ([('tail', ), ('ghost', )], 
[('tail', )])),
                 (('separate', ), '', ([], []))])

^- But this looks a lot more like "node, value, references".
Oh, and it might be clearer if you called it "node_id", since it isn't 
the
whole node.

Oh, and in 'bzrlib.tests.test_knit" you do:
from bzrlib.index import *

Which is pretty bad form for our code. Mostly because I saw: 
GraphIndexBuilder
and was trying to figure out where it came from.
But I couldn't find it by searching through the code. And it took me 
quite a
while before I started to think about searching for "from X import *" 
(and if
there were more than 1, it would be a *real* mess)


Overall, I understand why you have 'node' being a tuple of tuples. But 
it isn't
a very pretty interface. It is pretty hard to understand "node[0]" 
versus
"node.node_id".

I'm okay with a tuple interface, as long as it is encapsulated well (so 
we
don't have to do tuple gymnastics in the bulk of our code). I've already 
stated
that I think _iter_changes should really return objects, since by then 
we've
culled the bulk. If we know that we'll having page-based indicies, we 
may also
have already culled the bulk of the data.



Typo here:

@@ -632,9 +633,9 @@
                  if node[0]:
                      continue
                  if self.reference_lists:
-                    yield key, node[2], node[1]
+                    yield self, key, node[2], node[1]
                  else:
-                    yield key, node[2]
+                    yield self ,key, node[2]
                                ^^


v- This line is >79 chars

+    def __init__(self, adapted, prefix, missing_key_length, 
add_nodes_callback=None):
+        """Construct an adapter against adapted with prefix."""
+        self.adapted = adapted
+        self.prefix = prefix + (None,)*missing_key_length
+        self.prefix_key = prefix
+        self.prefix_len = len(prefix)
+        self.add_nodes_callback = add_nodes_callback

^- You call the member 'prefix' but it is actually a full key with the 
missing
fields set to None, and then you set the actual prefix to 'prefix_key'.
If anything a "prefix_key" sounds like a complete key, while "prefix" 
sounds
like just the prefix members.

Also, I would think that since these indicies are meant to be 
write-once, you
wouldn't want an adapter when adding nodes. Maybe I'm just missing what 
you are
trying to do.



+        try:
+            for (key, value, node_refs) in nodes:
+                adjusted_references = (
+                    tuple(tuple(self.prefix_key + ref_node for ref_node 
in ref_list)
+                        for ref_list in node_refs))
+                translated_nodes.append((self.prefix_key + key, value,
+                    adjusted_references))

^- This is pretty opaque as to what you are trying to do.

I think I've sorted it out, but maybe it would be clearer as:

                 # Add prefix_key to each reference
                 # node_refs is a tuple of tuples, so split it apart, and 
add
                 # prefix_key to the internal reference
                 adjusted_references = tuple(tuple(self.prefix_key + 
ref_node
                                                   for ref_node in 
ref_list)
                                             for ref_list in node_refs)
                 translated_nodes.append((self.prefix_key + key, value,
                                          adjusted_references))


Also at this point, you are using a try/except ValueError, I assume to 
catch
when you don't have actual references. Isn't it better to use:

if self.adapted._node_refs:

?



_KnitAccess

+    def create(self):
+        """IFF this data access has its own storage area, initialise 
it.
+
+        :return: None.
+        """
+        self._transport.put_bytes_non_atomic(self._filename, '',
+                                             mode=self._file_mode)
+
+    def open_file(self):
+        """IFF this data access can be represented as a single file, 
open it.
+
+        For knits that are not mapped to a single file on disk this 
will
+        always return None.
+
+        :return: None or a file handle.
+        """
+        try:
+            return self._transport.get(self._filename)
+        except NoSuchFile:
+            pass
+        return None
+


I don't believe that we have any live code that uses 'open_file', and I 
would
rather not use 'create'. I believe _KnitData._open_file() is just relic 
code.

I don't know if you have code that needs it, or if you were just 
maintaining
compatibility. I would rather avoid them if possible.

It might be better to call it _KnitDataAccess, to be clear that you are
accessing the data file, not the .kndx file. (optional)



+        :param writer: A tuple (pack.ContainerWriter, write_index) 
which
+            is contains the pack to write, and the index that reads 
from
+            it will be associated with.
+        """

^ "which is contains" ?
I think you can just remove 'is'

+        The data is spooled to the container writer in one bytes record 
per
+        raw data item.

"writer in one byte-record per raw data item"
or bytes-record, just to make clear it isn't a 'one byte' record.


      def add_record(self, version_id, digest, lines):
-        """Write new text record to disk.  Returns the position in the
-        file where it was written."""
+        """Write new text record to disk.
+
+        Returns index data for retrieving it later, as per 
add_raw_records.
+        """
^- Eventually I'm guessing these apis will need to be updated to include 
stuff
like "parents" so that we can store that as part of the data, rather 
than only
having it in the index. But you don't need to change that yet.


...

+class TestGraphIndexPrefixAdapter(TestCaseWithMemoryTransport):
+
+    def make_index(self, ref_lists=1, key_elements=2, nodes=[], 
add_callback=False):
+        result = InMemoryGraphIndex(ref_lists, 
key_elements=key_elements)
+        result.add_nodes(nodes)
+        if add_callback:
+            add_nodes_callback=result.add_nodes
+        else:
+            add_nodes_callback=None

^- these should have spaces, as they aren't parameters.
             add_nodes_callback = result.add_nodes


I will expect this to conflict a bit with my changes to make extracting 
texts
faster. But not really that bad. We are generally working in different 
(but
nearby) areas, so I don't expect it to be that difficult to resolve.



For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C1186098401.20419.348.camel%40localhost.localdomain%3E