[MERGE] knit._PackAccess
John Arbash Meinel
john at arbash-meinel.com
Fri Aug 3 15:47:04 BST 2007
John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
I was looking through the code earlier, and I never saw a good
definition of
what a 'node' contains. It seemed a little random. And took me a while
to track
down what was actually being done.
Also, this should have been caught in an earlier review but in the test
suite
you did:
class TestGraphIndexKnit(KnitTests):
"""Tests for knits using a GraphIndex rather than a KnitIndex."""
def make_g_index(self, name, ref_lists=0, nodes=[]):
builder = GraphIndexBuilder(ref_lists)
v- Here you do "(node, references, value)"
for node, references, value in nodes:
builder.add_node(node, references, value)
stream = builder.finish()
trans = self.get_transport()
trans.put_file(name, stream)
return GraphIndex(trans, name)
def two_graph_index(self, deltas=False, catch_adds=False):
"""Build a two-graph index.
:param deltas: If true, use underlying indices with two node-ref
lists and 'parent' set to a delta-compressed against tail.
"""
# build a complex graph across several indices.
if deltas:
# delta compression inn the index
index1 = self.make_g_index('1', 2, [
(('tip', ), 'N0 100', ([('parent', )], [], )),
(('tail', ), '', ([], []))])
index2 = self.make_g_index('2', 2, [
(('parent', ), ' 100 78', ([('tail', ), ('ghost', )],
[('tail', )])),
(('separate', ), '', ([], []))])
^- But this looks a lot more like "node, value, references".
Oh, and it might be clearer if you called it "node_id", since it isn't
the
whole node.
Oh, and in 'bzrlib.tests.test_knit" you do:
from bzrlib.index import *
Which is pretty bad form for our code. Mostly because I saw:
GraphIndexBuilder
and was trying to figure out where it came from.
But I couldn't find it by searching through the code. And it took me
quite a
while before I started to think about searching for "from X import *"
(and if
there were more than 1, it would be a *real* mess)
Overall, I understand why you have 'node' being a tuple of tuples. But
it isn't
a very pretty interface. It is pretty hard to understand "node[0]"
versus
"node.node_id".
I'm okay with a tuple interface, as long as it is encapsulated well (so
we
don't have to do tuple gymnastics in the bulk of our code). I've already
stated
that I think _iter_changes should really return objects, since by then
we've
culled the bulk. If we know that we'll having page-based indicies, we
may also
have already culled the bulk of the data.
Typo here:
@@ -632,9 +633,9 @@
if node[0]:
continue
if self.reference_lists:
- yield key, node[2], node[1]
+ yield self, key, node[2], node[1]
else:
- yield key, node[2]
+ yield self ,key, node[2]
^^
v- This line is >79 chars
+ def __init__(self, adapted, prefix, missing_key_length,
add_nodes_callback=None):
+ """Construct an adapter against adapted with prefix."""
+ self.adapted = adapted
+ self.prefix = prefix + (None,)*missing_key_length
+ self.prefix_key = prefix
+ self.prefix_len = len(prefix)
+ self.add_nodes_callback = add_nodes_callback
^- You call the member 'prefix' but it is actually a full key with the
missing
fields set to None, and then you set the actual prefix to 'prefix_key'.
If anything a "prefix_key" sounds like a complete key, while "prefix"
sounds
like just the prefix members.
Also, I would think that since these indicies are meant to be
write-once, you
wouldn't want an adapter when adding nodes. Maybe I'm just missing what
you are
trying to do.
+ try:
+ for (key, value, node_refs) in nodes:
+ adjusted_references = (
+ tuple(tuple(self.prefix_key + ref_node for ref_node
in ref_list)
+ for ref_list in node_refs))
+ translated_nodes.append((self.prefix_key + key, value,
+ adjusted_references))
^- This is pretty opaque as to what you are trying to do.
I think I've sorted it out, but maybe it would be clearer as:
# Add prefix_key to each reference
# node_refs is a tuple of tuples, so split it apart, and
add
# prefix_key to the internal reference
adjusted_references = tuple(tuple(self.prefix_key +
ref_node
for ref_node in
ref_list)
for ref_list in node_refs)
translated_nodes.append((self.prefix_key + key, value,
adjusted_references))
Also at this point, you are using a try/except ValueError, I assume to
catch
when you don't have actual references. Isn't it better to use:
if self.adapted._node_refs:
?
_KnitAccess
+ def create(self):
+ """IFF this data access has its own storage area, initialise
it.
+
+ :return: None.
+ """
+ self._transport.put_bytes_non_atomic(self._filename, '',
+ mode=self._file_mode)
+
+ def open_file(self):
+ """IFF this data access can be represented as a single file,
open it.
+
+ For knits that are not mapped to a single file on disk this
will
+ always return None.
+
+ :return: None or a file handle.
+ """
+ try:
+ return self._transport.get(self._filename)
+ except NoSuchFile:
+ pass
+ return None
+
I don't believe that we have any live code that uses 'open_file', and I
would
rather not use 'create'. I believe _KnitData._open_file() is just relic
code.
I don't know if you have code that needs it, or if you were just
maintaining
compatibility. I would rather avoid them if possible.
It might be better to call it _KnitDataAccess, to be clear that you are
accessing the data file, not the .kndx file. (optional)
+ :param writer: A tuple (pack.ContainerWriter, write_index)
which
+ is contains the pack to write, and the index that reads
from
+ it will be associated with.
+ """
^ "which is contains" ?
I think you can just remove 'is'
+ The data is spooled to the container writer in one bytes record
per
+ raw data item.
"writer in one byte-record per raw data item"
or bytes-record, just to make clear it isn't a 'one byte' record.
def add_record(self, version_id, digest, lines):
- """Write new text record to disk. Returns the position in the
- file where it was written."""
+ """Write new text record to disk.
+
+ Returns index data for retrieving it later, as per
add_raw_records.
+ """
^- Eventually I'm guessing these apis will need to be updated to include
stuff
like "parents" so that we can store that as part of the data, rather
than only
having it in the index. But you don't need to change that yet.
...
+class TestGraphIndexPrefixAdapter(TestCaseWithMemoryTransport):
+
+ def make_index(self, ref_lists=1, key_elements=2, nodes=[],
add_callback=False):
+ result = InMemoryGraphIndex(ref_lists,
key_elements=key_elements)
+ result.add_nodes(nodes)
+ if add_callback:
+ add_nodes_callback=result.add_nodes
+ else:
+ add_nodes_callback=None
^- these should have spaces, as they aren't parameters.
add_nodes_callback = result.add_nodes
I will expect this to conflict a bit with my changes to make extracting
texts
faster. But not really that bad. We are generally working in different
(but
nearby) areas, so I don't expect it to be that difficult to resolve.
For details, see:
http://bundlebuggy.aaronbentley.com/request/%3C1186098401.20419.348.camel%40localhost.localdomain%3E
More information about the bazaar
mailing list