[MERGE] Multiple element keys.

Fri Jul 27 18:15:01 BST 2007

John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
This portion looks a little funny to me.
It may be sufficient to add a comment, but I'm not quite sure what it is 
doing.

+        if self._key_length > 1:
+            key_dict = self._nodes_by_key
+            if self.reference_lists:
+                key_value = key, value, tuple(node_refs)
+            else:
+                key_value = key, value
+            # possibly should do this on-demand, but it seems likely it 
is
+            # always wanted
+            subkey = list(reversed(key[:-1]))
+            while len(subkey):
+                if subkey[-1] not in key_dict:
+                    key_dict[subkey[-1]] = {}
+                key_dict = key_dict[subkey[-1]]
+                subkey.pop(-1)
+            key_dict[key[-1]] = key_value

It seems that this is trying to take a key like:

  (foo, bar, baz) => value [node_refs]

And add:

  self._nodes_by_key[(foo, bar)][baz] = ((foo, bar, baz), value, [node 
refs])

But I'm trying to figure out what it is doing for 
"self._nodes_by_key[(foo,)]".

subkey = list(reversed(key[:-1]))

will give you a key of [bar, foo].
And then you look for the key "foo" to exist in self._nodes_by_key, and 
if not,
you create a node for it which is the empty dict.
then you pop up....

Ok, so I think what I'm seeing is that you have:

self._nodes_by_key[foo] = {}
self._nodes_by_key[foo][bar] = {}
self._nodes_by_key[foo][bar][baz] = ((foo,bar,baz), value, [node refs])

I think it is safe, though there seems to be some oddity in having
self._nodes_by_key have both (foo, bar, baz) which point to a tuple, and 
foo,
which points to a dict of a dict, until you get to another tuple
It *might* be better to use a different dict.

Oh, and I would suggest:

# For a key of (foo, bar, baz) create
# _nodes_by_key[foo][bar][baz] = key_value
for subkey in key[:-1]:
     key_dict = key_dict.setdefault(subkey, {})
key_dict[key[-1]] = key_value

-        prefix_length = len(lines[0]) + len(lines[1])
+        lines.append(_OPTION_KEY_ELEMENTS + str(self._key_length) + 
'\n')
+        prefix_length = len(lines[0]) + len(lines[1]) + len(lines[2])
^- Is this better:
   prefix_length = sum(len(x) for x in lines)

-            lines.append("%s\0%s\0%s\0%s\n" % (key, absent,
+            string_key = '\x00'.join(key)
+            lines.append("%s\0%s\0%s\0%s\n" % (string_key, absent,
                  '\t'.join(flattened_references), value))
^- You seem to be switching between '\x00' and '\0' for no obvious 
benefit.
And it is a little odd to use the same delimiter for the key as for the
columns, but since you have fixed size keys (defined by the prelude), it 
should
be ok.

-            key, absent, references, value = line.split('\0')
+            elements = line.split('\0')
+            # keys are tuples
+            key = tuple(elements[:self._key_length])
+            absent, references, value = elements[-3:]

^- I wouldn't mind having a check here that the number of elements 
matches what
we expect. This will error if you have too few, but not if you have too 
many.

v- This code path could use some comments
+            if len(elements):

v- You know how many depths you need to go, by how many elements have 
not been
consumed (how many are None).
I think it would be better to make use of that, then to use 
isinstance/type ==
dict.

+                dicts = [key_dict]
+                while dicts:
+                    key_dict = dicts.pop(-1)
+                    # can't be empty or would not exist
+                    item, value = key_dict.iteritems().next()
+                    if type(value) == dict:
+                        # push keys
+                        dicts.extend(key_dict.itervalues())
+                    else:
+                        # yield keys
+                        for value in key_dict.itervalues():

v- At the least, a comment here that we know 'value' is the correct 
tuple.
Otherwise it looks like above we are yielding multiple values, but here 
we are
yeilding only one. (It takes closer inspection to realize that this is a 
tuple).
+                            yield value
+            else:

v- #  key_dict actually points to the final tuple, so just yield it
+                yield key_dict

+        # XXX: To much duplication with the GraphIndex class; consider 
finding
+        # a good place to pull out the actual common logic.

^- I was about to point it out, before I saw this. Isn't this a straight 
copy
of the other code?

+    def _keys_to_version_ids(self, keys):
+        return tuple(key[0] for key in keys)

^- Why is this returning a tuple rather than a list?

return [key[0] for key in keys]

Seems a much more natural fit.

Things seem properly tested, etc.

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C1185545840.27797.149.camel%40localhost.localdomain%3E