[MERGE] New Repository API find_text_key_references for use by reconcile and check.

Thu Nov 15 22:43:42 GMT 2007

John Arbash Meinel has voted tweak.
Status is now: Superseded
Comment:
This as marked as superseded because your later patch depends on it, but 
I want to review it step by step.

I wonder if we could have a way to say "this builds on that, but allow 
me to review it separately". I suppose the other problem is that the 
default view is always going to show the aggregate of both patches. 
Which means the first change can be moderately easy to review, but you 
get them piled together in the second view.

Overall, it looks pretty straightforward. I'm surprised to see a single 
test in repository_implementations/test_find_key_references.
I realize that the full test is in test_check_reconcile because that is 
double parameterised with "broken scenarios". But it does make it more 
difficult to find the tests for the function.

This check seems like it should be more than just a plain 'assert':
+        assert self._serializer.support_altered_by_hack, \
+            ("_find_text_key_references_from_xml_inventory_lines only "
+             "supported for branches which store inventory as unnested 
xml, "
+             "not on %r" % self)

Because using it in the wrong situation will cause data loss. Maybe just 
an "if not self._serilaizer....: raise AssertionError()" so that it 
doesn't get suppressed when running with -O

...
+            # Note that unescaping always means that on a fulltext 
cached
+            # inventory we deserialised every fileid, which for general 
'pull'
+            # is not great, but we don't really want to have some many
+            # fulltexts that this matters anyway. RBC 20071114.

we deserialize every file_id, which for general 'pull' is not great, but 
we don't want to have so many fulltexts that this matters anyway....

(you can spell it deserialise if you prefer :)

v- I just realized that in this function:

+    def _find_file_ids_from_xml_inventory_lines(self, line_iterator,
+        revision_ids):
+        """Helper routine for fileids_altered_by_revision_ids.
+
+        This performs the translation of xml lines to revision ids.
+
+        :param line_iterator: An iterator of lines, origin_version_id
+        :param revision_ids: The revision ids to filter for. This 
should be a
+            set or other type which supports efficient __contains__ 
lookups, as
+            the revision id from each parsed line will be looked up in 
the
+            revision_ids filter.
+        :return: a dictionary mapping altered file-ids to an iterable 
of
+        revision_ids. Each altered file-ids has the exact revision_ids 
that
+        altered it listed explicitly.
+        """
+        result = {}
+        setdefault = result.setdefault
+        for file_id, revision_id in \
+            self._find_text_key_references_from_xml_inventory_lines(
+                line_iterator).iterkeys():
              # once data is all ensured-consistent; then this is
              # if revision_id == version_id
              if revision_id in revision_ids:
-                try:
-                    file_id = unescape_fileid_cache[file_id]
-                except KeyError:
-                    unescaped = unescape(file_id)
-                    unescape_fileid_cache[file_id] = unescaped
-                    file_id = unescaped
                  setdefault(file_id, set()).add(revision_id)
          return result

^- we probably don't want to use setdefault(file_id, set())
because it means we create an empty set() for every pass.
I would guess that:
if revision_id in revision_ids:
   if file_id in result:
     result[file_id].add(revision_id)
   else:
     result[file_id] = set((revision_id,))

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C1195014533.19403.4.camel%40lifeless-64%3E