[MERGE] New Repository API find_text_key_references for use by reconcile and check.
John Arbash Meinel
john at arbash-meinel.com
Thu Nov 15 22:43:42 GMT 2007
John Arbash Meinel has voted tweak.
Status is now: Superseded
Comment:
This as marked as superseded because your later patch depends on it, but
I want to review it step by step.
I wonder if we could have a way to say "this builds on that, but allow
me to review it separately". I suppose the other problem is that the
default view is always going to show the aggregate of both patches.
Which means the first change can be moderately easy to review, but you
get them piled together in the second view.
Overall, it looks pretty straightforward. I'm surprised to see a single
test in repository_implementations/test_find_key_references.
I realize that the full test is in test_check_reconcile because that is
double parameterised with "broken scenarios". But it does make it more
difficult to find the tests for the function.
This check seems like it should be more than just a plain 'assert':
+ assert self._serializer.support_altered_by_hack, \
+ ("_find_text_key_references_from_xml_inventory_lines only "
+ "supported for branches which store inventory as unnested
xml, "
+ "not on %r" % self)
Because using it in the wrong situation will cause data loss. Maybe just
an "if not self._serilaizer....: raise AssertionError()" so that it
doesn't get suppressed when running with -O
...
+ # Note that unescaping always means that on a fulltext
cached
+ # inventory we deserialised every fileid, which for general
'pull'
+ # is not great, but we don't really want to have some many
+ # fulltexts that this matters anyway. RBC 20071114.
we deserialize every file_id, which for general 'pull' is not great, but
we don't want to have so many fulltexts that this matters anyway....
(you can spell it deserialise if you prefer :)
v- I just realized that in this function:
+ def _find_file_ids_from_xml_inventory_lines(self, line_iterator,
+ revision_ids):
+ """Helper routine for fileids_altered_by_revision_ids.
+
+ This performs the translation of xml lines to revision ids.
+
+ :param line_iterator: An iterator of lines, origin_version_id
+ :param revision_ids: The revision ids to filter for. This
should be a
+ set or other type which supports efficient __contains__
lookups, as
+ the revision id from each parsed line will be looked up in
the
+ revision_ids filter.
+ :return: a dictionary mapping altered file-ids to an iterable
of
+ revision_ids. Each altered file-ids has the exact revision_ids
that
+ altered it listed explicitly.
+ """
+ result = {}
+ setdefault = result.setdefault
+ for file_id, revision_id in \
+ self._find_text_key_references_from_xml_inventory_lines(
+ line_iterator).iterkeys():
# once data is all ensured-consistent; then this is
# if revision_id == version_id
if revision_id in revision_ids:
- try:
- file_id = unescape_fileid_cache[file_id]
- except KeyError:
- unescaped = unescape(file_id)
- unescape_fileid_cache[file_id] = unescaped
- file_id = unescaped
setdefault(file_id, set()).add(revision_id)
return result
^- we probably don't want to use setdefault(file_id, set())
because it means we create an empty set() for every pass.
I would guess that:
if revision_id in revision_ids:
if file_id in result:
result[file_id].add(revision_id)
else:
result[file_id] = set((revision_id,))
For details, see:
http://bundlebuggy.aaronbentley.com/request/%3C1195014533.19403.4.camel%40lifeless-64%3E
More information about the bazaar
mailing list