[MERGE] New Repository API find_text_key_references for use by reconcile and check.

Robert Collins robertc at robertcollins.net
Sun Nov 18 19:52:29 GMT 2007


On Thu, 2007-11-15 at 17:43 -0500, John Arbash Meinel wrote:
> 
> This check seems like it should be more than just a plain 'assert':
> +        assert self._serializer.support_altered_by_hack, \
> +            ("_find_text_key_references_from_xml_inventory_lines only
> "
> +             "supported for branches which store inventory as
> unnested 
> xml, "
> +             "not on %r" % self)
> 
> Because using it in the wrong situation will cause data loss. Maybe
> just 
> an "if not self._serilaizer....: raise AssertionError()" so that it 
> doesn't get suppressed when running with -O

Yeah, done that. It was just moved code, but all the same - sorted.

> we deserialize every file_id, which for general 'pull' is not great,
> but 
> we don't want to have so many fulltexts that this matters anyway....
> 
> (you can spell it deserialise if you prefer :)

Tweaked similarly.

> 
> v- I just realized that in this function:
> 
> +    def _find_file_ids_from_xml_inventory_lines(self, line_iterator,
> ...

>                   setdefault(file_id, set()).add(revision_id)
>           return result
> 
> ^- we probably don't want to use setdefault(file_id, set())
> because it means we create an empty set() for every pass.
> I would guess that:
> if revision_id in revision_ids:
>    if file_id in result:
>      result[file_id].add(revision_id)
>    else:
>      result[file_id] = set((revision_id,))

IIRC you profiled this previously and found that it was better as it
was; but we can certainly look at changing this. However, as I think
that changing inventory representation asap is important to getting
really good scalable performance, I'm not inclined to invest too much
time tweaking this - and making an untested change would be inadvisable
IMO.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071119/f2bcb44b/attachment.pgp 


More information about the bazaar mailing list