[RFC] ParentsProvider returning a dictionary instead of a list
John Arbash Meinel
john at arbash-meinel.com
Tue Dec 11 13:51:14 GMT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> At the moment our Graph code expects a parents_provider to return a list, where
>> each entry matches up with the requested revisions.
>
>> DictParentsProvider:
>> already has them in a dict, so just does the iteration, depending on how
>> get_parents was defined it could just return it (do we care if there are
>> extra nodes?)
>
> It seems wacky to return stuff that the caller wasn't expecting. If
> you're returning a dict, it would seem natural to iterate through it
> with iteritems, and if un-requested stuff is there, the results could be
> quite silly.
>
> One of the reasons I originally went with a list was that I wanted to
> make it very clear when a requested set of parents was not being
> returned (e.g. due to not-present revisions). How do you plan to
> represent that? Not putting the child revision in the dictionary?
> That's easy to miss. Assigning the value to None? That's also easy to
> miss if you're using dict.get().
>
> Aaron
I went with not present in the dict. Because it makes _StackedParentsProvider
fairly trivial:
parents = {}
needed = set(revisions)
for provider in providers:
found = provider.get_parent_map(needed)
parents.update(found)
needed.difference_update(found)
if not needed:
break
return parents
Having a dictionary also means that people who want to:
"if X is not there, give me None" can use "dict.get". For people who want more
direct checking "if X not in dict".
I understand the desire to iterate using iteritems(), however, that won't ever
go in a sorted order (for some uses that is important). And regardless, doing:
for revision, parents in izip(revisions, get_parents(revisions)):
didn't seem better than
parent_map = get_parent_map(revisions)
for revision in revisions:
parents = parent_map.get(revision)
Especially since 'izip' isn't a builtin, so you have to import it from
itertools. (Which always made me feel it was lesser than 'zip', but I could be
completely wrong on that.)
The interface I defined was strictly:
a) You may return no more than the requested revisions.
b) You should not return an record for missing revisions.
Which means that for loops which were doing:
for revision, parents in izip(revisions, get_parents(revisions)):
if parents is None:
continue
you can just do:
for revision, parents in get_parents(revisions).iteritems():
The graph code I was working on does *not* do that. Because it batches up the
requests across all searchers. So it does get_parents() on a larger set.
The other *big* reason to use a dict, is because all of the current users of it
*want* a dict. So they end up doing:
the_map = dict(izip(revisions, get_parents(revisions))
Especially all of the intermediates like StackedParentsProvider and
CachingParentsProvider. Even KnitPackRepository was doing it because it had
multiple "Pack" objects to ask.
Which means that for a stacked pack repository get_parents() call it was doing
3 levels of "build up a dict, flatten it to a list". Which was performing
surprisingly well given all the gyrations.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHXpXSJdeBCYSNAAMRAiyeAKCMj/A3t6/x+SmqbtzfZMShdVk5VwCgr+eB
qFFA9d9LfGFxzHz+lkMppNE=
=u4mF
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list