[RFC] ParentsProvider returning a dictionary instead of a list

John Arbash Meinel john at arbash-meinel.com
Tue Dec 11 13:51:14 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> At the moment our Graph code expects a parents_provider to return a list, where
>> each entry matches up with the requested revisions.
> 
>> DictParentsProvider:
>>   already has them in a dict, so just does the iteration, depending on how
>>   get_parents was defined it could just return it (do we care if there are
>>   extra nodes?)
> 
> It seems wacky to return stuff that the caller wasn't expecting.  If
> you're returning a dict, it would seem natural to iterate through it
> with iteritems, and if un-requested stuff is there, the results could be
> quite silly.
> 
> One of the reasons I originally went with a list was that I wanted to
> make it very clear when a requested set of parents was not being
> returned (e.g. due to not-present revisions).  How do you plan to
> represent that?  Not putting the child revision in the dictionary?
> That's easy to miss.  Assigning the value to None?  That's also easy to
> miss if you're using dict.get().
> 
> Aaron

I went with not present in the dict. Because it makes _StackedParentsProvider
fairly trivial:

parents = {}
needed = set(revisions)
for provider in providers:
  found = provider.get_parent_map(needed)
  parents.update(found)
  needed.difference_update(found)
  if not needed:
    break
return parents

Having a dictionary also means that people who want to:

"if X is not there, give me None" can use "dict.get". For people who want more
direct checking "if X not in dict".

I understand the desire to iterate using iteritems(), however, that won't ever
go in a sorted order (for some uses that is important). And regardless, doing:

for revision, parents in izip(revisions, get_parents(revisions)):

didn't seem better than

parent_map = get_parent_map(revisions)
for revision in revisions:
  parents = parent_map.get(revision)

Especially since 'izip' isn't a builtin, so you have to import it from
itertools. (Which always made me feel it was lesser than 'zip', but I could be
completely wrong on that.)

The interface I defined was strictly:

a) You may return no more than the requested revisions.
b) You should not return an record for missing revisions.

Which means that for loops which were doing:

for revision, parents in izip(revisions, get_parents(revisions)):
  if parents is None:
    continue

you can just do:

for revision, parents in get_parents(revisions).iteritems():

The graph code I was working on does *not* do that. Because it batches up the
requests across all searchers. So it does get_parents() on a larger set.


The other *big* reason to use a dict, is because all of the current users of it
*want* a dict. So they end up doing:

the_map = dict(izip(revisions, get_parents(revisions))

Especially all of the intermediates like StackedParentsProvider and
CachingParentsProvider. Even KnitPackRepository was doing it because it had
multiple "Pack" objects to ask.

Which means that for a stacked pack repository get_parents() call it was doing
3 levels of "build up a dict, flatten it to a list". Which was performing
surprisingly well given all the gyrations.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXpXSJdeBCYSNAAMRAiyeAKCMj/A3t6/x+SmqbtzfZMShdVk5VwCgr+eB
qFFA9d9LfGFxzHz+lkMppNE=
=u4mF
-----END PGP SIGNATURE-----



More information about the bazaar mailing list