[RFC] ParentsProvider returning a dictionary instead of a list

John Arbash Meinel john at arbash-meinel.com
Tue Dec 11 16:24:34 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> Aaron Bentley wrote:
>>> One of the reasons I originally went with a list was that I wanted to
>>> make it very clear when a requested set of parents was not being
>>> returned (e.g. due to not-present revisions).  How do you plan to
>>> represent that?  Not putting the child revision in the dictionary?
>>> That's easy to miss.  Assigning the value to None?  That's also easy to
>>> miss if you're using dict.get().
> 
>> I went with not present in the dict.
> 
> I would like to avoid making this API tricky to use, and I think
> silently returning fewer values than expected makes it tricky.
> 
>> Because it makes _StackedParentsProvider
>> fairly trivial:
> 
>> parents = {}
>> needed = set(revisions)
>> for provider in providers:
>>   found = provider.get_parent_map(needed)
>>   parents.update(found)
>>   needed.difference_update(found)
>>   if not needed:
>>     break
>> return parents
> 
> Oh, but using None would also be trivial:
> 
> parents = dict((r, None) for r in revisions)
> for provider in providers:
>     needed = set(r for r, v in parents.iteritems() if r is None)
>     if needed == set():
>         break
>     parents.update(parentsprovider.get_parent_map(needed))
> 
> And the same algorithm would work if we used a reserved id like
> "missing:" instead of None.

True, but it requires implementations to pass over the data with an if check.

Also, your above code is iterating the *whole* requested set each time. Rather
than just the subset of nodes that were requested. I suppose you could do:

needed = set(revisions)
for provider in providers:
  if not needed:
    break
  new_nodes = provider.get_parent_map(needed)
  needed.difference_update(r for r, v in parents.iteritems() if v is None)

I do remember the original code:
                new_search_revisions.update(p for p in parents if
                                            p not in self.seen)

Which had "<generator expression>" show surprisingly prominently in the lsprof
notes.

Now, that was a "not in set()" call, which is probably more expensive than "v
is None".

I could probably be persuaded to return None. I don't find it strange to return
less than requested if that is the defined api. It is also the api definition
of CombinedIndex.iter_entries(). (Return what you have, return nothing for misses.)


> 
>> Having a dictionary also means that people who want to:
> 
> I'm not opposed to having a dictionary, I just want to make the
> interface as direct and unsurprising as possible.
> 
>> "if X is not there, give me None" can use "dict.get". For people who want more
>> direct checking "if X not in dict".
> 
>> I understand the desire to iterate using iteritems(), however, that won't ever
>> go in a sorted order (for some uses that is important). And regardless, doing:
> 
>> for revision, parents in izip(revisions, get_parents(revisions)):
> 
>> didn't seem better than
> 
>> parent_map = get_parent_map(revisions)
>> for revision in revisions:
>>   parents = parent_map.get(revision)
> 
> Well, there is a difference to me in that the first form will tell us
> that some data was missing, and the second form just skips it.
> 
>> The graph code I was working on does *not* do that. Because it batches up the
>> requests across all searchers. So it does get_parents() on a larger set.
> 
> 
>> The other *big* reason to use a dict, is because all of the current users of it
>> *want* a dict. So they end up doing:
> 
> I repeat, I am not opposing a dict interface.  I'm merely trying to
> avoid a surprising interface.
> 
> Aaron

My feeling was that if you *want* None for missing keys, you can use dict.get.
If you wanted to test for presence, then you can't do "r in dict" if we always
put all keys in there. You have to do x = dict[r]; if x is None:...

But yes, I could be persuaded to go either way. And if you genuinely feel that
putting None in is better than not, we can do it. I felt like having to look at
the values to see if that node was really there was worse than just looking at
the keys. At least, there was less indirection.

I suppose we should probably have Martin or Andrew speak up as a tie-breaker.

In the mean-time, I'll work on posting 2 clean patches. One which just has my
loop changes, and uses an adapter on Graph to create a dict, and another which
implements get_parent_map() which we can probably trivially change based on
whether we want to return key:None or no key for missing entries.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXrnCJdeBCYSNAAMRAlmZAKDVY414hJeaDBdxNFt3FqW9nWwftwCcDDYZ
E8Uw+RznKHjD/6fxmCu1wOg=
=NzkC
-----END PGP SIGNATURE-----



More information about the bazaar mailing list