Single-keyspace APIs (was Re: RFC: versionedfile overhaul)

Wed Mar 19 03:39:59 GMT 2008

Robert Collins wrote:
> On Mon, 2008-03-17 at 08:14 -0400, Aaron Bentley wrote:
>> If we don't have a single keyspace, we will have, in the worst case, 4x
>> as many roundtrips as are necessary to perform the operations.
> 
> We don't have a single keyspace in our indices. Assuming one readv per
> pack per datatype, we spend many more roundtrips figuring out where to
> read data from the pack; because indices require traversing a tree based
> on the key.

Fair enough.  I still think we should be aiming to minimize our
roundtrips.  I can see why four roundtrips are necessary to access four
index files, but I don't see any necessity in performing four roundtrips
to access one pack file.
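
To make the pack-file point concrete, here is a rough sketch of the
difference, assuming each readv call costs one roundtrip.  FakePack, its
readv() and the two helper functions are hypothetical stand-ins for
illustration, not bzrlib APIs, and the datatype names are just examples.

```python
# Sketch only: FakePack and its readv() are hypothetical, not bzrlib code.

class FakePack:
    """An in-memory pack that counts how many readv calls it serves."""

    def __init__(self, data):
        self.data = data
        self.roundtrips = 0

    def readv(self, ranges):
        # One call models one roundtrip, however many ranges it carries.
        self.roundtrips += 1
        return [self.data[offset:offset + length] for offset, length in ranges]


def read_per_datatype(pack, wanted):
    """One readv per datatype: as many roundtrips as there are datatypes."""
    return {datatype: pack.readv(ranges) for datatype, ranges in wanted.items()}


def read_combined(pack, wanted):
    """One readv covering every datatype, then hand the bytes back out."""
    tagged = [(offset, length, datatype)
              for datatype, ranges in wanted.items()
              for offset, length in ranges]
    tagged.sort()  # request the ranges in pack order
    chunks = pack.readv([(offset, length) for offset, length, _ in tagged])
    result = {datatype: [] for datatype in wanted}
    for (_, _, datatype), chunk in zip(tagged, chunks):
        result[datatype].append(chunk)
    return result
```

With four datatypes, read_per_datatype makes four calls against the pack
and read_combined makes one, for the same bytes.  The index lookups that
produce the ranges are a separate cost and are not modelled here.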

> Combining all the keys into one index makes the index bigger, and as we
> have different data we want attached to each key will make the index
> logic more complex and the indices themselves bigger, with more data
> leading to more round trips during index queries. I'm quite convinced we
> would hurt performance doing this at all naively.

I'm not sure whether you're talking about the physical representation,
but I'm not suggesting we change that at the moment.  However, combining
the data into a single index file could also bring performance wins, so
it's worth considering at some point.

>> A unified keyspace would mean that every repository record could carry
>> its unique name.  This would move towards our goal of making indices an
>> optimization only.
> 
> I don't want to make indices an optimisation only; what I want to do is
> to make sure they are completely regeneratable from a .pack. This is IMO
> quite different: I would expect that given an indexless .pack we would
> scan and generate indices before doing any other operations.

I think they're pretty much the same thing, but if you want to phrase it
as "completely regeneratable from a .pack", that's fine with me.
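
Either way, the mechanism is a scan of the pack.  A minimal sketch,
assuming a made-up record layout in which each record starts with a
header line holding its full key and body length: write_record and
rebuild_index are hypothetical names, and this is not the real pack
format.

```python
# Sketch only: the record layout below is invented for illustration.
# Header line: "<key part> <key part> ... <body length>\n", then the body.
# Key parts are assumed to be ASCII with no spaces or newlines.

def write_record(key, body):
    """Serialise one record that carries its own full name (key)."""
    header = " ".join(key) + " " + str(len(body)) + "\n"
    return header.encode("ascii") + body


def rebuild_index(pack_bytes):
    """Scan an indexless pack and map each key to (body offset, length)."""
    index = {}
    pos = 0
    while pos < len(pack_bytes):
        header_end = pack_bytes.index(b"\n", pos)
        fields = pack_bytes[pos:header_end].decode("ascii").split(" ")
        key = tuple(fields[:-1])
        length = int(fields[-1])
        body_start = header_end + 1
        index[key] = (body_start, length)
        # Skip the body by length, so bodies may contain newlines.
        pos = body_start + length
    return index
```

The scan only works because every record names itself in full, datatype
included.  That is the property a unified keyspace would give us.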

>> So I think the time for a unified keyspace is now.
> 
> The tuple based keys we've agreed to use are easily (no api changes
> related to keys) extended to use a datatype key prefix.

I don't understand what you mean.  AFAICT, a unified keyspace would
require a new API.  How could it be otherwise?
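
To make the question concrete, here is my guess at what a datatype key
prefix could look like.  UnifiedStore, PrefixedView and the datatype
names are all hypothetical; nothing here is an existing or agreed API.

```python
# Sketch only: every class and name here is hypothetical.

class UnifiedStore:
    """One keyspace: every key is a tuple starting with its datatype."""

    def __init__(self):
        self._records = {}

    def add(self, key, value):
        self._records[key] = value

    def get(self, keys):
        # Keys of different datatypes can be mixed in one request.
        return [self._records[key] for key in keys]

    def view(self, datatype):
        return PrefixedView(self, datatype)


class PrefixedView:
    """One datatype of a UnifiedStore, addressed by unprefixed tuple keys."""

    def __init__(self, store, datatype):
        self._store = store
        self._datatype = datatype

    def add(self, key, value):
        self._store.add((self._datatype,) + key, value)

    def get(self, keys):
        return self._store.get([(self._datatype,) + key for key in keys])


store = UnifiedStore()
revisions = store.view("revisions")
texts = store.view("texts")
revisions.add(("rev-1",), "revision record")
texts.add(("file-1", "rev-1"), "text record")
```

The per-datatype views keep the unprefixed tuple keys, so callers of
those need no change.  Asking for a revision and a text in one request
still has to go through store.get() with prefixed keys, and that is the
new API I mean.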

> So I don't think
> there would be much rework to go the full hog here in future if we just
> go to tuples today.

I'd still rather go full hog now.

Aaron