[plugin] dirstate experiments

Wouter van Heyst larstiq at larstiq.dyndns.org
Thu Jun 15 17:44:46 BST 2006


On Thu, Jun 15, 2006 at 11:02:52AM -0500, John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> > Robert Collins wrote:
> >> On Wed, 2006-06-14 at 16:27 -0400, Aaron Bentley wrote:
> 
> ...
> 
> > 
> > I agree with the multiple delimiters stuff. I think it will be slightly
> > slower, but I really prefer that records are bounded with one marker,
> > and individual content is bounded with a different one.
> > I actually added a few new fields (like num_entries) in the header so
> > that I could detect an incorrect parse, and this is much easier to do
> > if you have multiple delimiters.
> > 
> > I'll play around with it and see how much performance we lose by using
> > \0 in the inner fields.
> > 
> > John
> > =:->
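A minimal sketch of the two-delimiter idea with the num_entries sanity
check, purely for illustration (this is not the real dirstate serializer):

def serialize(entries):
    # one record per line; '\n' separates records, '\0' separates fields
    lines = ['num_entries: %d' % len(entries)]
    lines.extend('\0'.join(fields) for fields in entries)
    return '\n'.join(lines) + '\n'

def parse(text):
    lines = text.split('\n')[:-1]      # drop the trailing empty string
    declared = int(lines[0].split(': ', 1)[1])
    entries = [line.split('\0') for line in lines[1:]]
    if len(entries) != declared:
        raise ValueError('expected %d entries, parsed %d'
                         % (declared, len(entries)))
    return entries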
> 
> Here are the performance differences. The first pass for \0+\n doesn't
> use list comprehensions for the parsing of >0 parents.
> 
> 0, 1, and 2 parents all have fast-path code since they will be the
> common case. The 3-parent case uses a generic "iterate over the records
> and build it up" approach, rather than just fast slicing.
> 
> parents    \n       \0+\n
> 0          106ms    120ms
> 1          185ms    228ms
> 2          275ms    315ms
> 3          370ms    450ms
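As a rough sketch of the fast-path versus generic-path split described
above (the 6-fields-per-block layout is only inferred from the snippets
in this mail, not taken from the actual dirstate code):

def split_record(fields, num_parents):
    # fast paths: plain slicing for the common 0/1/2-parent cases
    if num_parents == 0:
        return (fields[:6],)
    elif num_parents == 1:
        return (fields[:6], fields[6:12])
    elif num_parents == 2:
        return (fields[:6], fields[6:12], fields[12:18])
    # generic path: build the result up chunk by chunk
    return tuple(fields[i:i + 6]
                 for i in xrange(0, 6 * (num_parents + 1), 6))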
> 
> So from a simple implementation, we pay quite a bit to split twice. A
> lot of this is because I'm not using list comprehensions. What is also
> interesting is that for a smaller number of parents, it is faster to
> iterate over an index (for pos in xrange), but for n=3 it is a wash. I
> think that is because I have to slice the array to ignore the first
> couple of lines, since they have already been processed.
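For reference, the two iteration styles being compared look roughly like
this (a self-contained toy example; 'text' stands for the list of lines,
with 'cur' header lines already consumed):

text = ['header 1', 'header 2', 'a\0b\0c', 'd\0e\0f']
cur, text_len = 2, len(text)

by_index = [text[pos].split('\0') for pos in xrange(cur, text_len)]
by_slice = [line.split('\0') for line in text[cur:]]
assert by_index == by_slice == [['a', 'b', 'c'], ['d', 'e', 'f']]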
> 
> I can't figure out how to turn this into a single list comprehension,
> though. Right now I'm doing:
> entries = []
> for pos in xrange(cur, text_len):
>     fields = text[pos].split('\0')
>     entries.append((fields[:6], fields[6:12]))
> 
> And the problem is the temporary variable that I need to keep. The best
> I could think of is:
> 
> entries = [
>   (text[pos].split('\0')[0:6], text[pos].split('\0')[6:12])
>   for pos in xrange(cur, text_len)
> ]
> 
> That requires splitting twice, which is quite expensive.
> I can do it with 2 separate comprehensions, such as:
> entries = [
>     text[pos].split('\0') for pos in xrange(cur, text_len)
> ]
> entries = [
>     (entry[0:6], entry[6:12])
>     for entry in entries
> ]
> 
> So one pass to build up the splits, and then another to separate out the
> hunks. But that is actually far more expensive.
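One way to keep it to a single pass without naming the temporary (not
mentioned in the thread, and assuming Python 2.4's generator expressions;
the names are the same ones used in the snippets above) is to feed a
generator expression into the comprehension:

entries = [
    (fields[:6], fields[6:12])
    for fields in (text[pos].split('\0') for pos in xrange(cur, text_len))
]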
> 
> I wish there was a way of telling the slice operator to break things up
> into chunks. Like you can do "x,y,z = list[:3]", but you can't do:
> 'x,y,z = list[0:3,3:6,6:9]'

lst = list(xrange(9))
x, y, z = [lst[i:i+3] for i in xrange(0, 9, 3)]

but that might not be very fast.
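Applied to the records above, the same chunking trick might look like
this (a sketch; the 6-field width is taken from the earlier snippets):

fields = 'a\0b\0c\0d\0e\0f\0g\0h\0i\0j\0k\0l'.split('\0')
# carve the flat field list into 6-wide blocks: entry, parent 1, ...
blocks = [fields[i:i + 6] for i in xrange(0, len(fields), 6)]
assert blocks == [['a', 'b', 'c', 'd', 'e', 'f'],
                  ['g', 'h', 'i', 'j', 'k', 'l']]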

Wouter van Heyst



