[plugin] dirstate experiments
Wouter van Heyst
larstiq at larstiq.dyndns.org
Thu Jun 15 17:44:46 BST 2006
On Thu, Jun 15, 2006 at 11:02:52AM -0500, John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> > Robert Collins wrote:
> >> On Wed, 2006-06-14 at 16:27 -0400, Aaron Bentley wrote:
>
> ...
>
> >
> > I agree with the multiple delimiters stuff. I think it will be slightly
> > slower, but I really prefer that records are bounded with one marker,
> > and individual content is bounded with a different one.
> > I actually added a few new fields (like num_entries) in the header so
> > that I could detect an incorrect parse, and this is much easier to do
> > if you have multiple delimiters.
> >
> > I'll play around with it and see how much performance we lose by using
> > \0 in the inner fields.
> >
> > John
> > =:->
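
A rough sketch of the two-delimiter idea (not the actual dirstate
format; the header layout and field names here are invented for
illustration): records separated by '\n', fields within a record
separated by '\0', and a num_entries header field that the parser can
cross-check:

  RECORD_SEP = '\n'
  FIELD_SEP = '\0'

  def parse(text):
      lines = text.split(RECORD_SEP)
      # header carries a redundant count, e.g. "header\0num_entries: 3"
      num_entries = int(lines[0].split(FIELD_SEP)[1].split(': ')[1])
      entries = [line.split(FIELD_SEP) for line in lines[1:] if line]
      if len(entries) != num_entries:
          # a truncated or mis-delimited file shows up as a count mismatch
          raise ValueError('expected %d entries, got %d'
                           % (num_entries, len(entries)))
      return entries
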
>
> Here are the performance differences. The first pass for \0+\n doesn't
> use list comprehensions for the parsing of >0 parents.
>
> 0, 1, and 2 parents all have fast-path code since they will be the
> common case. 3 parents uses a generic "iterate over the records and
> build it up", rather than just fast slicing.
>
>   parents     \n      \0+\n
>   0           106ms   120ms
>   1           185ms   228ms
>   2           275ms   315ms
>   3           370ms   450ms
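
To make the fast-path point above concrete, here is a sketch (not
bzr's actual code) of hard-coded slicing for the common parent counts
with a generic fallback, using the six-fields-per-tree width from the
snippets further down:

  def split_record(fields, num_parents):
      # one block of six fields per tree; common cases get fixed slices
      if num_parents == 0:
          return (fields[:6],)
      if num_parents == 1:
          return (fields[:6], fields[6:12])
      if num_parents == 2:
          return (fields[:6], fields[6:12], fields[12:18])
      # generic path: iterate over the blocks and build the tuple up
      return tuple(fields[i * 6:(i + 1) * 6]
                   for i in xrange(num_parents + 1))
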
>
> So with a simple implementation, we pay quite a bit to split twice. A
> lot of that is because I'm not using list comprehensions. What is also
> interesting is that for a smaller number of parents it is faster to
> iterate over an index (for pos in xrange), but for n=3 it is a wash. I
> think that is because I have to slice the array to skip the first couple
> of lines, since they have already been processed.
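
(The slicing cost comes from the copy: text[cur:] builds a new list
holding the tail before iterating, while indexing reads the original
list in place. A sketch of the two styles, wrapped as functions over a
list of already-read lines:)

  def parse_by_index(text, cur):
      # no copy: index into the original list
      return [text[pos].split('\0') for pos in xrange(cur, len(text))]

  def parse_by_slice(text, cur):
      # text[cur:] copies the tail of the list before the loop starts
      return [line.split('\0') for line in text[cur:]]
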
>
> I can't figure out how to create a list comprehension, though. Right now
> I'm doing:
>   entries = []
>   for pos in xrange(cur, text_len):
>       fields = text[pos].split('\0')
>       entries.append((fields[:6], fields[6:12]))
>
> And the problem is the temporary variable that I need to keep. I could think of:
>
>   entries = [
>       (text[pos].split('\0')[0:6], text[pos].split('\0')[6:12])
>       for pos in xrange(cur, text_len)
>   ]
>
> That requires splitting each line twice, which is quite expensive.
> I can do it with two separate comprehensions, such as:
>   entries = [
>       text[pos].split('\0') for pos in xrange(cur, text_len)
>   ]
>   entries = [
>       (entry[0:6], entry[6:12])
>       for entry in entries
>   ]
>
> So one pass to build up the splits, and then another to separate out the
> hunks. But that is actually far more expensive.
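
For what it's worth, a generator expression feeding the outer
comprehension (available since Python 2.4) keeps it to one split per
line without materialising the intermediate list. Just a sketch using
the same text/cur/text_len names as above, and I haven't timed it:

  entries = [(fields[:6], fields[6:12])
             for fields in (text[pos].split('\0')
                            for pos in xrange(cur, text_len))]
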
>
> I wish there were a way of telling the slice operator to break things up
> into chunks. You can do "x, y, z = list[:3]", but you can't do
> "x, y, z = list[1:3, 3:6, 6:9]".
lst = list(xrange(9))
x, y, z = [lst[i:i+3] for i in xrange(0, 7, 3)]
but that might not be very fast.
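
Another option along the same lines is the shared-iterator idiom; it
yields tuples rather than lists, and would also need timing:

  it = iter(lst)
  x, y, z = zip(it, it, it)   # (0, 1, 2), (3, 4, 5), (6, 7, 8)
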
Wouter van Heyst