[plugin] dirstate experiments

John Arbash Meinel john at arbash-meinel.com
Thu Jun 15 17:02:52 BST 2006


John Arbash Meinel wrote:
> Robert Collins wrote:
>> On Wed, 2006-06-14 at 16:27 -0400, Aaron Bentley wrote:

...

> 
> I agree with the multiple delimiters stuff. I think it will be slightly
> slower, but I really prefer that records are bounded with one marker,
> and individual content is bounded with a different one.
> I actually added a few new fields (like num_entries) in the header so
> that I could detected an incorrect parse, and this is much easier to do
> if you have multiple delimiters.
> 
> I'll play around with it and see how much performance we lose by using
> \0 in the inner fields.
> 
> John
> =:->

Here is the performance differences. The first pass for \0+\n doesn't
use list comprehensions for the parsing of >0 parents.

0, 1, and 2 parents all have fast-path code since they will be the
common case. 3 parents uses a generic "iterate over the records and
build it up", rather than just fast slicing.

parents	\n	\0+\n
0	106ms	120ms
1	185	228
2	275	315
3	370	450

So from a simple implementation, we pay quite a bit to split twice. A
lot of this is that I'm not doing list comprehensions. What is also
interesting is that for a smaller number of parents, it is faster to
iterate over an index (for pos in xrange), but for n=3 it is a wash. I
think it is because I have to slice the array to ignore the first couple
of lines, since they have already been processed.

I can't figure out how to create a list comprehension, though. Right now
I'm doing:
entries = []
for pos in xrange(cur, text_len):
    fields = text[pos].split('\0')
    entries.append((fields[:6], fields[6:12]))

And the problem is the temporary that I need to keep. I could think of:

entries = [
  (text[pos].split('\0')[0:6], text[pos].split('\0')[6:12])
  for pos in xrange(cur, text_len)
]

That requires double splitting. Which is quite expensive.
I can do it with 2 separate comprehensions, such as:
entries = [
    text[pos].split('\0') for pos in xrange(cur, text_len)
]
entries = [
    (entry[0:6], entry[6:12])
    for entry in entries
]

So one pass to build up the splits, and then another to separate out the
hunks. But that is actually far more expensive.

I wish there was a way of telling the slice operator to break things up
into chunks. Like you can do "x,y,z = list[:3]", but you can't do:
'x,y,z = list[1:3,3:6,6:9]'

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060615/439046e8/attachment.pgp 


More information about the bazaar mailing list