[plugin] dirstate experiments
John Arbash Meinel
john at arbash-meinel.com
Thu Jun 15 17:02:52 BST 2006
John Arbash Meinel wrote:
> Robert Collins wrote:
>> On Wed, 2006-06-14 at 16:27 -0400, Aaron Bentley wrote:
...
>
> I agree with the multiple delimiters stuff. I think it will be slightly
> slower, but I really prefer that records are bounded with one marker,
> and individual content is bounded with a different one.
> I actually added a few new fields (like num_entries) in the header so
> that I could detected an incorrect parse, and this is much easier to do
> if you have multiple delimiters.
>
> I'll play around with it and see how much performance we lose by using
> \0 in the inner fields.
>
> John
> =:->
Here is the performance differences. The first pass for \0+\n doesn't
use list comprehensions for the parsing of >0 parents.
0, 1, and 2 parents all have fast-path code since they will be the
common case. 3 parents uses a generic "iterate over the records and
build it up", rather than just fast slicing.
parents \n \0+\n
0 106ms 120ms
1 185 228
2 275 315
3 370 450
So from a simple implementation, we pay quite a bit to split twice. A
lot of this is that I'm not doing list comprehensions. What is also
interesting is that for a smaller number of parents, it is faster to
iterate over an index (for pos in xrange), but for n=3 it is a wash. I
think it is because I have to slice the array to ignore the first couple
of lines, since they have already been processed.
I can't figure out how to create a list comprehension, though. Right now
I'm doing:
entries = []
for pos in xrange(cur, text_len):
fields = text[pos].split('\0')
entries.append((fields[:6], fields[6:12]))
And the problem is the temporary that I need to keep. I could think of:
entries = [
(text[pos].split('\0')[0:6], text[pos].split('\0')[6:12])
for pos in xrange(cur, text_len)
]
That requires double splitting. Which is quite expensive.
I can do it with 2 separate comprehensions, such as:
entries = [
text[pos].split('\0') for pos in xrange(cur, text_len)
]
entries = [
(entry[0:6], entry[6:12])
for entry in entries
]
So one pass to build up the splits, and then another to separate out the
hunks. But that is actually far more expensive.
I wish there was a way of telling the slice operator to break things up
into chunks. Like you can do "x,y,z = list[:3]", but you can't do:
'x,y,z = list[1:3,3:6,6:9]'
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060615/439046e8/attachment.pgp
More information about the bazaar
mailing list