Rev 3688: (jam) Streamline BTreeBuilder.add_node et al to make btree creation in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Sep 3 23:31:06 BST 2008
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3688
revision-id: pqm at pqm.ubuntu.com-20080903223056-b108iytb38xkznci
parent: pqm at pqm.ubuntu.com-20080903220002-hit46ycge6gqsr8l
parent: john at arbash-meinel.com-20080903213644-icayfa0cn3hq3skv
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2008-09-03 23:30:56 +0100
message:
(jam) Streamline BTreeBuilder.add_node et al to make btree creation
faster.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 3644.2.13
revision-id: john at arbash-meinel.com-20080903213644-icayfa0cn3hq3skv
parent: john at arbash-meinel.com-20080828201920-fcrvnykg1jd4rwnx
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Wed 2008-09-03 16:36:44 -0500
message:
NEWS entry about the performance improvements.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
------------------------------------------------------------
revno: 3644.2.12
revision-id: john at arbash-meinel.com-20080828201920-fcrvnykg1jd4rwnx
parent: john at arbash-meinel.com-20080828201644-itfv9mmesr50lncd
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Thu 2008-08-28 15:19:20 -0500
message:
Remove an incorrect comment.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 3644.2.11
revision-id: john at arbash-meinel.com-20080828201644-itfv9mmesr50lncd
parent: john at arbash-meinel.com-20080828201331-dqffxf54l2heokll
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Thu 2008-08-28 15:16:44 -0500
message:
Document the new form of _nodes and remove an unnecessary cast.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 3644.2.10
revision-id: john at arbash-meinel.com-20080828201331-dqffxf54l2heokll
parent: john at arbash-meinel.com-20080828200552-sw5lzds9mmi3qnnb
parent: pqm at pqm.ubuntu.com-20080828171745-xdrmccm17muk77y0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Thu 2008-08-28 15:13:31 -0500
message:
Merge bzr.dev 3658
added:
bzrlib/transport/ftp/ ftp-20080611185801-3vm145h8dmnfgh25-1
bzrlib/transport/ftp/_gssapi.py _gssapi.py-20080611190840-7ejrtp884bk5eu72-2
renamed:
bzrlib/transport/ftp.py => bzrlib/transport/ftp/__init__.py ftp.py-20051116161804-58dc9506548c2a53
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_patiencediff_c.c _patiencediff_c.c-20070721205602-q3imkipwlgagp3cy-1
bzrlib/branch.py branch.py-20050309040759-e4baf4e0d046576e
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/builtins.py builtins.py-20050830033751-fc01482b9ca23183
bzrlib/chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
bzrlib/config.py config.py-20051011043216-070c74f4e9e338e8
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
bzrlib/mail_client.py mail_client.py-20070809192806-vuxt3t19srtpjpdn-1
bzrlib/merge.py merge.py-20050513021216-953b65a438527106
bzrlib/plugin.py plugin.py-20050622060424-829b654519533d69
bzrlib/tests/blackbox/test_merge.py test_merge.py-20060323225809-9bc0459c19917f41
bzrlib/tests/blackbox/test_missing.py test_missing.py-20051211212735-a2cf4c1840bb84c4
bzrlib/tests/blackbox/test_non_ascii.py test_non_ascii.py-20060105214030-68010be784a5d854
bzrlib/tests/blackbox/test_push.py test_push.py-20060329002750-929af230d5d22663
bzrlib/tests/blackbox/test_send.py test_bundle.py-20060616222707-c21c8b7ea5ef57b1
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
bzrlib/tests/test_chunk_writer.py test_chunk_writer.py-20080630234519-6ggn4id17nipovny-2
bzrlib/tests/test_diff.py testdiff.py-20050727164403-d1a3496ebb12e339
bzrlib/tests/test_log.py testlog.py-20050728115707-1a514809d7d49309
bzrlib/tests/test_merge.py testmerge.py-20050905070950-c1b5aa49ff911024
bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
bzrlib/transport/__init__.py transport.py-20050711165921-4978aa7ce1285ad5
bzrlib/transport/http/_pycurl.py pycurlhttp.py-20060110060940-4e2a705911af77a6
bzrlib/workingtree.py workingtree.py-20050511021032-29b6ec0a681e02e3
doc/developers/ppa.txt ppa.txt-20080722055539-606u7t2z32t3ae4w-1
doc/en/mini-tutorial/index.txt index.txt-20070813141352-2u64ooqzo0or4hss-2
doc/es/mini-tutorial/index.txt index.txt-20080504182136-wmoc35u2t6kom8ca-1
------------------------------------------------------------
revno: 3644.2.9
revision-id: john at arbash-meinel.com-20080828200552-sw5lzds9mmi3qnnb
parent: john at arbash-meinel.com-20080825190216-vdkyinz5p5e5s8kq
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Thu 2008-08-28 15:05:52 -0500
message:
Refactor some code.
Move the key, reference, and value checking into a helper function.
This function also finds absent references, but that should have
minimal overhead either way.
Also use the _update_nodes_by_key functionality for both
indexes, as _nodes_by_key has the same signature.
Move _spill_mem_keys_to_disk into a separate function;
not strictly necessary, but it makes add_node() easier to understand.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
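As a rough illustration of the refactoring described in the entry above,
here is a simplified, hypothetical sketch of the resulting add_node()
structure: all validation lives in a checking helper that also reports
absent references, and the spill logic lives in its own method. The class,
names, and error handling below are stand-ins, not the actual bzrlib code.

class SketchBuilder(object):

    def __init__(self, spill_at=100000):
        self._nodes = {}          # {key: (node_refs, value)}
        self._keys = set()
        self._spill_at = spill_at

    def _check_key_ref_value(self, key, references, value):
        # Validate key/value (elided here) and collect references that
        # are not yet present as nodes; checking happens in one place.
        node_refs = []
        absent_references = []
        for reference_list in references:
            for reference in reference_list:
                if reference not in self._nodes:
                    absent_references.append(reference)
            node_refs.append(tuple(reference_list))
        return tuple(node_refs), absent_references

    def add_node(self, key, value, references=()):
        # add_node() only stores the node and decides whether to spill.
        node_refs, _ = self._check_key_ref_value(key, references, value)
        if key in self._nodes:
            raise KeyError('duplicate key: %r' % (key,))
        self._nodes[key] = (node_refs, value)
        self._keys.add(key)
        if len(self._keys) >= self._spill_at:
            self._spill_mem_keys_to_disk()

    def _spill_mem_keys_to_disk(self):
        # Placeholder: the real method writes the in-memory nodes out
        # and combines them with existing backing indexes (see the
        # bzrlib/btree_index.py diff below).
        self._nodes.clear()
        self._keys.clear()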
------------------------------------------------------------
revno: 3644.2.8
revision-id: john at arbash-meinel.com-20080825190216-vdkyinz5p5e5s8kq
parent: john at arbash-meinel.com-20080825183623-n5h3h1d5ky8yr7d6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 14:02:16 -0500
message:
Two quick tweaks.
Change _iter_mem_nodes to use sorted order.
That way we can sort purely on the 'key', which
we know is the sort key anyway. This shaves off
a *lot* of time spent in 'sorted()'.
Also, if a reference is already present in our nodes,
we know we've already checked that it is a valid
key, so we don't need to check it again.
This shaves another 600ms or so for a bzr.dev tree.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
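As a rough, hypothetical illustration of why the sorted-order change above
helps: if the in-memory nodes are yielded in key order, spilled streams can
be merged lazily without a global sorted() pass over fully built tuples.
heapq.merge stands in here for the hand-written _iter_smallest() merge in
the real code.

import heapq

def iter_mem_nodes(nodes):
    # nodes is {key: (references, value)}; iterate in key order.
    for key in sorted(nodes):
        references, value = nodes[key]
        yield key, value, references

in_memory = {('b',): ((), 'v2'), ('a',): ((), 'v1')}
on_disk = iter([(('c',), 'v3', ())])  # pretend this came from a backing index
for entry in heapq.merge(iter_mem_nodes(in_memory), on_disk):
    print(entry)
# (('a',), 'v1', ())
# (('b',), 'v2', ())
# (('c',), 'v3', ())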
------------------------------------------------------------
revno: 3644.2.7
revision-id: john at arbash-meinel.com-20080825183623-n5h3h1d5ky8yr7d6
parent: john at arbash-meinel.com-20080825164422-8opo9lb960uev2ce
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 13:36:23 -0500
message:
Remove accidental merge
modified:
bzrlib/chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
------------------------------------------------------------
revno: 3644.2.6
revision-id: john at arbash-meinel.com-20080825164422-8opo9lb960uev2ce
parent: john at arbash-meinel.com-20080825164250-glc5z58nhwzpgdo2
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 11:44:22 -0500
message:
Restore the exact old tests; only assert that
_nodes_by_key is None rather than an empty dict.
modified:
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 3644.2.5
revision-id: john at arbash-meinel.com-20080825164250-glc5z58nhwzpgdo2
parent: john at arbash-meinel.com-20080825162901-1qweamxqdvv59ie9
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 11:42:50 -0500
message:
Restore the exact old tests; only assert that
_nodes_by_key is None rather than an empty dict.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 3644.2.4
revision-id: john at arbash-meinel.com-20080825162901-1qweamxqdvv59ie9
parent: john at arbash-meinel.com-20080825162409-0766y19zjs45m87i
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 11:29:01 -0500
message:
Change GraphIndex to also have a _get_nodes_by_key
modified:
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
------------------------------------------------------------
revno: 3644.2.3
revision-id: john at arbash-meinel.com-20080825162409-0766y19zjs45m87i
parent: john at arbash-meinel.com-20080825034342-owq0858uk1wp2q0l
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Mon 2008-08-25 11:24:09 -0500
message:
Do a bit more work to get all the tests to pass.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/chunk_writer.py chunk_writer.py-20080630234519-6ggn4id17nipovny-1
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
------------------------------------------------------------
revno: 3644.2.2
revision-id: john at arbash-meinel.com-20080825034342-owq0858uk1wp2q0l
parent: john at arbash-meinel.com-20080825034139-68nxqiqrmqi1l5f0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Sun 2008-08-24 22:43:42 -0500
message:
The new btree index doesn't have 'absent' keys in its _nodes.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 3644.2.1
revision-id: john at arbash-meinel.com-20080825034139-68nxqiqrmqi1l5f0
parent: pqm at pqm.ubuntu.com-20080822042630-on3dxyek4ezk0miu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: index_builder_cleanup
timestamp: Sun 2008-08-24 22:41:39 -0500
message:
Change the IndexBuilders to not generate the nodes_by_key unless needed.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
bzrlib/tests/test_btree_index.py test_index.py-20080624222253-p0x5f92uyh5hw734-13
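Revision 3644.2.1 above defers building the per-element key map until a
prefix lookup actually needs it. As a minimal sketch (simplified from the
_get_nodes_by_key() added in the diff below, with reference lists left
out), the structure is a nested dict built with setdefault:

def build_nodes_by_key(nodes):
    """nodes is {key_tuple: value}; return nested dicts so that
    result[k0][k1]...[k_last] == (key_tuple, value)."""
    nodes_by_key = {}
    for key, value in nodes.items():
        key_dict = nodes_by_key
        for subkey in key[:-1]:
            key_dict = key_dict.setdefault(subkey, {})
        key_dict[key[-1]] = (key, value)
    return nodes_by_key

nodes = {('foo', 'bar', 'baz'): 'v1', ('foo', 'bar', 'qux'): 'v2'}
by_key = build_nodes_by_key(nodes)
print(by_key['foo']['bar']['baz'])   # (('foo', 'bar', 'baz'), 'v1')

Because this map is only a cache over _nodes, add_node() can skip
maintaining it entirely until the map has been requested once.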
=== modified file 'NEWS'
--- a/NEWS 2008-09-03 20:58:40 +0000
+++ b/NEWS 2008-09-03 22:30:56 +0000
@@ -145,6 +145,11 @@
is unknown in both source and target.
(Robert Collins, Aaron Bentley)
+ * ``GraphIndexBuilder.add_node`` and ``BTreeBuilder`` have been
+ streamlined a bit. This should make creating large indexes faster.
+ (In benchmarking, it now takes less time to create a BTree index than
+ it takes to read the GraphIndex one.) (John Arbash Meinel)
+
* Mail clients for `bzr send` are now listed in a registry. This
allows plugins to add new clients by registering them with
``bzrlib.mail_client.mail_client_registry``. All of the built-in
=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py 2008-08-28 01:59:58 +0000
+++ b/bzrlib/btree_index.py 2008-08-28 20:19:20 +0000
@@ -37,7 +37,6 @@
trace,
)
from bzrlib.index import _OPTION_NODE_REFS, _OPTION_KEY_ELEMENTS, _OPTION_LEN
-from bzrlib.osutils import basename, dirname
from bzrlib.transport import get_transport
@@ -78,8 +77,10 @@
del byte_lines[-1]
skipped_bytes = padding
self.spool.writelines(byte_lines)
- if (self.spool.tell() + skipped_bytes) % _PAGE_SIZE != 0:
- raise AssertionError("incorrect node length")
+ remainder = (self.spool.tell() + skipped_bytes) % _PAGE_SIZE
+ if remainder != 0:
+ raise AssertionError("incorrect node length: %d, %d"
+ % (self.spool.tell(), remainder))
self.nodes += 1
self.writer = None
@@ -135,6 +136,10 @@
key_elements=key_elements)
self._spill_at = spill_at
self._backing_indices = []
+ # A map of {key: (node_refs, value)}
+ self._nodes = {}
+ # Indicate it hasn't been built yet
+ self._nodes_by_key = None
def add_node(self, key, value, references=()):
"""Add a node to the index.
@@ -150,10 +155,32 @@
:param value: The value to associate with the key. It may be any
bytes as long as it does not contain \0 or \n.
"""
- index.GraphIndexBuilder.add_node(self, key, value, references=references)
+ # we don't care about absent_references
+ node_refs, _ = self._check_key_ref_value(key, references, value)
+ if key in self._nodes:
+ raise errors.BadIndexDuplicateKey(key, self)
+ self._nodes[key] = (node_refs, value)
+ self._keys.add(key)
+ if self._nodes_by_key is not None and self._key_length > 1:
+ self._update_nodes_by_key(key, value, node_refs)
if len(self._keys) < self._spill_at:
return
- iterators_to_combine = [iter(sorted(self._iter_mem_nodes()))]
+ self._spill_mem_keys_to_disk()
+
+ def _spill_mem_keys_to_disk(self):
+ """Write the in memory keys down to disk to cap memory consumption.
+
+ If we already have some keys written to disk, we will combine them so
+ as to preserve the sorted order. The algorithm for combining uses
+ powers of two. So on the first spill, write all mem nodes into a
+ single index. On the second spill, combine the mem nodes with the nodes
+ on disk to create a 2x sized disk index and get rid of the first index.
+ On the third spill, create a single new disk index, which will contain
+ the mem nodes, and preserve the existing 2x sized index. On the fourth,
+ combine mem with the first and second indexes, creating a new one of
+ size 4x. On the fifth create a single new one, etc.
+ """
+ iterators_to_combine = [self._iter_mem_nodes()]
pos = -1
for pos, backing in enumerate(self._backing_indices):
if backing is None:
@@ -163,9 +190,11 @@
backing_pos = pos + 1
new_backing_file, size = \
self._write_nodes(self._iter_smallest(iterators_to_combine))
- new_backing = BTreeGraphIndex(
- get_transport(dirname(new_backing_file.name)),
- basename(new_backing_file.name), size)
+ dir_path, base_name = osutils.split(new_backing_file.name)
+ # Note: The transport here isn't strictly needed, because we will use
+ # direct access to the new_backing._file object
+ new_backing = BTreeGraphIndex(get_transport(dir_path),
+ base_name, size)
# GC will clean up the file
new_backing._file = new_backing_file
if len(self._backing_indices) == backing_pos:
@@ -175,7 +204,7 @@
self._backing_indices[pos] = None
self._keys = set()
self._nodes = {}
- self._nodes_by_key = {}
+ self._nodes_by_key = None
def add_nodes(self, nodes):
"""Add nodes to the index.
@@ -191,14 +220,15 @@
def _iter_mem_nodes(self):
"""Iterate over the nodes held in memory."""
+ nodes = self._nodes
if self.reference_lists:
- for key, (absent, references, value) in self._nodes.iteritems():
- if not absent:
- yield self, key, value, references
+ for key in sorted(nodes):
+ references, value = nodes[key]
+ yield self, key, value, references
else:
- for key, (absent, references, value) in self._nodes.iteritems():
- if not absent:
- yield self, key, value
+ for key in sorted(nodes):
+ references, value = nodes[key]
+ yield self, key, value
def _iter_smallest(self, iterators_to_combine):
if len(iterators_to_combine) == 1:
@@ -358,7 +388,10 @@
copied_len = osutils.pumpfile(row.spool, result)
if copied_len != (row.nodes - 1) * _PAGE_SIZE:
if type(row) != _LeafBuilderRow:
- raise AssertionError("Not enough data copied")
+ raise AssertionError("Incorrect amount of data copied"
+ " expected: %d, got: %d"
+ % ((row.nodes - 1) * _PAGE_SIZE,
+ copied_len))
result.flush()
size = result.tell()
result.seek(0)
@@ -384,7 +417,7 @@
"iter_all_entries scales with size of history.")
# Doing serial rather than ordered would be faster; but this shouldn't
# be getting called routinely anyway.
- iterators = [iter(sorted(self._iter_mem_nodes()))]
+ iterators = [self._iter_mem_nodes()]
for backing in self._backing_indices:
if backing is not None:
iterators.append(backing.iter_all_entries())
@@ -404,13 +437,11 @@
if self.reference_lists:
for key in keys.intersection(self._keys):
node = self._nodes[key]
- if not node[0]:
- yield self, key, node[2], node[1]
+ yield self, key, node[1], node[0]
else:
for key in keys.intersection(self._keys):
node = self._nodes[key]
- if not node[0]:
- yield self, key, node[2]
+ yield self, key, node[1]
keys.difference_update(self._keys)
for backing in self._backing_indices:
if backing is None:
@@ -459,12 +490,10 @@
node = self._nodes[key]
except KeyError:
continue
- if node[0]:
- continue
if self.reference_lists:
- yield self, key, node[2], node[1]
+ yield self, key, node[1], node[0]
else:
- yield self, key, node[2]
+ yield self, key, node[1]
return
for key in keys:
# sanity check
@@ -473,7 +502,7 @@
if len(key) != self._key_length:
raise errors.BadIndexKey(key)
# find what it refers to:
- key_dict = self._nodes_by_key
+ key_dict = self._get_nodes_by_key()
elements = list(key)
# find the subdict to return
try:
@@ -499,6 +528,24 @@
else:
yield (self, ) + key_dict
+ def _get_nodes_by_key(self):
+ if self._nodes_by_key is None:
+ nodes_by_key = {}
+ if self.reference_lists:
+ for key, (references, value) in self._nodes.iteritems():
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value, references
+ else:
+ for key, (references, value) in self._nodes.iteritems():
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value
+ self._nodes_by_key = nodes_by_key
+ return self._nodes_by_key
+
def key_count(self):
"""Return an estimate of the number of keys in this index.
=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py 2008-09-02 17:52:00 +0000
+++ b/bzrlib/index.py 2008-09-03 22:30:56 +0000
@@ -80,8 +80,9 @@
"""
self.reference_lists = reference_lists
self._keys = set()
+ # A dict of {key: (absent, ref_lists, value)}
self._nodes = {}
- self._nodes_by_key = {}
+ self._nodes_by_key = None
self._key_length = key_elements
def _check_key(self, key):
@@ -94,16 +95,61 @@
if not element or _whitespace_re.search(element) is not None:
raise errors.BadIndexKey(element)
- def add_node(self, key, value, references=()):
- """Add a node to the index.
-
- :param key: The key. keys are non-empty tuples containing
- as many whitespace-free utf8 bytestrings as the key length
- defined for this index.
- :param references: An iterable of iterables of keys. Each is a
- reference to another key.
- :param value: The value to associate with the key. It may be any
- bytes as long as it does not contain \0 or \n.
+ def _get_nodes_by_key(self):
+ if self._nodes_by_key is None:
+ nodes_by_key = {}
+ if self.reference_lists:
+ for key, (absent, references, value) in self._nodes.iteritems():
+ if absent:
+ continue
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value, references
+ else:
+ for key, (absent, references, value) in self._nodes.iteritems():
+ if absent:
+ continue
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value
+ self._nodes_by_key = nodes_by_key
+ return self._nodes_by_key
+
+ def _update_nodes_by_key(self, key, value, node_refs):
+ """Update the _nodes_by_key dict with a new key.
+
+ For a key of (foo, bar, baz) create
+ _nodes_by_key[foo][bar][baz] = key_value
+ """
+ if self._nodes_by_key is None:
+ return
+ key_dict = self._nodes_by_key
+ if self.reference_lists:
+ key_value = key, value, node_refs
+ else:
+ key_value = key, value
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key_value
+
+ def _check_key_ref_value(self, key, references, value):
+ """Check that 'key' and 'references' are all valid.
+
+ :param key: A key tuple. Must conform to the key interface (be a tuple,
+ be of the right length, not have any whitespace or nulls in any key
+ element.)
+ :param references: An iterable of reference lists. Something like
+ [[(ref, key)], [(ref, key), (other, key)]]
+ :param value: The value associate with this key. Must not contain
+ newlines or null characters.
+ :return: (node_refs, absent_references)
+ node_refs basically a packed form of 'references' where all
+ iterables are tuples
+ absent_references reference keys that are not in self._nodes.
+ This may contain duplicates if the same key is
+ referenced in multiple lists.
"""
self._check_key(key)
if _newline_null_re.search(value) is not None:
@@ -111,29 +157,40 @@
if len(references) != self.reference_lists:
raise errors.BadIndexValue(references)
node_refs = []
+ absent_references = []
for reference_list in references:
for reference in reference_list:
- self._check_key(reference)
+ # If reference *is* in self._nodes, then we know it has already
+ # been checked.
if reference not in self._nodes:
- self._nodes[reference] = ('a', (), '')
+ self._check_key(reference)
+ absent_references.append(reference)
node_refs.append(tuple(reference_list))
- if key in self._nodes and self._nodes[key][0] == '':
+ return tuple(node_refs), absent_references
+
+ def add_node(self, key, value, references=()):
+ """Add a node to the index.
+
+ :param key: The key. keys are non-empty tuples containing
+ as many whitespace-free utf8 bytestrings as the key length
+ defined for this index.
+ :param references: An iterable of iterables of keys. Each is a
+ reference to another key.
+ :param value: The value to associate with the key. It may be any
+ bytes as long as it does not contain \0 or \n.
+ """
+ (node_refs,
+ absent_references) = self._check_key_ref_value(key, references, value)
+ if key in self._nodes and self._nodes[key][0] != 'a':
raise errors.BadIndexDuplicateKey(key, self)
- self._nodes[key] = ('', tuple(node_refs), value)
+ for reference in absent_references:
+ # There may be duplicates, but I don't think it is worth worrying
+ # about
+ self._nodes[reference] = ('a', (), '')
+ self._nodes[key] = ('', node_refs, value)
self._keys.add(key)
- if self._key_length > 1:
- key_dict = self._nodes_by_key
- if self.reference_lists:
- key_value = key, value, tuple(node_refs)
- else:
- key_value = key, value
- # possibly should do this on-demand, but it seems likely it is
- # always wanted
- # For a key of (foo, bar, baz) create
- # _nodes_by_key[foo][bar][baz] = key_value
- for subkey in key[:-1]:
- key_dict = key_dict.setdefault(subkey, {})
- key_dict[key[-1]] = key_value
+ if self._nodes_by_key is not None and self._key_length > 1:
+ self._update_nodes_by_key(key, value, node_refs)
def finish(self):
lines = [_SIGNATURE]
@@ -142,7 +199,7 @@
lines.append(_OPTION_LEN + str(len(self._keys)) + '\n')
prefix_length = sum(len(x) for x in lines)
# references are byte offsets. To avoid having to do nasty
- # polynomial work to resolve offsets (references to later in the
+ # polynomial work to resolve offsets (references to later in the
# file cannot be determined until all the inbetween references have
# been calculated too) we pad the offsets with 0's to make them be
# of consistent length. Using binary offsets would break the trivial
@@ -325,14 +382,14 @@
node_value = value
self._nodes[key] = node_value
if self._key_length > 1:
- subkey = list(reversed(key[:-1]))
+ # TODO: We may want to do this lazily, but if we are calling
+ # _buffer_all, we are likely to be doing
+ # iter_entries_prefix
key_dict = self._nodes_by_key
if self.node_ref_lists:
key_value = key, node_value[0], node_value[1]
else:
key_value = key, node_value
- # possibly should do this on-demand, but it seems likely it is
- # always wanted
# For a key of (foo, bar, baz) create
# _nodes_by_key[foo][bar][baz] = key_value
for subkey in key[:-1]:
@@ -1275,6 +1332,7 @@
else:
yield self, key, node[2]
return
+ nodes_by_key = self._get_nodes_by_key()
for key in keys:
# sanity check
if key[0] is None:
@@ -1282,7 +1340,7 @@
if len(key) != self._key_length:
raise errors.BadIndexKey(key)
# find what it refers to:
- key_dict = self._nodes_by_key
+ key_dict = nodes_by_key
elements = list(key)
# find the subdict to return
try:
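For the GraphIndexBuilder changes above, the ('a', (), '') placeholder for
absent (only-referenced) keys is kept, but references that are already
stored are no longer re-checked. A small hypothetical illustration of that
flag convention, with the reference-list count check left out:

nodes = {}

def add_node(key, value, references=()):
    node_refs = []
    absent_references = []
    for reference_list in references:
        for reference in reference_list:
            # A reference already stored has been validated before.
            if reference not in nodes:
                absent_references.append(reference)
        node_refs.append(tuple(reference_list))
    if key in nodes and nodes[key][0] != 'a':
        raise KeyError('duplicate key: %r' % (key,))
    for reference in absent_references:
        nodes[reference] = ('a', (), '')        # placeholder, no value yet
    nodes[key] = ('', tuple(node_refs), value)  # real node replaces placeholder

add_node(('rev-2',), 'value-2', [[('rev-1',)]])
print(nodes[('rev-1',)])   # ('a', (), '')  -- referenced but not yet added
add_node(('rev-1',), 'value-1', [[]])
print(nodes[('rev-1',)])   # ('', ((),), 'value-1')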
=== modified file 'bzrlib/tests/test_btree_index.py'
--- a/bzrlib/tests/test_btree_index.py 2008-08-26 00:56:10 +0000
+++ b/bzrlib/tests/test_btree_index.py 2008-08-28 20:13:31 +0000
@@ -354,23 +354,23 @@
# predictably.
self.assertEqual(1, len(builder._nodes))
self.assertEqual(1, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
builder.add_node(*nodes[1])
self.assertEqual(0, len(builder._nodes))
self.assertEqual(0, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(1, len(builder._backing_indices))
self.assertEqual(2, builder._backing_indices[0].key_count())
# now back to memory
builder.add_node(*nodes[2])
self.assertEqual(1, len(builder._nodes))
self.assertEqual(1, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
# And spills to a second backing index combing all
builder.add_node(*nodes[3])
self.assertEqual(0, len(builder._nodes))
self.assertEqual(0, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(2, len(builder._backing_indices))
self.assertEqual(None, builder._backing_indices[0])
self.assertEqual(4, builder._backing_indices[1].key_count())
@@ -379,7 +379,7 @@
builder.add_node(*nodes[5])
self.assertEqual(0, len(builder._nodes))
self.assertEqual(0, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(2, len(builder._backing_indices))
self.assertEqual(2, builder._backing_indices[0].key_count())
self.assertEqual(4, builder._backing_indices[1].key_count())
@@ -443,24 +443,28 @@
# Test the parts of the index that take up memory are doing so
# predictably.
self.assertEqual(1, len(builder._keys))
- self.assertEqual(2, len(builder._nodes))
- self.assertNotEqual({}, builder._nodes_by_key)
+ self.assertEqual(1, len(builder._nodes))
+ self.assertIs(None, builder._nodes_by_key)
builder.add_node(*nodes[1])
self.assertEqual(0, len(builder._keys))
self.assertEqual(0, len(builder._nodes))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(1, len(builder._backing_indices))
self.assertEqual(2, builder._backing_indices[0].key_count())
# now back to memory
+ old = dict(builder._get_nodes_by_key()) #Build up the nodes by key dict
builder.add_node(*nodes[2])
- self.assertEqual(2, len(builder._nodes))
+ self.assertEqual(1, len(builder._nodes))
self.assertEqual(1, len(builder._keys))
+ self.assertIsNot(None, builder._nodes_by_key)
self.assertNotEqual({}, builder._nodes_by_key)
+ # We should have a new entry
+ self.assertNotEqual(old, builder._nodes_by_key)
# And spills to a second backing index combing all
builder.add_node(*nodes[3])
self.assertEqual(0, len(builder._nodes))
self.assertEqual(0, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(2, len(builder._backing_indices))
self.assertEqual(None, builder._backing_indices[0])
self.assertEqual(4, builder._backing_indices[1].key_count())
@@ -469,7 +473,7 @@
builder.add_node(*nodes[5])
self.assertEqual(0, len(builder._nodes))
self.assertEqual(0, len(builder._keys))
- self.assertEqual({}, builder._nodes_by_key)
+ self.assertIs(None, builder._nodes_by_key)
self.assertEqual(2, len(builder._backing_indices))
self.assertEqual(2, builder._backing_indices[0].key_count())
self.assertEqual(4, builder._backing_indices[1].key_count())