Rev 4178: (jam) When spilling btree indexes to disk for memory pressure, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Fri Mar 20 19:20:41 GMT 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4178
revision-id: pqm at pqm.ubuntu.com-20090320192036-455rjm03qqnr818d
parent: pqm at pqm.ubuntu.com-20090320183315-5575l3rnaqr1637y
parent: john at arbash-meinel.com-20090320165133-glkmnloupanz532h
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2009-03-20 19:20:36 +0000
message:
(jam) When spilling btree indexes to disk for memory pressure,
don't optimally compress them.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 4168.2.3
revision-id: john at arbash-meinel.com-20090320165133-glkmnloupanz532h
parent: john at arbash-meinel.com-20090320160017-z5j80tjyma375n1k
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.14-btree_spill_minimal
timestamp: Fri 2009-03-20 11:51:33 -0500
message:
Add NEWS entry
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
------------------------------------------------------------
revno: 4168.2.2
revision-id: john at arbash-meinel.com-20090320160017-z5j80tjyma375n1k
parent: john at arbash-meinel.com-20090319183129-fnm26attyu1yw2s0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.14-btree_spill_minimal
timestamp: Fri 2009-03-20 11:00:17 -0500
message:
Name the temporary index as it is being generated.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 4168.2.1
revision-id: john at arbash-meinel.com-20090319183129-fnm26attyu1yw2s0
parent: pqm at pqm.ubuntu.com-20090319154145-159h7mmiivu3df6v
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.14-btree_spill
timestamp: Thu 2009-03-19 13:31:29 -0500
message:
Disable optimizations when spilling content to disk.
This prevents us from trying *extra* hard to make small indices when
we are overflowing. We will still generate the final index in minimal
form; we just won't shrink the intermediate steps.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
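The change described above threads an `allow_optimize` keyword down through the writer helpers, so intermediate spill files skip the expensive size optimization while the final index keeps it. A minimal, hypothetical sketch of that pattern (class and method names here are illustrative, not the real bzrlib API):

```python
# Hypothetical sketch of the allow_optimize pattern in this branch.
# Intermediate spills pass allow_optimize=False; the final write
# honours the builder's configured optimize_for_size preference.

class IndexBuilder(object):
    def __init__(self, optimize_for_size=True):
        self._optimize_for_size = optimize_for_size
        self._writes = []  # record (key, optimized?) for demonstration

    def _add_key(self, key, allow_optimize=True):
        # Only honour the expensive setting when the caller permits it.
        if allow_optimize:
            optimize_for_size = self._optimize_for_size
        else:
            optimize_for_size = False
        self._writes.append((key, optimize_for_size))

    def _write_nodes(self, keys, allow_optimize=True):
        # Propagate the flag to every key written in this pass.
        for key in keys:
            self._add_key(key, allow_optimize=allow_optimize)

    def spill_to_disk(self, keys):
        # Intermediate spill: never pay the optimization cost.
        self._write_nodes(keys, allow_optimize=False)

    def finish(self, keys):
        # Final index: honour the configured preference.
        self._write_nodes(keys)

builder = IndexBuilder(optimize_for_size=True)
builder.spill_to_disk(['a', 'b'])
builder.finish(['c'])
print(builder._writes)
```

The flag defaults to `True` everywhere, so only the spill path changes behavior and all other callers are unaffected.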
=== modified file 'NEWS'
--- a/NEWS 2009-03-20 05:55:49 +0000
+++ b/NEWS 2009-03-20 19:20:36 +0000
@@ -146,6 +146,10 @@
Internals
*********
+* ``BtreeIndex._spill_mem_keys_to_disk()`` now generates the disk index with
+ optimizations turned off. This only has an effect when processing more than
+ 100,000 keys during something like ``bzr pack``. (John Arbash Meinel)
+
* ``DirState`` can now be passed a custom ``SHA1Provider`` object
enabling it to store the SHA1 and stat of the canonical (post
content filtered) form. (Ian Clatworthy)
=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py 2009-02-18 05:40:39 +0000
+++ b/bzrlib/btree_index.py 2009-03-20 16:00:17 +0000
@@ -189,7 +189,8 @@
iterators_to_combine.append(backing.iter_all_entries())
backing_pos = pos + 1
new_backing_file, size = \
- self._write_nodes(self._iter_smallest(iterators_to_combine))
+ self._write_nodes(self._iter_smallest(iterators_to_combine),
+ allow_optimize=False)
dir_path, base_name = osutils.split(new_backing_file.name)
# Note: The transport here isn't strictly needed, because we will use
# direct access to the new_backing._file object
@@ -262,11 +263,14 @@
except StopIteration:
current_values[pos] = None
- def _add_key(self, string_key, line, rows):
+ def _add_key(self, string_key, line, rows, allow_optimize=True):
"""Add a key to the current chunk.
:param string_key: The key to add.
:param line: The fully serialised key and value.
+ :param allow_optimize: If set to False, prevent setting the optimize
+ flag when writing out. This is used by the _spill_mem_keys_to_disk
+ functionality.
"""
if rows[-1].writer is None:
# opening a new leaf chunk;
@@ -277,8 +281,12 @@
length = _PAGE_SIZE
if internal_row.nodes == 0:
length -= _RESERVED_HEADER_BYTES # padded
+ if allow_optimize:
+ optimize_for_size = self._optimize_for_size
+ else:
+ optimize_for_size = False
internal_row.writer = chunk_writer.ChunkWriter(length, 0,
- optimize_for_size=self._optimize_for_size)
+ optimize_for_size=optimize_for_size)
internal_row.writer.write(_INTERNAL_FLAG)
internal_row.writer.write(_INTERNAL_OFFSET +
str(rows[pos + 1].nodes) + "\n")
@@ -322,13 +330,16 @@
new_row.writer.write(_INTERNAL_OFFSET +
str(rows[1].nodes - 1) + "\n")
new_row.writer.write(key_line)
- self._add_key(string_key, line, rows)
+ self._add_key(string_key, line, rows, allow_optimize=allow_optimize)
- def _write_nodes(self, node_iterator):
+ def _write_nodes(self, node_iterator, allow_optimize=True):
"""Write node_iterator out as a B+Tree.
:param node_iterator: An iterator of sorted nodes. Each node should
match the output given by iter_all_entries.
+ :param allow_optimize: If set to False, prevent setting the optimize
+ flag when writing out. This is used by the _spill_mem_keys_to_disk
+ functionality.
:return: A file handle for a temporary file containing a B+Tree for
the nodes.
"""
@@ -353,11 +364,11 @@
key_count += 1
string_key, line = _btree_serializer._flatten_node(node,
self.reference_lists)
- self._add_key(string_key, line, rows)
+ self._add_key(string_key, line, rows, allow_optimize=allow_optimize)
for row in reversed(rows):
pad = (type(row) != _LeafBuilderRow)
row.finish_node(pad=pad)
- result = tempfile.NamedTemporaryFile()
+ result = tempfile.NamedTemporaryFile(prefix='bzr-index-')
lines = [_BTSIGNATURE]
lines.append(_OPTION_NODE_REFS + str(self.reference_lists) + '\n')
lines.append(_OPTION_KEY_ELEMENTS + str(self._key_length) + '\n')
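The other small change in this diff gives the spilled index file a recognizable name. `tempfile.NamedTemporaryFile` accepts a `prefix`, so the spill file appears on disk as e.g. `bzr-index-XXXXXX` rather than an anonymous `tmpXXXXXX`, which makes it easy to spot while a large pack operation is running:

```python
import os
import tempfile

# With a prefix, the temporary spill file is identifiable on disk;
# it is still deleted automatically when closed.
f = tempfile.NamedTemporaryFile(prefix='bzr-index-')
print(os.path.basename(f.name).startswith('bzr-index-'))
f.close()  # file is removed on close
```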
More information about the bazaar-commits mailing list