Rev 4178: (jam) When spilling btree indexes to disk for memory pressure, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Fri Mar 20 19:20:41 GMT 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4178
revision-id: pqm at pqm.ubuntu.com-20090320192036-455rjm03qqnr818d
parent: pqm at pqm.ubuntu.com-20090320183315-5575l3rnaqr1637y
parent: john at arbash-meinel.com-20090320165133-glkmnloupanz532h
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2009-03-20 19:20:36 +0000
message:
  (jam) When spilling btree indexes to disk for memory pressure,
  	don't optimally compress them.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4168.2.3
    revision-id: john at arbash-meinel.com-20090320165133-glkmnloupanz532h
    parent: john at arbash-meinel.com-20090320160017-z5j80tjyma375n1k
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.14-btree_spill_minimal
    timestamp: Fri 2009-03-20 11:51:33 -0500
    message:
      Add NEWS entry
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 4168.2.2
    revision-id: john at arbash-meinel.com-20090320160017-z5j80tjyma375n1k
    parent: john at arbash-meinel.com-20090319183129-fnm26attyu1yw2s0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.14-btree_spill_minimal
    timestamp: Fri 2009-03-20 11:00:17 -0500
    message:
      Name the temporary index as it is being generated.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 4168.2.1
    revision-id: john at arbash-meinel.com-20090319183129-fnm26attyu1yw2s0
    parent: pqm at pqm.ubuntu.com-20090319154145-159h7mmiivu3df6v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: 1.14-btree_spill
    timestamp: Thu 2009-03-19 13:31:29 -0500
    message:
      Disable optimizations when spilling content to disk.
      
      This prevents us from trying *extra* hard to make small indices when
      we are overflowing. We will still generate the final index in minimal
      form, we just won't shrink the intermediate steps.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
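
A toy sketch of the idea in the log messages above. `ToyIndexWriter` is hypothetical and is not bzrlib's builder class. Intermediate spills are written with optimization forced off, while the final write honours the builder's own `optimize_for_size` setting. Only the `if allow_optimize:` choice mirrors the code the patch adds to `_add_key`. Everything else is simplified, with sorted lists standing in for index pages.

```python
# Hypothetical stand-in for the index builder; only the allow_optimize /
# optimize_for_size decision mirrors the patch.
class ToyIndexWriter:
    def __init__(self, optimize_for_size):
        self._optimize_for_size = optimize_for_size
        self.writes = []  # (label, optimize flag actually used) per write

    def _write_nodes(self, nodes, label, allow_optimize=True):
        # Same decision the patch adds to _add_key: the builder-wide
        # setting applies only when the caller allows optimization.
        if allow_optimize:
            optimize_for_size = self._optimize_for_size
        else:
            optimize_for_size = False
        self.writes.append((label, optimize_for_size))
        return sorted(nodes)

    def spill(self, nodes):
        # Intermediate spill: favour speed over a minimal on-disk size.
        return self._write_nodes(nodes, "spill", allow_optimize=False)

    def finish(self, nodes):
        # Final index: honour the builder's optimize_for_size setting.
        return self._write_nodes(nodes, "final")


writer = ToyIndexWriter(optimize_for_size=True)
writer.spill(["b", "a"])
writer.spill(["d", "c"])
writer.finish(["a", "b", "c", "d"])
print(writer.writes)
# [('spill', False), ('spill', False), ('final', True)]
```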
=== modified file 'NEWS'
--- a/NEWS	2009-03-20 05:55:49 +0000
+++ b/NEWS	2009-03-20 19:20:36 +0000
@@ -146,6 +146,10 @@
 Internals
 *********
 
+* ``BtreeIndex._spill_mem_keys_to_disk()`` now generates a disk index with
+  optimizations turned off. This only has an effect when processing > 100,000
+  keys during something like ``bzr pack``. (John Arbash Meinel)
+
 * ``DirState`` can now be passed a custom ``SHA1Provider`` object
   enabling it to store the SHA1 and stat of the canonical (post
   content filtered) form. (Ian Clatworthy)
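
The NEWS entry ties the change to large key counts. The sketch below is a minimal model of why, under an assumption. It supposes the builder buffers keys in memory and spills only once the buffer grows past a `spill_at` threshold. The value 100000 comes from the NEWS entry, and the exact comparison bzrlib uses is not shown in this diff. Below the threshold no spill happens, so the `allow_optimize=False` path is never taken. `count_spills` is a hypothetical helper.

```python
def count_spills(n_keys, spill_at=100000):
    """Count spills for a toy builder that flushes its in-memory buffer
    whenever it grows past spill_at keys (hypothetical model)."""
    buffered = 0
    spills = 0
    for _ in range(n_keys):
        buffered += 1
        if buffered > spill_at:
            # In this model, this is the point where the buffered keys
            # would be spilled to disk with allow_optimize=False.
            spills += 1
            buffered = 0
    return spills


print(count_spills(100000))  # 0: everything stays in memory
print(count_spills(250000))  # 2
```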

=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py	2009-02-18 05:40:39 +0000
+++ b/bzrlib/btree_index.py	2009-03-20 16:00:17 +0000
@@ -189,7 +189,8 @@
             iterators_to_combine.append(backing.iter_all_entries())
         backing_pos = pos + 1
         new_backing_file, size = \
-            self._write_nodes(self._iter_smallest(iterators_to_combine))
+            self._write_nodes(self._iter_smallest(iterators_to_combine),
+                              allow_optimize=False)
         dir_path, base_name = osutils.split(new_backing_file.name)
         # Note: The transport here isn't strictly needed, because we will use
         #       direct access to the new_backing._file object
@@ -262,11 +263,14 @@
             except StopIteration:
                 current_values[pos] = None
 
-    def _add_key(self, string_key, line, rows):
+    def _add_key(self, string_key, line, rows, allow_optimize=True):
         """Add a key to the current chunk.
 
         :param string_key: The key to add.
         :param line: The fully serialised key and value.
+        :param allow_optimize: If set to False, prevent setting the optimize
+            flag when writing out. This is used by the _spill_mem_keys_to_disk
+            functionality.
         """
         if rows[-1].writer is None:
             # opening a new leaf chunk;
@@ -277,8 +281,12 @@
                     length = _PAGE_SIZE
                     if internal_row.nodes == 0:
                         length -= _RESERVED_HEADER_BYTES # padded
+                    if allow_optimize:
+                        optimize_for_size = self._optimize_for_size
+                    else:
+                        optimize_for_size = False
                     internal_row.writer = chunk_writer.ChunkWriter(length, 0,
-                        optimize_for_size=self._optimize_for_size)
+                        optimize_for_size=optimize_for_size)
                     internal_row.writer.write(_INTERNAL_FLAG)
                     internal_row.writer.write(_INTERNAL_OFFSET +
                         str(rows[pos + 1].nodes) + "\n")
@@ -322,13 +330,16 @@
                 new_row.writer.write(_INTERNAL_OFFSET +
                     str(rows[1].nodes - 1) + "\n")
                 new_row.writer.write(key_line)
-            self._add_key(string_key, line, rows)
+            self._add_key(string_key, line, rows, allow_optimize=allow_optimize)
 
-    def _write_nodes(self, node_iterator):
+    def _write_nodes(self, node_iterator, allow_optimize=True):
         """Write node_iterator out as a B+Tree.
 
         :param node_iterator: An iterator of sorted nodes. Each node should
             match the output given by iter_all_entries.
+        :param allow_optimize: If set to False, prevent setting the optimize
+            flag when writing out. This is used by the _spill_mem_keys_to_disk
+            functionality.
         :return: A file handle for a temporary file containing a B+Tree for
             the nodes.
         """
@@ -353,11 +364,11 @@
             key_count += 1
             string_key, line = _btree_serializer._flatten_node(node,
                                     self.reference_lists)
-            self._add_key(string_key, line, rows)
+            self._add_key(string_key, line, rows, allow_optimize=allow_optimize)
         for row in reversed(rows):
             pad = (type(row) != _LeafBuilderRow)
             row.finish_node(pad=pad)
-        result = tempfile.NamedTemporaryFile()
+        result = tempfile.NamedTemporaryFile(prefix='bzr-index-')
         lines = [_BTSIGNATURE]
         lines.append(_OPTION_NODE_REFS + str(self.reference_lists) + '\n')
         lines.append(_OPTION_KEY_ELEMENTS + str(self._key_length) + '\n')
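
The first hunk shows only part of the spill step. The in-memory keys are combined with existing backing indices (`iterators_to_combine`) and merged in sorted order by `_iter_smallest`. The result is then written with `allow_optimize=False` into a slot chosen via `backing_pos = pos + 1`. The self-contained sketch below shows one plausible reading of that slot logic. It assumes each slot of `backing` is either empty (`None`) or holds one sorted run, so the occupied slots track the spill count like the bits of a binary counter. `spill` is a hypothetical name, plain lists stand in for on-disk indices, and `heapq.merge` stands in for `_iter_smallest`.

```python
import heapq


def spill(backing, mem_keys):
    """Merge mem_keys with the leading run of occupied slots in backing,
    store the result in the first empty slot, and clear the merged slots."""
    iterators = [iter(sorted(mem_keys))]
    pos = 0
    while pos < len(backing) and backing[pos] is not None:
        iterators.append(iter(backing[pos]))
        pos += 1
    # k-way merge of already-sorted inputs (stand-in for _iter_smallest).
    merged = list(heapq.merge(*iterators))
    if pos == len(backing):
        backing.append(None)
    backing[pos] = merged
    for i in range(pos):
        backing[i] = None
    return backing


backing = []
for batch in ([3, 1], [4, 2], [6, 5], [8, 7]):
    spill(backing, batch)
    print([len(run) if run else 0 for run in backing])
# [2]
# [0, 4]
# [2, 4]
# [0, 0, 8]
```

Under this model, each key passes through several intermediate merged writes as the number of spills grows. Skipping the size optimization on exactly those writes is what this patch changes.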

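Revision 4168.2.2 ("Name the temporary index as it is being generated") shows up in the last hunk as the `prefix='bzr-index-'` argument. The short standard-library illustration below shows its effect. The temporary file's basename starts with `bzr-index-` instead of `tempfile`'s default `tmp`, which presumably makes an in-progress spill file recognisable in the temp directory. As before, the file is removed when it is closed.

```python
import os
import tempfile

# Same call as the patched _write_nodes: only the filename prefix changes.
with tempfile.NamedTemporaryFile(prefix='bzr-index-') as result:
    name = os.path.basename(result.name)
    print(name.startswith('bzr-index-'))  # True

# delete=True is the default, so closing the file removed it.
print(os.path.exists(result.name))  # False
```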



More information about the bazaar-commits mailing list