[MERGE] btree prefetch
Martin Pool
mbp at canonical.com
Tue Oct 28 06:54:06 GMT 2008
Martin Pool has voted tweak.
Status is now: Conditionally approved
Comment:
+    def _compute_recommended_pages(self):
+        """Given the transport's recommended_page_size, determine num pages."""
Below you use the term 'num pages' to mean the number of pages in the
index; here it seems to mean the number of pages to try to fetch in each
read.
+        recommended_read = self._transport.recommended_page_size()
+        recommended_pages = int(math.ceil(recommended_read /
+                                          float(_PAGE_SIZE)))
+        return recommended_pages
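For example (just a sketch; the 64kB figure is what the docstring below
says an http transport recommends, with the usual 4096-byte _PAGE_SIZE):

    import math

    _PAGE_SIZE = 4096
    recommended_read = 64 * 1024  # e.g. from an http transport
    recommended_pages = int(math.ceil(recommended_read / float(_PAGE_SIZE)))
    assert recommended_pages == 16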
+
+    def _compute_num_pages(self):
+        """Determine the offset to the last page in this index."""
This docstring seems inconsistent with the method name.
It's actually the number of pages in the index file?
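Perhaps:

        """Determine the total number of pages in this index file."""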
+        if self._size is None:
+            raise AssertionError('_compute_num_pages should not be called'
+                                 ' when self._size is None')
+        if self._root_node is not None:
+            # This is the number of pages as defined by the header
+            return self._row_offsets[-1]
+        # This is the number of pages as defined by the size of the index.
+        # They should be identical.
+        num_pages = int(math.ceil(self._size / float(_PAGE_SIZE)))
+        return num_pages
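A quick worked example (hypothetical size, assuming the 4096-byte
_PAGE_SIZE): a 10000-byte index spans three pages,

    import math

    _PAGE_SIZE = 4096
    size = 10000  # hypothetical value of self._size
    assert int(math.ceil(size / float(_PAGE_SIZE))) == 3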
+
+    def _expand_nodes(self, node_indexes):
+        """Find extra nodes to download.
+
+        The idea is that we always want to make big-enough requests (like
+        64kB for http), so that we don't waste round trips. So given the
+        entries that we already have cached, and the new nodes being
+        downloaded, figure out what other pages we might want to read.
+
+        :param node_indexes: The nodes that have been requested to read.
+        :return: A new list of nodes to download
+        """
A pointer to the doc file would be good here.
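Perhaps add a line like:

        See doc/developers/btree_index_prefetch.txt for more details.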
+        if 'index' in debug.debug_flags:
+            trace.mutter('expanding: %s\tnodes: %s', self._name,
+                         node_indexes)
+
+        if len(node_indexes) >= self._recommended_pages:
+            # Don't add more, we are already requesting more than enough
+            if 'index' in debug.debug_flags:
+                trace.mutter(' not expanding large request (%s >= %s)',
+                             len(node_indexes), self._recommended_pages)
+            return node_indexes
+        if self._size is None:
+            # Don't try anything, because we don't know where the file ends
+            if 'index' in debug.debug_flags:
+                trace.mutter(' not expanding without knowing index size')
+            return node_indexes
+        num_pages = self._compute_num_pages()
Maybe change this and the method to 'total_pages'?
+        # The idea here is to get nodes "next to" the ones we are
+        # already getting.
+        cached_keys = self._get_cached_keys()
+
How does that comment relate to the code that follows?
+        # If reading recommended_pages would read the rest of the index,
+        # just do so.
+        if num_pages - len(cached_keys) <= self._recommended_pages:
+            # Read whatever is left
+            if cached_keys:
+                expanded = [x for x in xrange(num_pages)
+                            if x not in cached_keys]
+            else:
+                expanded = range(num_pages)
It looks like you're searching for integers in a set of key tuples,
which would presumably never match?
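Presumably the intent is to track page offsets rather than keys; a rough
sketch of what I'd expect (the method name, and the assumption that both
node caches are keyed by page offset, are mine):

    def _get_cached_offsets(self):
        """Determine which page offsets we already have cached."""
        # Assumes the internal and leaf node caches are keyed by offset
        cached_offsets = set(self._internal_node_cache.keys())
        cached_offsets.update(self._leaf_node_cache.keys())
        if self._root_node is not None:
            cached_offsets.add(0)
        return cached_offsets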
+            if 'index' in debug.debug_flags:
+                trace.mutter(' reading all unread pages: %s', expanded)
+            return expanded
+
+        if self._root_node is None:
+            # ATM on the first read of the root node of a large index, we
+            # don't bother pre-reading any other pages. This is because
+            # the likelihood of actually reading interesting pages is very
+            # low. See doc/developers/btree_index_prefetch.txt for a
+            # discussion, and a possible implementation when we are
+            # guessing that the second layer index is small
+            final_nodes = node_indexes
+        else:
+            tree_depth = len(self._row_lengths)
+            if len(cached_keys) < tree_depth and len(node_indexes) == 1:
+                # We haven't read enough to justify expansion
+                # If we are only going to read the root node, and 1 leaf
+                # node, then it isn't worth expanding our request. Once
+                # we've read at least 2 nodes, then we are probably doing
+                # a search, and we start expanding our requests.
+                if 'index' in debug.debug_flags:
+                    trace.mutter(' not expanding on first reads')
+                return node_indexes
+
+            # Expand requests to neighbors until we have at least
+            # recommended_pages to request. We only want to expand
+            # requests within a given layer. We cheat a little bit and
+            # assume all requests will be in the same layer. This is
+            # true given the current design, but if it changes this
+            # algorithm may perform oddly.
+            final_nodes = set(node_indexes)
+            first = end = None
+            new_tips = set(final_nodes)
+            while len(final_nodes) < self._recommended_pages and new_tips:
+                next_tips = set()
+                for pos in new_tips:
+                    if first is None:
+                        first, end = self._find_layer_first_and_end(pos)
+                    previous = pos - 1
+                    if (previous > 0
+                        and previous not in cached_keys
+                        and previous not in final_nodes
+                        and previous >= first):
+                        next_tips.add(previous)
+                    after = pos + 1
+                    if (after < num_pages
+                        and after not in cached_keys
+                        and after not in final_nodes
+                        and after < end):
+                        next_tips.add(after)
+                    # This would keep us from going bigger than
+                    # recommended_pages by only expanding the first
+                    # nodes. However, if we are making a 'wide' request,
+                    # it is reasonable to expand all points equally.
+                    # if len(final_nodes) > recommended_pages:
+                    #     break
+                final_nodes.update(next_tips)
+                new_tips = next_tips
This section above is the kind of place where it might be good to split
it into a separate, perhaps pure, function that could be more easily
tested or reasoned about in isolation. This method is still pretty
readable, so it's just something to think about next time.
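For instance, the neighbor expansion could be pulled out into something
like this (just a sketch, untested; the name and the idea of passing the
layer bounds and cached offsets in explicitly are mine, not from the
patch):

    def _expand_to_neighbors(start_offsets, cached_offsets, first, end,
                             total_pages, recommended_pages):
        """Grow start_offsets toward neighboring pages within one layer.

        A pure function over page offsets, so it can be unit tested
        without building an index.
        """
        final_nodes = set(start_offsets)
        new_tips = set(final_nodes)
        while len(final_nodes) < recommended_pages and new_tips:
            next_tips = set()
            for pos in new_tips:
                previous = pos - 1
                if (previous >= first
                    and previous not in cached_offsets
                    and previous not in final_nodes):
                    next_tips.add(previous)
                after = pos + 1
                if (after < end and after < total_pages
                    and after not in cached_offsets
                    and after not in final_nodes):
                    next_tips.add(after)
            final_nodes.update(next_tips)
            new_tips = next_tips
        return final_nodes

_expand_nodes would then just gather first/end, the cached offsets and
the page counts, and call it.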
=== added file 'doc/developers/btree_index_prefetch.txt'
--- doc/developers/btree_index_prefetch.txt	1970-01-01 00:00:00 +0000
+++ doc/developers/btree_index_prefetch.txt	2008-10-14 21:35:27 +0000
@@ -0,0 +1,294 @@
+====================
+BTree Index Prefetch
+====================
+
+This document outlines how we decide to pre-read extra nodes in the
+btree index.
+
+
You should link this in to doc/developers/index.txt
I think I read this document when you were thinking about implementing
this algorithm?
+Algorithm
+=========
+
+This is the basic outline of the algorithm.
The algorithm seems reasonable and is well explained.
For details, see:
http://bundlebuggy.aaronbentley.com/project/bzr/request/%3C48F51166.8030709%40arbash-meinel.com%3E
Project: Bazaar