Rev 3216: (robertc) Tune RemoteRepository.get_parent_map to not send duplicate in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Feb 6 01:45:39 GMT 2008
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3216
revision-id:pqm at pqm.ubuntu.com-20080206014528-rb8v4hl04mgxulb7
parent: pqm at pqm.ubuntu.com-20080205231318-dcc9hdxcxjd7hx79
parent: robertc at robertcollins.net-20080206000653-33ywsskbk3rh1n0l
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2008-02-06 01:45:28 +0000
message:
(robertc) Tune RemoteRepository.get_parent_map to not send duplicate
results, and to compress the data,
reducing round trips further. (Robert Collins)
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/graph.py graph_walker.py-20070525030359-y852guab65d4wtn0-1
bzrlib/remote.py remote.py-20060720103555-yeeg2x51vn0rbtdp-1
bzrlib/smart/protocol.py protocol.py-20061108035435-ot0lstk2590yqhzr-1
bzrlib/smart/repository.py repository.py-20061128022038-vr5wy5bubyb8xttk-1
bzrlib/tests/test_remote.py test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
bzrlib/tests/test_smart.py test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
------------------------------------------------------------
revno: 3211.5.4
revision-id:robertc at robertcollins.net-20080206000653-33ywsskbk3rh1n0l
parent: robertc at robertcollins.net-20080205224247-10d7v6tcsv3nrj2i
committer: Robert Collins <robertc at robertcollins.net>
branch nick: integration
timestamp: Wed 2008-02-06 11:06:53 +1100
message:
Remove sample code that snuck in.
modified:
bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
------------------------------------------------------------
revno: 3211.5.3
revision-id:robertc at robertcollins.net-20080205224247-10d7v6tcsv3nrj2i
parent: robertc at robertcollins.net-20080205220757-p2j8klf0hwhzaim4
committer: Robert Collins <robertc at robertcollins.net>
branch nick: get_parents.duplicates
timestamp: Wed 2008-02-06 09:42:47 +1100
message:
Adjust size of batch and change gzip comments to bzip2.
modified:
bzrlib/smart/repository.py repository.py-20061128022038-vr5wy5bubyb8xttk-1
bzrlib/tests/test_smart.py test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
------------------------------------------------------------
revno: 3211.5.2
revision-id:robertc at robertcollins.net-20080205220757-p2j8klf0hwhzaim4
parent: robertc at robertcollins.net-20080203225508-0rogbg0ggonuqfhp
committer: Robert Collins <robertc at robertcollins.net>
branch nick: get_parents.duplicates
timestamp: Wed 2008-02-06 09:07:57 +1100
message:
Change RemoteRepository.get_parent_map to use bz2 not gzip for compression.
modified:
bzrlib/remote.py remote.py-20060720103555-yeeg2x51vn0rbtdp-1
bzrlib/smart/repository.py repository.py-20061128022038-vr5wy5bubyb8xttk-1
bzrlib/tests/test_remote.py test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
bzrlib/tests/test_smart.py test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
------------------------------------------------------------
revno: 3211.5.1
revision-id:robertc at robertcollins.net-20080203225508-0rogbg0ggonuqfhp
parent: pqm at pqm.ubuntu.com-20080201053934-q32y2nk5vvo13c6v
committer: Robert Collins <robertc at robertcollins.net>
branch nick: get_parents.duplicates
timestamp: Mon 2008-02-04 09:55:08 +1100
message:
Change the smart server get_parents method to take a graph search to exclude already recieved parents from. This prevents history shortcuts causing huge numbers of duplicates.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/graph.py graph_walker.py-20070525030359-y852guab65d4wtn0-1
bzrlib/remote.py remote.py-20060720103555-yeeg2x51vn0rbtdp-1
bzrlib/smart/protocol.py protocol.py-20061108035435-ot0lstk2590yqhzr-1
bzrlib/smart/repository.py repository.py-20061128022038-vr5wy5bubyb8xttk-1
bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
bzrlib/tests/test_remote.py test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
bzrlib/tests/test_smart.py test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
=== modified file 'NEWS'
--- a/NEWS 2008-02-05 23:13:18 +0000
+++ b/NEWS 2008-02-06 01:45:28 +0000
@@ -29,6 +29,11 @@
* ``merge --preview`` produces a diff of the changes merge would make,
but does not actually perform the merge. (Aaron Bentley)
+ * New smart method ``Repository.get_parent_map`` for getting revision
+ parent data. This returns additional parent information topologically
+ adjacent to the requested data to reduce round trip latency impacts.
+ (Robert Collins)
+
* New smart method, ``Repository.stream_revisions_chunked``, for fetching
revision data that streams revision data via a chunked encoding. This
avoids buffering large amounts of revision data on the server and on the
=== modified file 'bzrlib/graph.py'
--- a/bzrlib/graph.py 2008-01-29 06:56:51 +0000
+++ b/bzrlib/graph.py 2008-02-03 22:55:08 +0000
@@ -514,7 +514,7 @@
# exclude keys for them. However, while we could have a second
# look-ahead result buffer and shuffle things around, this method
# is typically only called once per search - when memoising the
- # results of the search.
+ # results of the search.
found, ghosts, next, parents = self._do_query(self._next_query)
# pretend we didn't query: perhaps we should tweak _do_query to be
# entirely stateless?
=== modified file 'bzrlib/remote.py'
--- a/bzrlib/remote.py 2008-02-04 03:50:39 +0000
+++ b/bzrlib/remote.py 2008-02-06 01:45:28 +0000
@@ -17,6 +17,7 @@
# TODO: At some point, handle upgrades by just passing the whole request
# across to run on the server.
+import bz2
from cStringIO import StringIO
from bzrlib import (
@@ -272,6 +273,8 @@
self._leave_lock = False
# A cache of looked up revision parent data; reset at unlock time.
self._parents_map = None
+ if 'hpss' in debug.debug_flags:
+ self._requested_parents = None
# For tests:
# These depend on the actual remote format, so force them off for
# maximum compatibility. XXX: In future these should depend on the
@@ -476,6 +479,8 @@
self._lock_mode = 'r'
self._lock_count = 1
self._parents_map = {}
+ if 'hpss' in debug.debug_flags:
+ self._requested_parents = set()
if self._real_repository is not None:
self._real_repository.lock_read()
else:
@@ -514,6 +519,8 @@
self._lock_mode = 'w'
self._lock_count = 1
self._parents_map = {}
+ if 'hpss' in debug.debug_flags:
+ self._requested_parents = set()
elif self._lock_mode == 'r':
raise errors.ReadOnlyError(self)
else:
@@ -574,6 +581,8 @@
if self._lock_count > 0:
return
self._parents_map = None
+ if 'hpss' in debug.debug_flags:
+ self._requested_parents = None
old_mode = self._lock_mode
self._lock_mode = None
try:
@@ -776,7 +785,12 @@
len(set(self._parents_map).intersection(parent_map)),
len(parent_map))
self._parents_map.update(parent_map)
- return dict((k, ancestry[k]) for k in keys if k in ancestry)
+ present_keys = [k for k in keys if k in ancestry]
+ if 'hpss' in debug.debug_flags:
+ self._requested_parents.update(present_keys)
+ mutter('Current RemoteRepository graph hit rate: %d%%',
+ 100.0 * len(self._requested_parents) / len(self._parents_map))
+ return dict((k, ancestry[k]) for k in present_keys)
def _response_is_unknown_method(self, response, verb):
"""Return True if response is an unknonwn method response to verb.
@@ -806,12 +820,33 @@
return found_parents
else:
found_parents = {}
+ # TODO(Needs analysis): We could assume that the keys being requested
+ # from get_parent_map are in a breadth first search, so typically they
+ # will all be depth N from some common parent, and we don't have to
+ # have the server iterate from the root parent, but rather from the
+ # keys we're searching; and just tell the server the keyspace we
+ # already have; but this may be more traffic again.
+
+ # Transform self._parents_map into a search request recipe.
+ # TODO: Manage this incrementally to avoid covering the same path
+ # repeatedly. (The server will have to on each request, but the less
+ # work done the better).
+ start_set = set(self._parents_map)
+ result_parents = set()
+ for parents in self._parents_map.itervalues():
+ result_parents.update(parents)
+ stop_keys = result_parents.difference(start_set)
+ included_keys = start_set.intersection(result_parents)
+ start_set.difference_update(included_keys)
+ recipe = (start_set, stop_keys, len(self._parents_map))
+ body = self._serialise_search_recipe(recipe)
path = self.bzrdir._path_for_remote_call(self._client)
for key in keys:
assert type(key) is str
verb = 'Repository.get_parent_map'
- response = self._client.call_expecting_body(
- verb, path, *keys)
+ args = (path,) + tuple(keys)
+ response = self._client.call_with_body_bytes_expecting_body(
+ verb, args, self._serialise_search_recipe(recipe))
if self._response_is_unknown_method(response, verb):
# Server that does not support this method, get the whole graph.
response = self._client.call_expecting_body(
@@ -823,7 +858,7 @@
reponse[1].cancel_read_body()
raise errors.UnexpectedSmartServerResponse(response[0])
if response[0][0] == 'ok':
- coded = response[1].read_body_bytes()
+ coded = bz2.decompress(response[1].read_body_bytes())
if coded == '':
# no revisions found
return {}
@@ -976,11 +1011,7 @@
def get_data_stream_for_search(self, search):
REQUEST_NAME = 'Repository.stream_revisions_chunked'
path = self.bzrdir._path_for_remote_call(self._client)
- recipe = search.get_recipe()
- start_keys = ' '.join(recipe[0])
- stop_keys = ' '.join(recipe[1])
- count = str(recipe[2])
- body = '\n'.join((start_keys, stop_keys, count))
+ body = self._serialise_search_recipe(search.get_recipe())
response, protocol = self._client.call_with_body_bytes_expecting_body(
REQUEST_NAME, (path,), body)
@@ -1041,6 +1072,17 @@
def _make_parents_provider(self):
return self
+ def _serialise_search_recipe(self, recipe):
+ """Serialise a graph search recipe.
+
+ :param recipe: A search recipe (start, stop, count).
+ :return: Serialised bytes.
+ """
+ start_keys = ' '.join(recipe[0])
+ stop_keys = ' '.join(recipe[1])
+ count = str(recipe[2])
+ return '\n'.join((start_keys, stop_keys, count))
+
class RemoteBranchLockableFiles(LockableFiles):
"""A 'LockableFiles' implementation that talks to a smart server.
=== modified file 'bzrlib/smart/protocol.py'
--- a/bzrlib/smart/protocol.py 2008-01-21 23:04:54 +0000
+++ b/bzrlib/smart/protocol.py 2008-02-03 22:55:08 +0000
@@ -488,6 +488,8 @@
mutter(' (to %s)', self._request._medium._path)
mutter(' %d bytes', len(body))
self._request_start_time = time.time()
+ if 'hpssdetail' in debug.debug_flags:
+ mutter('hpss body content: %s', body)
self._write_args(args)
bytes = self._encode_bulk_data(body)
self._request.accept_bytes(bytes)
=== modified file 'bzrlib/smart/repository.py'
--- a/bzrlib/smart/repository.py 2008-01-17 07:47:52 +0000
+++ b/bzrlib/smart/repository.py 2008-02-05 22:42:47 +0000
@@ -16,6 +16,7 @@
"""Server-side repository related request implmentations."""
+import bz2
from cStringIO import StringIO
import os
import sys
@@ -58,29 +59,77 @@
# is expected)
return None
+ def recreate_search(self, repository, recipe_bytes):
+ lines = recipe_bytes.split('\n')
+ start_keys = set(lines[0].split(' '))
+ exclude_keys = set(lines[1].split(' '))
+ revision_count = int(lines[2])
+ repository.lock_read()
+ try:
+ search = repository.get_graph()._make_breadth_first_searcher(
+ start_keys)
+ while True:
+ try:
+ next_revs = search.next()
+ except StopIteration:
+ break
+ search.stop_searching_any(exclude_keys.intersection(next_revs))
+ search_result = search.get_result()
+ if search_result.get_recipe()[2] != revision_count:
+ # we got back a different amount of data than expected, this
+ # gets reported as NoSuchRevision, because less revisions
+ # indicates missing revisions, and more should never happen as
+ # the excludes list considers ghosts and ensures that ghost
+ # filling races are not a problem.
+ return (None, FailedSmartServerResponse(('NoSuchRevision',)))
+ return (search, None)
+ finally:
+ repository.unlock()
+
class SmartServerRepositoryGetParentMap(SmartServerRepositoryRequest):
+ """Bzr 1.2+ - get parent data for revisions during a graph search."""
def do_repository_request(self, repository, *revision_ids):
- repository.lock_read()
- try:
- return self._do_repository_request(repository, revision_ids)
- finally:
- repository.unlock()
-
- def _do_repository_request(self, repository, revision_ids):
"""Get parent details for some revisions.
All the parents for revision_ids are returned. Additionally up to 64KB
of additional parent data found by performing a breadth first search
- from revision_ids is returned.
+ from revision_ids is returned. The verb takes a body containing the
+ current search state, see do_body for details.
:param repository: The repository to query in.
:param revision_ids: The utf8 encoded revision_id to answer for.
+ """
+ self._revision_ids = revision_ids
+ return None # Signal that we want a body.
+
+ def do_body(self, body_bytes):
+ """Process the current search state and perform the parent lookup.
+
:return: A smart server response where the body contains an utf8
- encoded flattened list of the parents of the revisions, (the same
- format as Repository.get_revision_graph).
+ encoded flattened list of the parents of the revisions (the same
+ format as Repository.get_revision_graph) which has been bz2
+ compressed.
"""
+ repository = self._repository
+ repository.lock_read()
+ try:
+ return self._do_repository_request(body_bytes)
+ finally:
+ repository.unlock()
+
+ def _do_repository_request(self, body_bytes):
+ repository = self._repository
+ revision_ids = set(self._revision_ids)
+ search, error = self.recreate_search(repository, body_bytes)
+ if error is not None:
+ return error
+ # TODO might be nice to start up the search again; but thats not
+ # written or tested yet.
+ client_seen_revs = set(search.get_result().get_keys())
+ # Always include the requested ids.
+ client_seen_revs.difference_update(revision_ids)
lines = []
repo_graph = repository.get_graph()
result = {}
@@ -96,25 +145,32 @@
# adjust for the wire
if parents == (_mod_revision.NULL_REVISION,):
parents = ()
- # add parents to the result
- result[revision_id] = parents
# prepare the next query
next_revs.update(parents)
- # Approximate the serialized cost of this revision_id.
- size_so_far += 2 + len(revision_id) + sum(map(len, parents))
- # get all the directly asked for parents, and then flesh out to
- # 64K or so.
- if first_loop_done and size_so_far > 65000:
- next_revs = set()
- break
+ if revision_id not in client_seen_revs:
+ # Client does not have this revision, give it to it.
+ # add parents to the result
+ result[revision_id] = parents
+ # Approximate the serialized cost of this revision_id.
+ size_so_far += 2 + len(revision_id) + sum(map(len, parents))
+ # get all the directly asked for parents, and then flesh out to
+ # 64K (compressed) or so. We do one level of depth at a time to
+ # stay in sync with the client. The 250000 magic number is
+ # estimated compression ratio taken from bzr.dev itself.
+ if first_loop_done and size_so_far > 250000:
+ next_revs = set()
+ break
# don't query things we've already queried
next_revs.difference_update(queried_revs)
first_loop_done = True
- for revision, parents in result.items():
+ # sorting trivially puts lexographically similar revision ids together.
+ # Compression FTW.
+ for revision, parents in sorted(result.items()):
lines.append(' '.join((revision, ) + tuple(parents)))
- return SuccessfulSmartServerResponse(('ok', ), '\n'.join(lines))
+ return SuccessfulSmartServerResponse(
+ ('ok', ), bz2.compress('\n'.join(lines)))
class SmartServerRepositoryGetRevisionGraph(SmartServerRepositoryRequest):
@@ -342,31 +398,15 @@
"""Bzr 1.1+ streaming pull."""
def do_body(self, body_bytes):
- lines = body_bytes.split('\n')
- start_keys = set(lines[0].split(' '))
- exclude_keys = set(lines[1].split(' '))
- revision_count = int(lines[2])
repository = self._repository
repository.lock_read()
try:
- search = repository.get_graph()._make_breadth_first_searcher(
- start_keys)
- while True:
- try:
- next_revs = search.next()
- except StopIteration:
- break
- search.stop_searching_any(exclude_keys.intersection(next_revs))
- search_result = search.get_result()
- if search_result.get_recipe()[2] != revision_count:
- # we got back a different amount of data than expected, this
- # gets reported as NoSuchRevision, because less revisions
- # indicates missing revisions, and more should never happen as
- # the excludes list considers ghosts and ensures that ghost
- # filling races are not a problem.
- return FailedSmartServerResponse(('NoSuchRevision',))
- stream = repository.get_data_stream_for_search(search_result)
+ search, error = self.recreate_search(repository, body_bytes)
+ if error is not None:
+ return error
+ stream = repository.get_data_stream_for_search(search.get_result())
except Exception:
+ # On non-error, unlocking is done by the body stream handler.
repository.unlock()
raise
return SuccessfulSmartServerResponse(('ok',),
=== modified file 'bzrlib/tests/test_remote.py'
--- a/bzrlib/tests/test_remote.py 2008-02-04 03:50:39 +0000
+++ b/bzrlib/tests/test_remote.py 2008-02-06 01:45:28 +0000
@@ -23,6 +23,7 @@
These tests correspond to tests.test_smart, which exercises the server side.
"""
+import bz2
from cStringIO import StringIO
from bzrlib import (
@@ -625,7 +626,7 @@
r1 = u'\u0e33'.encode('utf8')
r2 = u'\u0dab'.encode('utf8')
lines = [' '.join([r2, r1]), r1]
- encoded_body = '\n'.join(lines)
+ encoded_body = bz2.compress('\n'.join(lines))
responses = [(('ok', ), encoded_body), (('ok', ), encoded_body)]
transport_path = 'quack'
@@ -641,8 +642,8 @@
parents = graph.get_parent_map([r1])
self.assertEqual({r1: (NULL_REVISION,)}, parents)
self.assertEqual(
- [('call_expecting_body', 'Repository.get_parent_map',
- ('quack/', r2))],
+ [('call_with_body_bytes_expecting_body',
+ 'Repository.get_parent_map', ('quack/', r2), '\n\n0')],
client._calls)
repo.unlock()
# now we call again, and it should use the second response.
@@ -651,10 +652,10 @@
parents = graph.get_parent_map([r1])
self.assertEqual({r1: (NULL_REVISION,)}, parents)
self.assertEqual(
- [('call_expecting_body', 'Repository.get_parent_map',
- ('quack/', r2)),
- ('call_expecting_body', 'Repository.get_parent_map',
- ('quack/', r1))
+ [('call_with_body_bytes_expecting_body',
+ 'Repository.get_parent_map', ('quack/', r2), '\n\n0'),
+ ('call_with_body_bytes_expecting_body',
+ 'Repository.get_parent_map', ('quack/', r1), '\n\n0'),
],
client._calls)
repo.unlock()
=== modified file 'bzrlib/tests/test_smart.py'
--- a/bzrlib/tests/test_smart.py 2008-01-17 07:49:09 +0000
+++ b/bzrlib/tests/test_smart.py 2008-02-05 22:42:47 +0000
@@ -24,6 +24,7 @@
Tests for low-level protocol encoding are found in test_smart_transport.
"""
+import bz2
from StringIO import StringIO
import tempfile
import tarfile
@@ -529,6 +530,22 @@
request.execute, backing.local_abspath('subdir'))
+class TestSmartServerRepositoryGetParentMap(tests.TestCaseWithTransport):
+
+ def test_trivial_bzipped(self):
+ # This tests that the wire encoding is actually bzipped
+ backing = self.get_transport()
+ request = smart.repository.SmartServerRepositoryGetParentMap(backing)
+ tree = self.make_branch_and_memory_tree('.')
+
+ self.assertEqual(None,
+ request.execute(backing.local_abspath(''), 'missing-id'))
+ # Note that it returns a body (of '' bzipped).
+ self.assertEqual(
+ SuccessfulSmartServerResponse(('ok', ), bz2.compress('')),
+ request.do_body('\n\n0\n'))
+
+
class TestSmartServerRepositoryGetRevisionGraph(tests.TestCaseWithTransport):
def test_none_argument(self):
More information about the bazaar-commits
mailing list