[MERGE] Repository.get_parent_map
Andrew Bennetts
andrew at canonical.com
Mon Jan 14 06:30:32 GMT 2008
Robert Collins wrote:
[...]
>
> With this patch:
> real 1m38.245s
> user 0m5.996s
> sys 0m0.540s
> (it took 20 seconds to determine the revisions to pull)
>
> Note that the source repository has 1 pack, so it is pretty much optimal
> for revision index reads - the wins will be substantially more on less
> packed repositories.
Nice!
bb:tweak
> === modified file 'bzrlib/remote.py'
[...]
> + def _get_parent_map(self, keys):
> + """Helper for get_parent_map that performs the RPC."""
> + keys = set(keys)
> + if NULL_REVISION in keys:
> + keys.discard(NULL_REVISION)
> + found_parents = {NULL_REVISION:()}
> + if not keys:
> + return found_parents
> + else:
> + found_parents = {}
> + path = self.bzrdir._path_for_remote_call(self._client)
> + for key in keys:
> + assert type(key) is str
> + response = self._client.call_expecting_body(
> + 'Repository.get_parent_map', path, *keys)
> + if response[0][0] not in ['ok']:
> + raise errors.UnexpectedSmartServerResponse(response[0])
> + if response[0][0] == 'ok':
> + coded = response[1].read_body_bytes()
> + if coded == '':
> + # no revisions found
> + return {}
> + lines = coded.split('\n')
> + revision_graph = {}
> + for line in lines:
> + d = tuple(line.split())
> + if len(d) > 1:
> + revision_graph[d[0]] = d[1:]
> + else:
> + # No parents - so give the Graph result (NULL_REVISION,).
> + revision_graph[d[0]] = (NULL_REVISION,)
> + return revision_graph
There's no good infrastructure for this at the moment, but you need to handle
talking to older servers that won't recognise this verb.
For now, the way to do it is:
verb = 'Repository.get_parent_map'
if (response == ('error', "Generic bzr smart protocol error: "
"bad request '%s'" % verb) or
response == ('error', "Generic bzr smart protocol error: "
"bad request u'%s'" % verb)):
response.cancel_read_body()
# Fallback code goes here
For that matter, I think you need a cancel_read_body() in the not-ok
case too.
> === modified file 'bzrlib/smart/repository.py'
[...]
> +class SmartServerRepositoryGetParentMap(SmartServerRepositoryRequest):
> +
> + def do_repository_request(self, repository, *revision_ids):
> + repository.lock_read()
> + try:
> + return self._do_repository_request(repository, revision_ids)
> + finally:
> + repository.unlock()
Hmm, it's about time I make a SmartServerLockedRepositoryRequest, like
SmartServerLockedBranchRequest. Not something you should worry about in this
patch though.
> + def _do_repository_request(self, repository, revision_ids):
> + """Get parent details for some revisions.
> +
> + All the parents for revision_ids are returned. Additionally up to 64KB
> + of additional parent data found by performing a breadth first search
> + from revision_ids is returned.
> +
> + :param repository: The repository to query in.
> + :param revision_id:s The utf8 encoded revision_id to answer.
There's a typo in “:param revision_id:s”.
> + :return: A smart server response where the body contains an utf8
> + encoded flattened list of the revision graph, (the same format as
> + Repository.get_revision_graph).
> + """
> + lines = []
> + repo_graph = repository.get_graph()
> + result = {}
> + queried_revs = set()
> + size_so_far = 0
> + next_revs = revision_ids
> + first_loop_done = False
> + while next_revs:
> + queried_revs.update(next_revs)
> + parent_map = repo_graph.get_parent_map(next_revs)
> + next_revs = set()
> + for revision_id, parents in parent_map.iteritems():
> + # adjust for the wire
> + if parents == (_mod_revision.NULL_REVISION,):
> + parents = ()
> + # add parents to the result
> + result[revision_id] = parents
> + # prepare the next query
> + next_revs.update(parents)
> + size_so_far += 2 + len(revision_id) + sum(map(len, parents))
The arithmetic here deserves a comment. Is the 2 there to compensate for the
“ ” and “\n” delimiters? Is that right when there's more than one parent for a
revision?
> + # get all the directly asked for parents, and then flesh out to
> + # 64K or so.
> + if first_loop_done and size_so_far > 65000:
> + next_revs = set()
> + break
> + # don't query things we've already queried
> + next_revs.difference_update(queried_revs)
> + first_loop_done = True
I think there's a minor bug here: if after the first time through the while loop
size_so_far > 65000, then it will still add a little bit more to result, even
though both first_loop_done and size_so_far are true. One fix for this would be
to put:
if size_so_far > 65000:
break
at the very end of the loop.
> === modified file 'bzrlib/tests/test_remote.py'
[...]
> +class TestRepositoryGetGraph(TestRemoteRepository):
> +
> + def test_get_graph(self):
> + # get_graph returns a graph with get_parent_map as the repository.
> + responses = []
> + transport_path = 'quack'
> + repo, client = self.setup_fake_client_and_repository(
> + responses, transport_path)
> + graph = repo.get_graph()
> + self.assertEqual(graph._parents_provider, repo)
The code in the test method doesn't seem to verify the statement in the comment
describing the test.
-Andrew.
More information about the bazaar
mailing list