Rev 2795: Change versionedfile.add_lines again to include the key of the added text in the return value. in http://people.ubuntu.com/~robertc/baz2.0/knits
Robert Collins
robertc at robertcollins.net
Wed Sep 5 00:41:10 BST 2007
At http://people.ubuntu.com/~robertc/baz2.0/knits
------------------------------------------------------------
revno: 2795
revision-id: robertc at robertcollins.net-20070904234057-s1v9l9q1w5jt25co
parent: pqm at pqm.ubuntu.com-20070904035759-iv4xl6d7ez69txba
parent: robertc at robertcollins.net-20070816083953-sbfb70vw6tmh3vak
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Wed 2007-09-05 09:40:57 +1000
message:
Change versionedfile.add_lines again to include the key of the added text in the return value.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/fetch.py fetch.py-20050818234941-26fea6105696365d
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
bzrlib/tests/test_versionedfile.py test_versionedfile.py-20060222045249-db45c9ed14a1c2e5
bzrlib/versionedfile.py versionedfile.py-20060222045106-5039c71ee3b65490
bzrlib/weave.py knit.py-20050627021749-759c29984154256b
------------------------------------------------------------
revno: 2698.2.5
revision-id: robertc at robertcollins.net-20070816083953-sbfb70vw6tmh3vak
parent: robertc at robertcollins.net-20070816081927-rhroje8susrd3a40
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-08-16 18:39:53 +1000
message:
Decouple parsing and iterating the lines in knit records from getting the data, making it suitable for use in pack repositories.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
------------------------------------------------------------
revno: 2698.2.4
revision-id: robertc at robertcollins.net-20070816081927-rhroje8susrd3a40
parent: robertc at robertcollins.net-20070816081549-dpowek5gwvox1x56
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-08-16 18:19:27 +1000
message:
Remove full history scan during iter_lines_added_or_present in KnitVersionedFile.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
------------------------------------------------------------
revno: 2698.2.3
revision-id: robertc at robertcollins.net-20070816081549-dpowek5gwvox1x56
parent: robertc at robertcollins.net-20070816081414-ps82io1cs4cij6vz
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-08-16 18:15:49 +1000
message:
Remove an unneeded pre-check in KnitVersionedFile.iter_lines_added_or_present_in_version.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
------------------------------------------------------------
revno: 2698.2.2
revision-id: robertc at robertcollins.net-20070816081414-ps82io1cs4cij6vz
parent: robertc at robertcollins.net-20070816080814-49b7t5gghdrhcx4d
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-08-16 18:14:14 +1000
message:
Use the low-level sorting facility in KnitVersionedFile.iter_records
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
------------------------------------------------------------
revno: 2698.2.1
revision-id: robertc at robertcollins.net-20070816080814-49b7t5gghdrhcx4d
parent: pqm at pqm.ubuntu.com-20070814221506-6rw0b0oolfdeqrdw
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-08-16 18:08:14 +1000
message:
Add new get_raw_records_unsorted method on knit access, to allow low level sorting and optimisation when the upper layer does not need results in a particular order.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/tests/test_knit.py test_knit.py-20051212171302-95d4c00dd5f11f2b
=== modified file 'NEWS'
--- a/NEWS 2007-09-04 01:20:26 +0000
+++ b/NEWS 2007-09-04 23:40:57 +0000
@@ -165,8 +165,8 @@
``Branch.set_last_revision_info`` instead. (Martin Pool)
* The ``add_lines`` methods on ``VersionedFile`` implementations has changed
- its return value to include the sha1 and length of the inserted text. This
- allows the avoidance of double-sha1 calculations during commit.
+ its return value to include the sha1, length and key for the inserted
+ text. This allows the avoidance of double-sha1 calculations during commit.
(Robert Collins)
* ``Transport.should_cache`` has been removed. It was not called in the
=== modified file 'bzrlib/fetch.py'
--- a/bzrlib/fetch.py 2007-09-03 02:58:58 +0000
+++ b/bzrlib/fetch.py 2007-09-04 23:40:57 +0000
@@ -349,9 +349,9 @@
root_id = tree.inventory.root.file_id
parents = inventory_weave.get_parents(revision_id)
if root_id not in versionedfile:
- versionedfile[root_id] = to_store.get_weave_or_empty(root_id,
+ versionedfile[root_id] = to_store.get_weave_or_empty(root_id,
self.target.get_transaction())
- _, _, parent_texts[root_id] = versionedfile[root_id].add_lines(
+ _, _, _, parent_texts[root_id] = versionedfile[root_id].add_lines(
revision_id, parents, [], parent_texts)
def regenerate_inventory(self, revs):
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py 2007-09-03 21:19:07 +0000
+++ b/bzrlib/knit.py 2007-09-04 23:40:57 +0000
@@ -945,7 +945,7 @@
access_memo = self._data.add_record(version_id, digest, store_lines)
self._index.add_version(version_id, options, access_memo, parents)
- return digest, text_length, lines
+ return digest, text_length, version_id, lines
def check(self, progress_bar=None):
"""See VersionedFile.check()."""
@@ -1061,38 +1061,22 @@
version_ids = self.versions()
else:
version_ids = [osutils.safe_revision_id(v) for v in version_ids]
- if pb is None:
- pb = progress.DummyProgress()
# we don't care about inclusions, the caller cares.
# but we need to setup a list of records to visit.
# we need version_id, position, length
version_id_records = []
requested_versions = set(version_ids)
- # filter for available versions
+ methods = {}
+ # create set of records to read:
for version_id in requested_versions:
- if not self.has_version(version_id):
- raise RevisionNotPresent(version_id, self.filename)
- # get a in-component-order queue:
- for version_id in self.versions():
- if version_id in requested_versions:
- index_memo = self._index.get_position(version_id)
- version_id_records.append((version_id, index_memo))
-
+ index_memo = self._index.get_position(version_id)
+ method = self._index.get_method(version_id)
+ version_id_records.append((version_id, index_memo))
+ methods[version_id] = method
total = len(version_id_records)
- for version_idx, (version_id, data, sha_value) in \
- enumerate(self._data.read_records_iter(version_id_records)):
- pb.update('Walking content.', version_idx, total)
- method = self._index.get_method(version_id)
-
- assert method in ('fulltext', 'line-delta')
- if method == 'fulltext':
- line_iterator = self.factory.get_fulltext_content(data)
- else:
- line_iterator = self.factory.get_linedelta_content(data)
- for line in line_iterator:
- yield line
-
- pb.update('Walking content.', total, total)
+ return self._data.iter_lines_added_or_present_in_records(
+ self._data.read_records_iter(version_id_records),
+ methods, self.factory, pb, total)
def iter_parents(self, version_ids):
"""Iterate through the parents for many version ids.
@@ -1794,7 +1778,25 @@
return set((version_id, ) for version_id in version_ids)
-class _KnitAccess(object):
+class _Access(object):
+ """Base class with common logic for accessing knit record data."""
+
+ def get_raw_records_unsorted(self, memos_for_retrieval):
+ """Get the raw bytes for many records with 'best' IO.
+
+ :param memos_for_retrieval: An iterable containing the memo's to
+ use when retrieving the bytes. The Pack access method looks up the
+ pack to use for a given record in its index_to_pack map.
+ :return: An iterator over (memos, bytes) for all the requested
+ records.
+ """
+ # sort the memos
+ sorted_memos = sorted(memos_for_retrieval)
+ # delegate to the concrete class to get the now sorted records.
+ return izip(sorted_memos, self.get_raw_records(sorted_memos))
+
+
+class _KnitAccess(_Access):
"""Access to knit records in a .knit file."""
def __init__(self, transport, filename, _file_mode, _dir_mode,
@@ -1874,7 +1876,7 @@
yield data
-class _PackAccess(object):
+class _PackAccess(_Access):
"""Access to knit records via a collection of packs."""
def __init__(self, index_to_packs, writer=None):
@@ -1995,6 +1997,36 @@
def _open_file(self):
return self._access.open_file()
+ def iter_lines_added_or_present_in_records(self, record_iterator, methods,
+ record_parser, pb=None, record_count=0):
+ """Read, parse and yield the contents of records as lines.
+
+ :param record_iterator: An iterable of version_id, data, sha_value for
+ the records to process.
+ :param methods: A dict of version_id -> method.
+ :param record_parser: A knit record parser which can parse each
+ record.
+ :param pb: A progress bar, or None.
+ :param record_count: A total for the progress bar if one is supplied.
+ :return: An iterator over all the lines in no particular order.
+ """
+ if pb is None:
+ pb = progress.DummyProgress()
+ for version_idx, (version_id, data, sha_value) in \
+ enumerate(record_iterator):
+ pb.update('Walking content.', version_idx, record_count)
+ method = methods[version_id]
+
+ assert method in ('fulltext', 'line-delta')
+ if method == 'fulltext':
+ line_iterator = record_parser.get_fulltext_content(data)
+ else:
+ line_iterator = record_parser.get_linedelta_content(data)
+ for line in line_iterator:
+ yield line
+
+ pb.update('Walking content.', record_count, record_count)
+
def _record_to_data(self, version_id, digest, lines):
"""Convert version_id, digest, lines into a raw data block.
@@ -2136,6 +2168,7 @@
The result will be returned in whatever is the fastest to read.
Not by the order requested. Also, multiple requests for the same
record will only yield 1 response.
+
:param records: A list of (version_id, pos, len) entries
:return: Yields (version_id, contents, digest) in the order
read, not the order requested
@@ -2157,20 +2190,25 @@
yield (record[0], content, digest)
else:
needed_records.add(record)
- needed_records = sorted(needed_records, key=operator.itemgetter(1))
else:
- needed_records = sorted(set(records), key=operator.itemgetter(1))
+ needed_records = records
if not needed_records:
return
+ # let the access object optimise lookups, so setup a mapping back to
+ # version_ids.
+ needed_memos = {}
+ for version_id, index_memo in needed_records:
+ needed_memos[index_memo] = version_id
+
# The transport optimizes the fetching as well
# (ie, reads continuous ranges.)
- raw_data = self._access.get_raw_records(
- [index_memo for version_id, index_memo in needed_records])
+ raw_results = self._access.get_raw_records_unsorted(
+ needed_memos.iterkeys())
- for (version_id, index_memo), data in \
- izip(iter(needed_records), raw_data):
+ for index_memo, data in raw_results:
+ version_id = needed_memos[index_memo]
content, digest = self._parse_record(version_id, data)
if self._do_cache:
self._cache[version_id] = data
=== modified file 'bzrlib/tests/test_knit.py'
--- a/bzrlib/tests/test_knit.py 2007-08-30 08:27:29 +0000
+++ b/bzrlib/tests/test_knit.py 2007-09-04 23:40:57 +0000
@@ -188,6 +188,16 @@
access.create()
self.assertAccessExists(access)
+ def test_get_raw_records_unsorted(self):
+ """get_raw_records_unsorted returns in best-read order."""
+ access = self.get_access()
+ memos = access.add_raw_records([10, 2, 5], '12345678901234567')
+ expected_result = zip(memos, ['1234567890', '12', '34567'])
+ self.assertEqual(expected_result,
+ list(access.get_raw_records_unsorted(memos)))
+ self.assertEqual(expected_result,
+ list(access.get_raw_records_unsorted(reversed(memos))))
+
def test_open_file(self):
"""open_file never errors."""
access = self.get_access()
=== modified file 'bzrlib/tests/test_versionedfile.py'
--- a/bzrlib/tests/test_versionedfile.py 2007-09-03 21:19:07 +0000
+++ b/bzrlib/tests/test_versionedfile.py 2007-09-04 23:40:57 +0000
@@ -84,13 +84,13 @@
def test_adds_with_parent_texts(self):
f = self.get_file()
parent_texts = {}
- _, _, parent_texts['r0'] = f.add_lines('r0', [], ['a\n', 'b\n'])
+ _, _, _, parent_texts['r0'] = f.add_lines('r0', [], ['a\n', 'b\n'])
try:
- _, _, parent_texts['r1'] = f.add_lines_with_ghosts('r1',
+ _, _, _, parent_texts['r1'] = f.add_lines_with_ghosts('r1',
['r0', 'ghost'], ['b\n', 'c\n'], parent_texts=parent_texts)
except NotImplementedError:
# if the format doesn't support ghosts, just add normally.
- _, _, parent_texts['r1'] = f.add_lines('r1',
+ _, _, _, parent_texts['r1'] = f.add_lines('r1',
['r0'], ['b\n', 'c\n'], parent_texts=parent_texts)
f.add_lines('r2', ['r1'], ['c\n', 'd\n'], parent_texts=parent_texts)
self.assertNotEqual(None, parent_texts['r0'])
@@ -168,7 +168,8 @@
vf.add_delta, 'a:', [], None, 'sha1', False, ((0, 0, 0, []),))
def test_add_lines_return_value(self):
- # add_lines should return the sha1 and the text size.
+ # add_lines should return:
+ # the sha1, text size, key, opaque-parent-representation
vf = self.get_file()
empty_text = ('a', [])
sample_text_nl = ('b', ["foo\n", "bar\n"])
@@ -178,14 +179,18 @@
# the first two elements are the same for all versioned files:
# - the digest and the size of the text. For some versioned files
# additional data is returned in additional tuple elements.
+ # then comes the key. Currently all VF implementations use user
+ # supplied keys, so we just cross-reference back to version.
result = vf.add_lines(version, [], lines)
- self.assertEqual(3, len(result))
- self.assertEqual((osutils.sha_strings(lines), sum(map(len, lines))),
- result[0:2])
+ self.assertEqual(4, len(result))
+ self.assertEqual(
+ (osutils.sha_strings(lines), sum(map(len, lines)), version),
+ result[0:3])
# parents should not affect the result:
lines = sample_text_nl[1]
- self.assertEqual((osutils.sha_strings(lines), sum(map(len, lines))),
- vf.add_lines('d', ['b', 'c'], lines)[0:2])
+ self.assertEqual(
+ (osutils.sha_strings(lines), sum(map(len, lines)), 'd'),
+ vf.add_lines('d', ['b', 'c'], lines)[0:3])
def test_get_reserved(self):
vf = self.get_file()
=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py 2007-09-03 20:23:05 +0000
+++ b/bzrlib/versionedfile.py 2007-09-04 23:40:57 +0000
@@ -130,9 +130,10 @@
:param left_matching_blocks: a hint about which areas are common
between the text and its left-hand-parent. The format is
the SequenceMatcher.get_matching_blocks format.
- :return: The text sha1, the number of bytes in the text, and an opaque
- representation of the inserted version which can be provided
- back to future add_lines calls in the parent_texts dictionary.
+ :return: The text sha1, the number of bytes in the text, the key to
+ obtain the lines back again and an opaque representation of the
+ inserted version which can be provided back to future add_lines
+ calls in the parent_texts dictionary.
"""
version_id = osutils.safe_revision_id(version_id)
parents = [osutils.safe_revision_id(v) for v in parents]
@@ -318,7 +319,7 @@
mpvf.get_diff(parent_ids[0]).num_lines()))
else:
left_matching_blocks = None
- _, _, version_text = self.add_lines(version, parent_ids, lines,
+ _, _, _, version_text = self.add_lines(version, parent_ids, lines,
vf_parents, left_matching_blocks=left_matching_blocks)
vf_parents[version] = version_text
for (version, parent_ids, expected_sha1, mpdiff), sha1 in\
@@ -687,7 +688,7 @@
# deltas = self.source.get_deltas(order)
for index, version in enumerate(order):
pb.update('Converting versioned data', index, len(order))
- _, _, parent_text = target.add_lines(version,
+ _, _, _, parent_text = target.add_lines(version,
self.source.get_parents(version),
self.source.get_lines(version),
parent_texts=parent_texts)
=== modified file 'bzrlib/weave.py'
--- a/bzrlib/weave.py 2007-09-03 02:58:58 +0000
+++ b/bzrlib/weave.py 2007-09-04 23:40:57 +0000
@@ -420,7 +420,7 @@
left_matching_blocks=None):
"""See VersionedFile.add_lines."""
idx = self._add(version_id, lines, map(self._lookup, parents))
- return sha_strings(lines), sum(map(len, lines)), idx
+ return sha_strings(lines), sum(map(len, lines)), version_id, idx
def _add(self, version_id, lines, parents, sha1=None):
"""Add a single text on top of the weave.
More information about the bazaar-commits
mailing list