Rev 2817: (robertc) Make expensive content checks during knit adds optional and disable them for commit. (Robert Collins) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Sep 12 23:26:30 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2817
revision-id: pqm at pqm.ubuntu.com-20070912222627-zvqit350mf6gvrbh
parent: pqm at pqm.ubuntu.com-20070912075616-4lsus30tfc3sgwy0
parent: robertc at robertcollins.net-20070912212742-xlzu3nasjo31d5zs
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2007-09-12 23:26:27 +0100
message:
(robertc) Make expensive content checks during knit adds optional and disable them for commit. (Robert Collins)
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/reconcile.py reweave_inventory.py-20051108164726-1e5e0934febac06e
bzrlib/repofmt/knitrepo.py knitrepo.py-20070206081537-pyy4a00xdas0j4pf-1
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/repository_implementations/test_commit_builder.py test_commit_builder.py-20060606110838-76e3ra5slucqus81-1
bzrlib/versionedfile.py versionedfile.py-20060222045106-5039c71ee3b65490
bzrlib/weave.py knit.py-20050627021749-759c29984154256b
------------------------------------------------------------
revno: 2805.6.8
merged: robertc at robertcollins.net-20070912212742-xlzu3nasjo31d5zs
parent: robertc at robertcollins.net-20070912042151-o2k78pnf1hdwd2xt
parent: pqm at pqm.ubuntu.com-20070912075616-4lsus30tfc3sgwy0
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Thu 2007-09-13 07:27:42 +1000
message:
Merge 0.92 opening.
------------------------------------------------------------
revno: 2805.6.7
merged: robertc at robertcollins.net-20070912042151-o2k78pnf1hdwd2xt
parent: robertc at robertcollins.net-20070911041823-pn9kn8l5zmmptnsr
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Wed 2007-09-12 14:21:51 +1000
message:
Review feedback.
------------------------------------------------------------
revno: 2805.6.6
merged: robertc at robertcollins.net-20070911041823-pn9kn8l5zmmptnsr
parent: robertc at robertcollins.net-20070911040511-g0wxzi6iyr3u379q
parent: pqm at pqm.ubuntu.com-20070911020027-bmt9h0jgy3zdlge3
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Tue 2007-09-11 14:18:23 +1000
message:
Merge bzr.dev.
------------------------------------------------------------
revno: 2805.6.5
merged: robertc at robertcollins.net-20070911040511-g0wxzi6iyr3u379q
parent: robertc at robertcollins.net-20070910035304-iwf9nmqoxeshjaki
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Tue 2007-09-11 14:05:11 +1000
message:
Unbreak weaves.
------------------------------------------------------------
revno: 2805.6.4
merged: robertc at robertcollins.net-20070910035304-iwf9nmqoxeshjaki
parent: robertc at robertcollins.net-20070910033719-nqsmg72617f0tvyo
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Mon 2007-09-10 13:53:04 +1000
message:
Don't check for existing versions when adding texts with random revision ids.
------------------------------------------------------------
revno: 2805.6.3
merged: robertc at robertcollins.net-20070910033719-nqsmg72617f0tvyo
parent: robertc at robertcollins.net-20070910025423-qeqnv361y8yukjzc
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Mon 2007-09-10 13:37:19 +1000
message:
* The ``VersionedFile`` interface no longer protects against misuse when
lines that are not lines, or are not strings are supplied. This saves
nearly 30% of the minimum cost to store a version of a file.
(Robert Collins)
------------------------------------------------------------
revno: 2805.6.2
merged: robertc at robertcollins.net-20070910025423-qeqnv361y8yukjzc
parent: robertc at robertcollins.net-20070910012710-bubnul8jp0mr5lhk
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Mon 2007-09-10 12:54:23 +1000
message:
General cleanup of KnitVersionedFile._add.
------------------------------------------------------------
revno: 2805.6.1
merged: robertc at robertcollins.net-20070910012710-bubnul8jp0mr5lhk
parent: pqm at pqm.ubuntu.com-20070907145828-hjh5941jv7y8d9z8
committer: Robert Collins <robertc at robertcollins.net>
branch nick: knits
timestamp: Mon 2007-09-10 11:27:10 +1000
message:
Set random_revid on CommitBuilder when a commit generated its own revision id.
=== modified file 'NEWS'
--- a/NEWS 2007-09-12 07:56:16 +0000
+++ b/NEWS 2007-09-12 21:27:42 +0000
@@ -18,6 +18,10 @@
BUG FIXES:
API BREAKS:
+
+ * The ``VersionedFile`` interface now allows content checks to be bypassed
+ by supplying check_content=False. This saves nearly 30% of the minimum
+ cost to store a version of a file. (Robert Collins)
TESTING:
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py 2007-09-06 03:41:24 +0000
+++ b/bzrlib/knit.py 2007-09-12 04:21:51 +0000
@@ -831,37 +831,34 @@
self._index.check_versions_present(version_ids)
def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
- nostore_sha):
+ nostore_sha, random_id, check_content):
"""See VersionedFile.add_lines_with_ghosts()."""
- self._check_add(version_id, lines)
- return self._add(version_id, lines[:], parents, self.delta,
+ self._check_add(version_id, lines, random_id, check_content)
+ return self._add(version_id, lines, parents, self.delta,
parent_texts, None, nostore_sha)
def _add_lines(self, version_id, parents, lines, parent_texts,
- left_matching_blocks, nostore_sha):
+ left_matching_blocks, nostore_sha, random_id, check_content):
"""See VersionedFile.add_lines."""
- self._check_add(version_id, lines)
+ self._check_add(version_id, lines, random_id, check_content)
self._check_versions_present(parents)
return self._add(version_id, lines[:], parents, self.delta,
parent_texts, left_matching_blocks, nostore_sha)
- def _check_add(self, version_id, lines):
+ def _check_add(self, version_id, lines, random_id, check_content):
"""check that version_id and lines are safe to add."""
- assert self.writable, "knit is not opened for write"
- ### FIXME escape. RBC 20060228
if contains_whitespace(version_id):
raise InvalidRevisionId(version_id, self.filename)
self.check_not_reserved_id(version_id)
- # Technically this is a case of Look Before You Leap, but:
- # - for knits this saves wasted space in the error case
- # - for packs this avoids dead space in the pack
- # - it also avoids undetected poisoning attacks.
- # - its 1.5% of total commit time, so ignore it unless it becomes a
- # larger percentage.
- if self.has_version(version_id):
+ # Technically this could be avoided if we are happy to allow duplicate
+ # id insertion when other things than bzr core insert texts, but it
+ # seems useful for folk using the knit api directly to have some safety
+ # blanket that we can disable.
+ if not random_id and self.has_version(version_id):
raise RevisionAlreadyPresent(version_id, self.filename)
- self._check_lines_not_unicode(lines)
- self._check_lines_are_lines(lines)
+ if check_content:
+ self._check_lines_not_unicode(lines)
+ self._check_lines_are_lines(lines)
def _add(self, version_id, lines, parents, delta, parent_texts,
left_matching_blocks, nostore_sha):
@@ -885,16 +882,16 @@
# +61 0 1918.1800 5.2640 +bzrlib.knit:359(_merge_annotations)
present_parents = []
- ghosts = []
if parent_texts is None:
parent_texts = {}
for parent in parents:
- if not self.has_version(parent):
- ghosts.append(parent)
- else:
+ if self.has_version(parent):
present_parents.append(parent)
- if delta and not len(present_parents):
+ # can only compress against the left most present parent.
+ if (delta and
+ (len(present_parents) == 0 or
+ present_parents[0] != parents[0])):
delta = False
digest = sha_strings(lines)
@@ -904,10 +901,12 @@
options = []
if lines:
if lines[-1][-1] != '\n':
+ # copy the contents of lines.
+ lines = lines[:]
options.append('no-eol')
lines[-1] = lines[-1] + '\n'
- if len(present_parents) and delta:
+ if delta:
# To speed the extract of texts the delta chain is limited
# to a fixed number of deltas. This should minimize both
# I/O and the time spend applying deltas.
@@ -916,7 +915,7 @@
assert isinstance(version_id, str)
content = self.factory.make(lines, version_id)
if delta or (self.factory.annotated and len(present_parents) > 0):
- # Merge annotations from parent texts if so is needed.
+ # Merge annotations from parent texts if needed.
delta_hunks = self._merge_annotations(content, present_parents,
parent_texts, delta, self.factory.annotated,
left_matching_blocks)
=== modified file 'bzrlib/reconcile.py'
--- a/bzrlib/reconcile.py 2007-09-05 03:51:59 +0000
+++ b/bzrlib/reconcile.py 2007-09-12 04:21:51 +0000
@@ -171,7 +171,7 @@
# reconcile.
new_inventory_vf._check_write_ok()
Weave._add_lines(new_inventory_vf, rev_id, parents,
- self.inventory.get_lines(rev_id), None, None, None)
+ self.inventory.get_lines(rev_id), None, None, None, False, True)
else:
new_inventory_vf.add_lines(rev_id, parents, self.inventory.get_lines(rev_id))
=== modified file 'bzrlib/repofmt/knitrepo.py'
--- a/bzrlib/repofmt/knitrepo.py 2007-08-30 08:27:29 +0000
+++ b/bzrlib/repofmt/knitrepo.py 2007-09-12 04:21:51 +0000
@@ -74,8 +74,9 @@
# This class isn't deprecated
pass
- def _inventory_add_lines(self, inv_vf, revid, parents, lines):
- inv_vf.add_lines_with_ghosts(revid, parents, lines)
+ def _inventory_add_lines(self, inv_vf, revid, parents, lines, check_content):
+ inv_vf.add_lines_with_ghosts(revid, parents, lines,
+ check_content=check_content)
@needs_read_lock
def _all_revision_ids(self):
=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py 2007-09-12 00:05:48 +0000
+++ b/bzrlib/repository.py 2007-09-12 21:27:42 +0000
@@ -125,16 +125,17 @@
inv_vf = self.control_weaves.get_weave('inventory',
self.get_transaction())
self._inventory_add_lines(inv_vf, revision_id, parents,
- osutils.split_lines(inv_text))
+ osutils.split_lines(inv_text), check_content=False)
return inv_sha1
- def _inventory_add_lines(self, inv_vf, revision_id, parents, lines):
+ def _inventory_add_lines(self, inv_vf, revision_id, parents, lines,
+ check_content=True):
final_parents = []
for parent in parents:
if parent in inv_vf:
final_parents.append(parent)
-
- inv_vf.add_lines(revision_id, final_parents, lines)
+ inv_vf.add_lines(revision_id, final_parents, lines,
+ check_content=check_content)
@needs_write_lock
def add_revision(self, revision_id, rev, inv=None, config=None):
@@ -2190,6 +2191,9 @@
"""
if self._new_revision_id is None:
self._new_revision_id = self._gen_revision_id()
+ self.random_revid = True
+ else:
+ self.random_revid = False
def _check_root(self, ie, parent_invs, tree):
"""Helper for record_entry_contents.
@@ -2305,8 +2309,14 @@
# Don't change this to add_lines - add_lines_with_ghosts is cheaper
# than add_lines, and allows committing when a parent is ghosted for
# some reason.
+ # Note: as we read the content directly from the tree, we know its not
+ # been turned into unicode or badly split - but a broken tree
+ # implementation could give us bad output from readlines() so this is
+ # not a guarantee of safety. What would be better is always checking
+ # the content during test suite execution. RBC 20070912
result = versionedfile.add_lines_with_ghosts(
- self._new_revision_id, parents, new_lines)[0:2]
+ self._new_revision_id, parents, new_lines,
+ random_id=self.random_revid, check_content=False)[0:2]
versionedfile.clear_cache()
return result
=== modified file 'bzrlib/tests/repository_implementations/test_commit_builder.py'
--- a/bzrlib/tests/repository_implementations/test_commit_builder.py 2007-09-04 03:53:07 +0000
+++ b/bzrlib/tests/repository_implementations/test_commit_builder.py 2007-09-10 01:27:10 +0000
@@ -34,6 +34,7 @@
builder = branch.repository.get_commit_builder(
branch, [], branch.get_config())
self.assertIsInstance(builder, CommitBuilder)
+ self.assertTrue(builder.random_revid)
branch.repository.commit_write_group()
branch.repository.unlock()
@@ -101,6 +102,7 @@
except CannotSetRevisionId:
# This format doesn't support supplied revision ids
return
+ self.assertFalse(builder.random_revid)
self.record_root(builder, tree)
builder.finish_inventory()
self.assertEqual(revision_id, builder.commit('foo bar'))
=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py 2007-09-05 22:25:01 +0000
+++ b/bzrlib/versionedfile.py 2007-09-12 04:21:51 +0000
@@ -78,7 +78,8 @@
raise NotImplementedError(self.has_version)
def add_lines(self, version_id, parents, lines, parent_texts=None,
- left_matching_blocks=None, nostore_sha=None):
+ left_matching_blocks=None, nostore_sha=None, random_id=False,
+ check_content=True):
"""Add a single text on top of the versioned file.
Must raise RevisionAlreadyPresent if the new version is
@@ -86,16 +87,32 @@
Must raise RevisionNotPresent if any of the given parents are
not present in file history.
+
+ :param lines: A list of lines. Each line must be a bytestring. And all
+ of them except the last must be terminated with \n and contain no
+ other \n's. The last line may either contain no \n's or a single
+ terminated \n. If the lines list does meet this constraint the add
+ routine may error or may succeed - but you will be unable to read
+ the data back accurately. (Checking the lines have been split
+ correctly is expensive and extremely unlikely to catch bugs so it
+ is not done at runtime unless check_content is True.)
:param parent_texts: An optional dictionary containing the opaque
- representations of some or all of the parents of
- version_id to allow delta optimisations.
- VERY IMPORTANT: the texts must be those returned
- by add_lines or data corruption can be caused.
+ representations of some or all of the parents of version_id to
+ allow delta optimisations. VERY IMPORTANT: the texts must be those
+ returned by add_lines or data corruption can be caused.
:param left_matching_blocks: a hint about which areas are common
between the text and its left-hand-parent. The format is
the SequenceMatcher.get_matching_blocks format.
:param nostore_sha: Raise ExistingContent and do not add the lines to
the versioned file if the digest of the lines matches this.
+ :param random_id: If True a random id has been selected rather than
+ an id determined by some deterministic process such as a converter
+ from a foreign VCS. When True the backend may choose not to check
+ for uniqueness of the resulting key within the versioned file, so
+ this should only be done when the result is expected to be unique
+ anyway.
+ :param check_content: If True, the lines supplied are verified to be
+ bytestrings that are correctly formed lines.
:return: The text sha1, the number of bytes in the text, and an opaque
representation of the inserted version which can be provided
back to future add_lines calls in the parent_texts dictionary.
@@ -104,15 +121,16 @@
parents = [osutils.safe_revision_id(v) for v in parents]
self._check_write_ok()
return self._add_lines(version_id, parents, lines, parent_texts,
- left_matching_blocks, nostore_sha)
+ left_matching_blocks, nostore_sha, random_id, check_content)
def _add_lines(self, version_id, parents, lines, parent_texts,
- left_matching_blocks, nostore_sha):
+ left_matching_blocks, nostore_sha, random_id, check_content):
"""Helper to do the class specific add_lines."""
raise NotImplementedError(self.add_lines)
def add_lines_with_ghosts(self, version_id, parents, lines,
- parent_texts=None, nostore_sha=None):
+ parent_texts=None, nostore_sha=None, random_id=False,
+ check_content=True):
"""Add lines to the versioned file, allowing ghosts to be present.
This takes the same parameters as add_lines and returns the same.
@@ -121,10 +139,10 @@
parents = [osutils.safe_revision_id(v) for v in parents]
self._check_write_ok()
return self._add_lines_with_ghosts(version_id, parents, lines,
- parent_texts, nostore_sha)
+ parent_texts, nostore_sha, random_id, check_content)
def _add_lines_with_ghosts(self, version_id, parents, lines, parent_texts,
- nostore_sha):
+ nostore_sha, random_id, check_content):
"""Helper to do class specific add_lines_with_ghosts."""
raise NotImplementedError(self.add_lines_with_ghosts)
=== modified file 'bzrlib/weave.py'
--- a/bzrlib/weave.py 2007-09-05 22:25:01 +0000
+++ b/bzrlib/weave.py 2007-09-12 04:21:51 +0000
@@ -261,7 +261,7 @@
return idx
def _add_lines(self, version_id, parents, lines, parent_texts,
- left_matching_blocks, nostore_sha):
+ left_matching_blocks, nostore_sha, random_id, check_content):
"""See VersionedFile.add_lines."""
idx = self._add(version_id, lines, map(self._lookup, parents),
nostore_sha=nostore_sha)
@@ -870,11 +870,12 @@
self._save()
def _add_lines(self, version_id, parents, lines, parent_texts,
- left_matching_blocks, nostore_sha):
+ left_matching_blocks, nostore_sha, random_id, check_content):
"""Add a version and save the weave."""
self.check_not_reserved_id(version_id)
result = super(WeaveFile, self)._add_lines(version_id, parents, lines,
- parent_texts, left_matching_blocks, nostore_sha)
+ parent_texts, left_matching_blocks, nostore_sha, random_id,
+ check_content)
self._save()
return result
More information about the bazaar-commits
mailing list