[PATCH] Changesets for bzr

Thu May 25 21:11:54 BST 2006

Aaron Bentley wrote:
> Okay, here's the latest changeset stuff, submitted for your review.
> (I'll update the branch this evening.)
> 
> Aaron

I'm not sure what it is, but Enigmail really doesn't like your message.
(It has crashed Thunderbird on 2 machines, and is generally just unhappy).

What were the incompatible changes that you wanted to add?

Anyway, I'll try a quick review:

=== added directory 'bzrlib/changeset'
=== added file 'bzrlib/changeset/__init__.py'
=== added file 'bzrlib/changeset/apply_changeset.py'

--- /dev/null	
+++ bzrlib/changeset/apply_changeset.py	
@@ -0,0 +1,55 @@
+"""\
+This contains functionality for installing changesets into repositories
+"""
+
+import bzrlib.ui
+from bzrlib.progress import ProgressPhase
+from bzrlib.merge import Merger
+from bzrlib.repository import install_revision
+
+
+def install_changeset(repository, changeset_reader):

...

+
+def merge_changeset(reader, tree, check_clean, merge_type,
+                    reprocess, show_base):

...

In the refactoring, I think we would want to change the 'reader'
parameter to be whatever the final return of Serializer.read() is. Or
maybe that in itself would do the merging/installing.

Anyway, I'm wondering if we should make this underscore prefixed, since
they aren't really going to be permanent.



=== added file 'bzrlib/changeset/commands.py'

...

+class cmd_send_changeset(Command):

send-changeset is completely borked. 'send_changeset.py' is still in
bzrlib/changeset/old.py

I'm thinking we should at least not import it into builtins.py until we
get it cleaned up. I'm 75% okay with leaving it here so that we
eventually fix it up. It might evolve to just plain 'bzr send', I don't
know for sure. But when we get it working again, we probably should hook
it up to configs, and the proposal for where to send changes to. (You
have a personal setting, and a setting where people who branch from you
should send to)

+
+class cmd_changeset(Command):

Will anyone use this as anything other than 'bzr cset'? Should we just
name the command 'cmd_cset' and not worry about the alias?

+
+class cmd_verify_changeset(Command):

verify_changeset is also broken, I get:
$ ./bzr cset | ./bzr verify-changeset
bzr: ERROR: exceptions.AttributeError: 'BzrBranch5' object has no
attribute 'has_revision'
  at
/home/jameinel/dev/bzr/w-changeset/bzrlib/changeset/read_changeset.py
line 289
  in _validate_references_from_repository

It looks like we just need to pass 'b.repository' rather than just plain
'b'.

We should probably at least have a smoke test for 'verify-changeset'.


+
+
+register_command(cmd_changeset)
+#register_command(cmd_verify_changeset)
+#register_command(cmd_send_changeset)

We probably shouldn't be using register_command here. Instead just
importing the commands into builtins.py (Otherwise they show up as
plugin commands).

+
+def test_suite():

test_suite should also be removed.



=== added file 'bzrlib/changeset/common.py'
--- /dev/null	
+++ bzrlib/changeset/common.py	


...
+
+def encode(s):
+    """Take a unicode string, and make sure to escape it for
+    use in a changeset.
+
+    Note: It can be either a normal, or a unicode string
+
+    >>> encode(u'abcdefg')
+    'abcdefg'
+    >>> encode(u'a b\\tc\\nd\\\\e')
+    'a b\\tc\\nd\\\\e'
+    >>> encode('a b\\tc\\nd\\e')
+    'a b\\tc\\nd\\\\e'
+    >>> encode(u'\\u1234\\u0020')
+    '\\xe1\\x88\\xb4 '
+    >>> encode('abcdefg')
+    'abcdefg'
+    >>> encode(u'')
+    ''
+    >>> encode('')
+    ''
+    """
+    return s.encode('utf-8')


We should get rid of both 'encode' and 'decode' functions, and just
switch to s.encode('utf8') if that is what we are using. At one point I
was escaping certain characters.

+def decode(s):


...

+    >>> t -= 24*3600*365*2 # Start 2 years ago
+    >>> o = -12*3600
+    >>> for count in xrange(500):
+    ...   t += random.random()*24*3600*30
+    ...   o = ((o/3600 + 13) % 25 - 12)*3600 # Add 1 wrap around from
[-12, 12]
+    ...   date = format_highres_date(t, o)
+    ...   t2, o2 = unpack_highres_date(date)
+    ...   if t != t2 or o != o2:
+    ...      print 'Failed on date %r, %s,%s diff:%s' % (date, t, o, t2-t)
+    ...      break
+

This loop creates a bunch of random dates, and is known to fail at
certain times (around daylight savings time). I think we have tested
format_highres_date and unpack_highres_date that we don't need a random
shotgun test anymore.


=== added file 'bzrlib/changeset/read_changeset.py'
--- /dev/null	
+++ bzrlib/changeset/read_changeset.py	


...

+def _unescape(name):

This function is unused, and needs to be removed.

+
+
+class ChangesetReader(object):

...

+
+    def _validate_inventory(self, inv, revision_id):
+        """At this point we should have generated the ChangesetTree,
+        so build up an inventory, and make sure the hashes match.
+        """

I think _validate_inventory is now bogus, since it expects an exact
inventory serialization. Which we no longer guarantee.

...

+
+class ChangesetTree(Tree):

...

+
+    def is_executable(self, file_id):
+        path = self.id2path(file_id)
+        if path in self._executable:
+            return self._executable[path]
+        else:
+            return self.base_tree.inventory[file_id].executable

I still think that if the file is missing in base, then its executable
bit is False. It doesn't have to do with a particular serialization
format. Just the idea that a file without a base has executable=False
rather than executable=None (unknown). I realize I am wanting to use
that fact in the current format. But what happens in the future when you
have a new format, and you have an add. It means a 'note_add' *always*
has to be accompanied by a 'note_executable', which seems unnecessary.
If we really want that, then 'note_add' should have an
'executable=False' parameter instead.


...

+
+def binary_diff(old_filename, old_lines, new_filename, new_lines, to_file):
+    temp = StringIO()
+    internal_diff(old_filename, old_lines, new_filename, new_lines, temp,
+                  allow_binary=True)
+    temp.seek(0)
+    base64.encode(temp, to_file)
+    to_file.write('\n')

Am I reading this correctly that you actually do compute the diff, and
just base-64 encode the result? Rather than sending the whole binary?
Sounds nice. Though only really useful if we have a good binary diff
algorithm.

...


=== added file 'bzrlib/patches.py'
--- /dev/null	
+++ bzrlib/patches.py	

...

Shouldn't these exceptions be in errors.py?

+
+class PatchSyntax(Exception):
+    def __init__(self, msg):
+        Exception.__init__(self, msg)
+
+
+class MalformedPatchHeader(PatchSyntax):
+    def __init__(self, desc, line):
+        self.desc = desc
+        self.line = line
+        msg = "Malformed patch header.  %s\n%r" % (self.desc, self.line)
+        PatchSyntax.__init__(self, msg)
+
+
+class MalformedHunkHeader(PatchSyntax):
+    def __init__(self, desc, line):
+        self.desc = desc
+        self.line = line
+        msg = "Malformed hunk header.  %s\n%r" % (self.desc, self.line)
+        PatchSyntax.__init__(self, msg)
+
+
+class MalformedLine(PatchSyntax):
+    def __init__(self, desc, line):
+        self.desc = desc
+        self.line = line
+        msg = "Malformed line.  %s\n%s" % (self.desc, self.line)
+        PatchSyntax.__init__(self, msg)
+

...

There is another one near the end. At the very least, this should be
moved up, but probably needs to be moved into errors.py.

+
+
+class PatchConflict(Exception):
+    def __init__(self, line_no, orig_line, patch_line):
+        orig = orig_line.rstrip('\n')
+        patch = str(patch_line).rstrip('\n')
+        msg = 'Text contents mismatch at line %d.  Original has "%s",'\
+            ' but patch says it should be "%s"' % (line_no, orig, patch)
+        Exception.__init__(self, msg)
+


...

Imports at the end of the file? Doesn't seem very nice. But the TestCase
should also be moved into bzrlib/tests


+
+import unittest
+import os.path
+
+
+class PatchesTester(unittest.TestCase):

...

Should we remove this arch-tag?

+# arch-tag: d1541a25-eac5-4de9-a476-08a7cecd5683

=== added file 'bzrlib/tests/test_patches.py'

...

+
+# Just import the tester built-into the patch file
+
+from bzrlib.patches import PatchesTester
+from bzrlib.osutils import pathjoin
+import os
+
+
+# We have to inherit so that unittest will consider it
+# Also, the testdata directory is relative to this file
+# so override datafile
+class TestPatches(PatchesTester):
+
+    def datafile(self, filename):
+        data_path = os.path.join(os.path.dirname(__file__),
+                            "test_patches_data", filename)
+        return file(data_path, "rb")
+
+# we have to delete the original, or it will also be tested
+del PatchesTester


This was the hack to get 'PatchesTester' into the selftest. But probably
we should just move it.


=== added directory 'bzrlib/tests/test_patches_data'

=== added file 'bzrlib/tests/test_patches_data/mod'

We have the same problem where our test case uses python source
material. I'm okay with it, but it might be better to try and find other
material in the future. Also, the test files are rather long. In fact,
of your total patch, they seem to make up the bulk.

=== modified file 'BRANCH.TODO'
--- BRANCH.TODO	
+++ BRANCH.TODO	
@@ -1,3 +1,11 @@
 # This file is for listing TODOs for branches that are being worked on.
 # It should ALWAYS be empty in the mainline or in integration branches.
-#
+#
+
+Ensure changeset command behaves properly w/revisions (e.g. 0)
+Decide whether to bundle 'send-changeset' into 'changeset'.
+Decide whether to fix or kill 'verify-changeset'.
+Figure out whether testament-related bug is my problem
+Get rid of the 'old' directory
+Bundle signatures?
+What to do about ghost ancestors in remote branch that are present locally?


I know that we still have TODOs in this code. Is it okay to merge them?

...

I thought with the inventory fixes we might either have no executable
attribute, or it would at least be None for non-files.

+
+class StrictTestament(Testament):
+    """This testament format is for use as a checksum in changesets"""
+
+    def _entry_to_line(self, path, ie):
+        l = ie.revision.decode('utf-8') + ' '
+        l += {True: 'yes', False: 'no'}[ie.executable] + ' '
+        l += Testament._entry_to_line(self, path, ie)
+        return l
+
+    def as_text_lines(self):
+        lines = ['bazaar-ng testament version 2']
+        lines.extend(Testament.as_text_lines(self)[1:])
+        return lines


As I've said earlier, I agree we need to get this into core, and then
clean it up, rather than it continually languishing.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060525/61904f5b/attachment.pgp