[MERGE][bzr-email] Implement mail sending using smtplib

Fri Jan 26 15:27:25 GMT 2007

Marius Gedminas wrote:
> On Thu, Jan 25, 2007 at 05:38:15PM -0600, John Arbash Meinel wrote:
>> Then I implemented an SMTPConnection class, which has a little bit of
>> knowledge about what configuration parameters bzrlib should use to
>> define an SMTP connection (smtp_server, smtp_username, smtp_password).
>>
>> That part was easy. The tricky part was to make everything support
>> Unicode. And support it in a way that wouldn't confuse my SMTP server,
>> and would look nice in Thunderbird.
> 
> Ah, yes.  Been there, done that
> (http://mg.pov.lt/blog/unicode-emails-in-python.html).

...

>> +        # We want a cStringIO because we want an 8-bit string
> 
> StringIO.StringIO can handle both 8-bit strings and Unicode strings.
> (It cannot handle mixed content, though).  Perhaps rephrase the comment
> to
> 
>            # We can use cStringIO because we have an 8-bit string
> 
> ?
> 
> Or is there some type-based dispatching in bzrlib that chooses Unicode
> versus 8-bit from the file object?

No, just that 'diff' should be writing 8-bit only, because it is writing
out the contents of files. So "We can" is more accurate. Also, I want it
to actually fail if it tries to directly write a Unicode string.

> 
>> +        from cStringIO import StringIO
>> +        diff_content = StringIO()
>> +        show_diff_trees(tree_old, tree_new, diff_content)
>> +        lines = diff_content.getvalue().split("\n")
>> +        numlines = len(lines)
> 
> It would be more efficient to do
> 
>            numlines = diff_content.getvalue().count("\n") + 1

Sure, quite a few of your comments are actually because I was moving the
old code around, so it looks new. But I'm happy enough to clean it up.

> 
> since you don't use ``lines`` for anything else.
> 
>> +        if numlines <= difflimit:
>> +            return diff_content.getvalue()
>> +        else:
>> +            return ("\nDiff too large for email"
>> +                    " (%d lines, the limit is %d).\n"
>> +                    % (numlines, difflimit))
>> +
>> +    def difflimit(self):
>> +        """maximum number of lines of diff to show."""
> 
> Capitalize "Maximum"?
> 

done

> ...
>> +    def _send_using_process(self):
>> +        """Spawn a 'mail' subprocess to send the email."""
>> +        # TODO think up a good test for this, but I think it needs
>> +        # a custom binary shipped with. RBC 20051021
> 
> Why a custom binary?  You could set post_commit_mailer to the name of a
> test-helper Python script that stores its arguments and stdin somewhere
> on the disk.  Perhaps copy the test script into a temporary directory,
> and make it write its result to os.path.join(os.path.dirname(sys.argv[0]),
> 'out.txt').

Old comment. I think the problem is that we are calling Popen() without
'shell=True', so you also have to override _commandline() so that you
get [sys.executable, 'xxyyzz.py', ...]

However, this will also fail on win32 if you are using the standalone
'bzr'. Because there you don't have a regular python interpreter. I'm
not sure if we are trying to have tests pass under those circumstances,
or if we have a simple way of detecting it and skipping the test
(perhaps sys.frozen?)

Personally, I'm not very interested in adding lots of tests for this,
because I would rather see us transition to using smtplib. But I want to
hear from Robert, to see if he has a specific reason to want to use
'mail'. Maybe he has to go through some work to get emails sent, and he
doesn't want to have to reconfigure that.

> 
>> +        try:
>> +            process = subprocess.Popen(self._command_line(),
>> +                                       stdin=subprocess.PIPE)
>> +            try:
>> +                message = self.body().encode('utf8') + self.get_diff()
>> +                result = process.communicate(message)[0]
> ...

^- As an aside, this should actually fix:
  https://launchpad.net/bzr-email/+bug/76962

...

>> +        # Header() does a good job of doing the proper encoding. However it
>> +        # confuses my SMTP server because it doesn't decode the strings. So it
>> +        # is better to send the addresses as:
>> +        #   =?utf-8?q?username?= <email at addr.com>
>> +        # Which is how Thunderbird does it
> 
> I think encoding the <email at addr.com> part is actually a violation of RFC-2822.
> The stdlib email package is quirky.
> 

I think you have the wrong RFC, I think it is actually:
http://www.faqs.org/rfcs/rfc2047.html

Which describes using "=?utf-8?" notation.

http://www.faqs.org/rfcs/rfc2231.html

seems to extend it.

It took a while to track down, but I did find this part:
   + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.

   + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

   + An 'encoded-word' MUST NOT be used in a Received header field.

   + An 'encoded-word' MUST NOT be used in parameter of a MIME
     Content-Type or Content-Disposition field, or in any structured
     field body except within a 'comment' or 'phrase'.

So it does seem like you can't have <email at addr.com> encoded.

>> +        from_user, from_email = self._split_address(from_address)
>> +        msg['From'] = '%s <%s>' % (Header(from_user, 'utf8'), from_email)
> 
> If you use Header(some_str, 'utf8'), then it will *always* use the
> =?utf-8?..?= quoting (at least it acts this way in Python 2.4).  This is
> redundant for names that can be expressed in plain text.
> Header(some_unicode) does the quoting only when necessary.  The type of
> the string is important, the email package treats 'foo' and u'foo'
> differently.

Yeah, I'll change that, and verify that the strings are unicode first.

> 
>> +        msg['User-Agent'] = 'bzr/%s' % _bzrlib_version
>> +
>> +        to_emails = []
>> +        to_header = []
>> +        for addr in to_addresses:
>> +            to_user, to_email = self._split_address(addr)
>> +            to_emails.append(to_email)
>> +            to_header.append('%s <%s>' % (Header(to_user, 'utf8'), to_email))
>> +
>> +        msg['To'] = ', '.join(to_header)
>> +        msg['Subject'] = Header(subject)
>> +        return msg, from_email, to_emails
> ...
> 
>> === added file tests/test_smtp_connection.py // file-id:test_smtp_connection-20
>> ... 070125220755-k6ueimjqwn16wvr9-2 // last-changed:john at arbash-meinel.com-2007
>> ... 0125225439-5btrv82m7dhekt8z
>> --- /dev/null
>> +++ tests/test_smtp_connection.py
> ...
>> +class InstrumentedSMTPConnection(SMTPConnection):
>> +    """Instrument SMTPConnection.
>> +
>> +    We don't want to actually connect or send messages, so this just
>> +    fakes it.
>> +    """
>> +    
> 
> What does bzr's coding style say about trailing whitespace?
> 

I'm not sure what you are calling 'trailing whitespace'. Are you saying
"adding a fully blank line at the end of docstrings"? Which we don't do.
Or "adding whitespace to the end of a single line"? which we ask not to
do, but we are avoiding going through the rest of the codebase to clean
it up, as it can cause conflicts in what would otherwise be a trivial merge.

Or are you saying:

"""Comment

foo."""

Which we also don't do.

> ...
>> +class TestSMTPConnection(TestCase):
>> +
>> +    def get_connection(self, text):
>> +        my_config = config.GlobalConfig()
>> +        config_file = StringIO(text)
>> +        (my_config._get_parser(config_file))
> 
> Why the parentheses around it?

Hysterical raisins. I copied the code from elsewhere, updated it a bit,
but not quite enough.

> 
>> +        return InstrumentedSMTPConnection(my_config)
> ...
> 
>> === modified file README
>> --- README
>> +++ README
> ...
>> +Also, if using 'smtplib', the messages will be sent as a utf-8 text message,
> 
> Some pedantic people insist that UTF-8 be spelled "UTF-8", with all the
> capital letters, as it is an abbreviation for UCS (or Unicode)
> Transformation Format.

I'm pretty sure we have never been pedantic about UTF-8, especially
since capital letters imply stress that isn't meant to be there. I can
change it in public documentation, though.

> 
>> +with a 8-bit text diff attached (rather than all-as-one). Work has also been
>> +done to make sure usernames do not have to be ascii.
>> +
> ...
> 
> Marius Gedminas

I updated according to your critiques. Does anyone else have any comments?

As near as I can tell, bzr-email is just a project controlled by Robert,
rather than being a 'bzr' group project. So he is the only one that can
update any of the official branches.

I would guess that he feels it is meant to be a group project, but just
needs to change the project manager.

John
=:->