[rfc] Help needed: encodings problems again

Anthon van der Neut anthon at mnt.org
Tue Jun 13 08:02:54 BST 2006


I have not read through all of the 12000 messages in this mailing list 
(yet) so excuses if this has come up before.

Is there any reason not to redirect sys.stdout to some other object that 
does the `right' thing depending on whether the output is going to file 
or not. That way print can be used without any changes

I prefer the terseness of
   print 'exit:', exit_value
over
   print >> self.outf, 'exit:', exit_value

Importing the attached file would do the basics, anything else you 
should be able to handle within that.
One thing that is initially confusing is how often the write method is 
called (ie. multiple times for one print 'hello world'), please look at 
the output from 'python redirectstdout.py' which has the argument to 
each call in []

Anthon


John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> Despite that fact that encodings work is landed into bzr.dev there is
>> too much places in bzrlib where sys.stdout used directly and therefore
>> it create problems with unicode strings when output redirected via pipe
>> and sys.stdout hide real encoding of console.
>>
>> I faced with problems in execution of command
>>
>> bzr missing | less
>>
>> when in revisions messages appears russian characters.
>>
>> When I grep through bzrlib for 'sys.stdout' I have list of about 70
>> lines where sys.stdout used. And most of this lines is potentially
>> caused problems when output of bzr will be redirected. And this is very
>> sad.
>>
>> Fixing improper usage of sys.stdout in builtins command is very trivial,
>> because we have attribute self.outf in all descendant of Command class.
>> But I don't understand how to deal with other places that itself
>> unrelated to Command class.
>>
>> Help needed.
>>
>> Attached patch solve problems with builtins command. Attached grep
>> output showed lines where sys.stdout appears.
>>
>> -- 
>> Alexander
>>
> 
> Well, we actually have to watch out for both 'sys.stdout' and 'print'.
> Since both of the print directly to the output, and not to a nicely
> wrapped file.
> 
> The specific methods for how to handle it is that things that write to
> the screen should be modified so that they are now:
> 
> def foo(..., to_file=None):
>   if to_file is None:
>     # DEPRECATED WARNING
>     to_file = sys.stdout
> 
> That lets the builtins upgrade them properly, without breaking API
> compatibility at this point.
> 
> Also, the real goal is to write new tests in
> bzrlib/tests/blackbox/test_non_ascii.py
> 
> This does lots of work to force the commands to run in encodings that
> they are not happy about. And tests that the appropriate return code is
> raised.
> 
> I got to maybe 50% of the commands, so there are certainly more that
> should be fixed. We wanted to get it merged so that other people could
> work on it as well.
> 
> The other reason to write them as test cases is because 'bzr missing'
> should use "encoding_type='replace'" which says that it is okay if it
> cannot represent the expected characters, while 'bzr ls' should fail if
> it cannot properly represent the files. It just has to do with whether
> the output is designed more for user edification, versus something you
> would pipe into xargs, or other things. (cat-revision would be 'strict',
> bzr diff and bzr bundle are actually 'exact' which says that there
> should be no transparent encoding going on, because we will be putting
> out raw bytestreams)
> 
> So I'm +0.5 on this change, since it fixes some more cases. But I'd be
> fully +1 if you could write a few more tests in test_non_ascii.py
> 
> John
> =:->
> 
> PS> Don't interpret to_file too early. If the specific function doesn't
> actually print anything, but just passes it down to another function
> which does, then you probably shouldn't replace it with sys.stdout.
> 
>> ------------------------------------------------------------------------
>>
>> === modified file 'bzrlib/builtins.py'
>> --- bzrlib/builtins.py	2006-06-08 06:38:32 +0000
>> +++ bzrlib/builtins.py	2006-06-12 16:32:42 +0000
>> @@ -1133,7 +1133,7 @@
>>                  raise BzrCommandError('bzr diff --revision takes exactly one or two revision identifiers')
>>          else:
>>              if tree2 is not None:
>> -                return show_diff_trees(tree1, tree2, sys.stdout, 
>> +                return show_diff_trees(tree1, tree2, self.outf,
>>                                         specific_files=file_list,
>>                                         external_diff_options=diff_options,
>>                                         old_label=old_label, new_label=new_label)
>> @@ -2354,7 +2354,7 @@
>>                  if (log_format == None):
>>                      default = bzrlib.config.BranchConfig(local_branch).log_format()
>>                      log_format = get_log_format(long=long, short=short, line=line, default=default)
>> -                lf = log_formatter(log_format, sys.stdout,
>> +                lf = log_formatter(log_format, self.outf,
>>                                     show_ids=show_ids,
>>                                     show_timezone='original')
>>                  if reverse is False:
>> @@ -2431,9 +2431,9 @@
>>                  rev_id = revision[0].in_history(b).rev_id
>>              t = Testament.from_revision(b.repository, rev_id)
>>              if long:
>> -                sys.stdout.writelines(t.as_text_lines())
>> +                self.outf.writelines(t.as_text_lines())
>>              else:
>> -                sys.stdout.write(t.as_short_text())
>> +                self.outf.write(t.as_short_text())
>>          finally:
>>              b.unlock()
>>  
>> @@ -2473,7 +2473,7 @@
>>              file_id = tree.inventory.path2id(relpath)
>>              tree = branch.repository.revision_tree(revision_id)
>>              file_version = tree.inventory[file_id].revision
>> -            annotate_file(branch, file_version, file_id, long, all, sys.stdout)
>> +            annotate_file(branch, file_version, file_id, long, all, self.outf)
>>          finally:
>>              branch.unlock()
>>  
>> @@ -2604,7 +2604,7 @@
>>  
>>          for r in range(revno, b.revno()+1):
>>              rev_id = b.get_rev_id(r)
>> -            lf = log_formatter('short', to_file=sys.stdout,show_timezone='original')
>> +            lf = log_formatter('short', to_file=self.outf, show_timezone='original')
>>              lf.show(r, b.repository.get_revision(rev_id), None)
>>  
>>          if dry_run:
>>
> 
> I can go through the list that you are mentioning here, but I'm guessing
> that most of these are actually valid.
> If you added some context, it would be more obvious. Try 'grep -C2'.
> 
> All comments follow the lines I'm talking about.
> 
> 
>> ------------------------------------------------------------------------
>>
>> bzrlib\add.py:61                  self._to_file = sys.stdout
> 
> This is a compatibility statement.
> 
>> bzrlib\annotate.py:40              to_file = sys.stdout
> 
> as is this
> 
>> bzrlib\builtins.py:328             #       sys.stdout encoding cannot represent it?
> 
> this is in a comment, which should probably be removed anyway.
> 
>> bzrlib\builtins.py  1136                    return show_diff_trees(tree1, tree2, sys.stdout, 
>> bzrlib\builtins.py  2357                    lf = log_formatter(log_format, sys.stdout,
>> bzrlib\builtins.py  2434                    sys.stdout.writelines(t.as_text_lines())
>> bzrlib\builtins.py  2436                    sys.stdout.write(t.as_short_text())
>> bzrlib\builtins.py  2476                annotate_file(branch, file_version, file_id, long, all, sys.stdout)
>> bzrlib\builtins.py  2607                lf = log_formatter('short', to_file=sys.stdout,show_timezone='original')
> 
> I'm pretty sure you fixed these in your patch.
> 
>> bzrlib\commands.py:201                 exact - do not encode sys.stdout
>> bzrlib\commands.py  233             # *way* too much like sys.stdout
>> bzrlib\commands.py  235                 self.outf = sys.stdout
>> bzrlib\commands.py  238             output_encoding = getattr(sys.stdout, 'encoding', None)
>> bzrlib\commands.py  248                 mutter('encoding stdout as sys.stdout encoding %r', output_encoding)
>> bzrlib\commands.py  252             self.outf = codecs.getwriter(output_encoding)(sys.stdout, errors=self.encoding_type)
>> bzrlib\commands.py  669                 sys.stdout.flush()
>> bzrlib\commands.py  701                 sys.stdout.flush()
> 
> ALl the above seem to be valid, and need to use sys.stdout, or are in
> comments/docstrings.
> 
> 
>> bzrlib\diff.py:86          if to_file != sys.stdout:
> 
> This is something someone else worked on, but is valid for now.
> 
>> bzrlib\diff.py  178             output = sys.stdout
> 
> a compatibility workaround.
> 
> 
>> bzrlib\diff.py  225         output = sys.stdout
> 
> This one seems genuinely bogus
> 
>> bzrlib\diff.py  239         return show_diff_trees(old_tree, new_tree, sys.stdout, specific_files,
> 
> this one is bogus too.
> 
>> bzrlib\help.py:57              outfile = sys.stdout
>> bzrlib\help.py  94              outfile = sys.stdout
>> bzrlib\help.py  122             outfile = sys.stdout
>> bzrlib\help.py  146             outfile = sys.stdout
> 
> these all seem to be compatibility checks.
> 
>> bzrlib\log.py:510             to_file = codecs.getwriter(bzrlib.user_encoding)(sys.stdout, errors='replace')
> 
> this is probably bogus, since the user_encoding should be happening much
> higher up. (And it should be using sys.stdout.encoding, not
> user_encoding, etc).
> 
>> bzrlib\lsprof.py:42                  file = sys.stdout
> 
> This is probably okay. Since we have to write out the results somewhere,
> and this occurs before we are running any command.
> 
>> bzrlib\merge3.py:426         # sys.stdout.writelines(m3.merge_lines(name_a=argv[1], name_b=argv[3]))
>> bzrlib\merge3.py  427         sys.stdout.writelines(m3.merge_annotated())
> 
> bogus. merge3 should take a to_file.
> 
>> bzrlib\patiencediff.py:370             sys.stdout.write(line)
> 
> This is fine, because it is in a custom 'main()' function.
> 
>> bzrlib\progress.py:87                  to_messages_file = sys.stdout
>> bzrlib\progress.py  149                 to_messages_file = sys.stdout
> 
> compatibility lines
> 
>> bzrlib\shellcomplete.py:6               outfile = sys.stdout
>> bzrlib\shellcomplete.py  16              outfile = sys.stdout
>> bzrlib\shellcomplete.py  36              outfile = sys.stdout
>> bzrlib\shellcomplete.py  54              outfile = sys.stdout
> 
> same here.
> 
>> bzrlib\status.py:113             to_file = sys.stdout
>> bzrlib\textui.py:35              to_file = sys.stdout
> 
> and both of these
> 
> 
>> bzrlib\tree.py:106             sys.stdout.write(self.get_file_text(file_id))
> 
> This is bogus.
> 
>> bzrlib\weave.py:1360            sys.stdout.writelines(w.get_iter(int(argv[3])))
>> bzrlib\weave.py  1371            sys.stdout.writelines(diff_gen)
>> bzrlib\weave.py  1417            sys.stdout.writelines(w.weave_merge(p))
> 
> These are also a custom 'main()' function.
> 
>> bzrlib\weave_commands.py:58              write_weave(w1, sys.stdout)
>> bzrlib\weave_commands.py  90              sys.stdout.writelines(w.weave_merge(p))
> 
> weave_commands is an extension of builtins, and should be using 'self.outf'.
> 
> 
>> bzrlib\bundle\commands.py:141                 fileobj = sys.stdout
>> bzrlib\bundle\commands.py  166             #serializer_v4.write(cset_tree.inventory, sys.stdout)
> 
> Also an extension of builtins.
> 
> 
>> bzrlib\tests\__init__.py:378         Internally we can check sys.stdout to see what the output encoding
>> bzrlib\tests\__init__.py  658             :param encoding: encoding for sys.stdout and sys.stderr
>> bzrlib\tests\__init__.py  788             real_stdout = sys.stdout
>> bzrlib\tests\__init__.py  791                 sys.stdout = stdout
>> bzrlib\tests\__init__.py  796                 sys.stdout = real_stdout
>> bzrlib\tests\__init__.py  1145        runner = TextTestRunner(stream=sys.stdout,
> 
> probably these are okay, since it is running the test suite.
> 
>> bzrlib\tests\test_gpg.py:56                  # It is too much work to make sys.stdout be in binary mode.
>> bzrlib\tests\test_gpg.py  59                          'import sys; sys.stdout.write(sys.stdin.read())']
>> bzrlib\tests\test_patches.py:153                             sys.stdout.write(" orig:%spatch:%s" % (next,
>> bzrlib\tests\test_ui.py:52              # TODO: Test the output from TextUIFactory, perhaps by overriding sys.stdout
>> bzrlib\tests\blackbox\__init__.py:116                 self.stdout = sys.stdout
>> bzrlib\tests\blackbox\test_log.py:250             # encoded to sys.stdout.encoding
>> bzrlib\tests\blackbox\test_selftest.py:139             self.assertNotEqual(sys.stdout, self.factory.stdout)
>> bzrlib\tests\blackbox\test_status.py:226             self.stdout = sys.stdout
>> bzrlib\tests\blackbox\test_status.py  230             sys.stdout = self.stdout
>> bzrlib\tests\blackbox\test_status.py  246             sys.stdout = StringIO()
>> bzrlib\tests\blackbox\test_status.py  257             sys.stdout = StringIO()
>> bzrlib\ui\text.py:46                  self.stdout = sys.stdout
>> bzrlib\ui\text.py  75              prompt = (prompt % kwargs).encode(sys.stdout.encoding, 'replace')
>> bzrlib\util\elementtree\ElementTree.py:729     # Writes an element tree or element structure to sys.stdout.  This
>> bzrlib\util\elementtree\ElementTree.py  741         elem.write(sys.stdout)
>> bzrlib\util\elementtree\ElementTree.py  744             sys.stdout.write("\n")
>> bzrlib\util\urlgrabber\keepalive.py:498         sys.stdout.write('  first using the normal urllib handlers')
>> bzrlib\util\urlgrabber\keepalive.py  505         sys.stdout.write('  now using the keepalive handler       ')
>> bzrlib\util\urlgrabber\keepalive.py  545             sys.stdout.write('\r  %2i' % i)
>> bzrlib\util\urlgrabber\keepalive.py  546             sys.stdout.flush()
> 
> 
> I wouldn't change any of the util code. And probably the tests are fine.
> So probably 75% of these are valid compatibility uses, and hopefully
> I've made it clear which ones need to be fixed.
> 
> John
> =:->
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: redirectstdout.py
Type: text/x-python
Size: 1067 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060613/794207f0/attachment.py 


More information about the bazaar mailing list