Identify automatic str/unicode coercions

John Arbash Meinel john at arbash-meinel.com
Wed Jun 11 14:08:06 BST 2008


Martin von Gagern wrote:
| Hi!
|
| There is this interesting thread called "About encoding issues" on the
| bazaar mailing list, started by Jan Hudec:
| http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/10908
|
| The idea is that automatic conversions between byte and unicode strings
| should be avoided, as they are bound to fail if a string contains
| non-ASCII characters. Instead, all conversions should be done explicitly.
|

...

| Right now it simply writes to a linear log, which quickly grows to sizes
| where it becomes difficult to manage. I tried to make the log writing
| module easily replaceable, and I would think of maybe some sqlite backed
| log with one table for backtraces (one line each with pointer to parent)
| and one with counters for actual occurrences. Of course there would
| still be some post processing overhead to turn this into something useful.

I think "warnings.warn" is a reasonable way to do this. Define your own
warning class, and the output can be easily filtered. And since we shouldn't
be doing automatic conversions at all, this log should dwindle with time.
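
A minimal sketch of what I mean, not tested against bzr itself. The names
CoercionWarning, report_coercion and count_reported are made up for
illustration; whatever code detects a coercion would call the reporting hook.

```python
import warnings


class CoercionWarning(UserWarning):
    """Hypothetical category for implicit str/unicode coercions."""


def report_coercion(value):
    # Hypothetical hook: the code that detects a coercion would call this.
    # stacklevel=2 attributes the warning to the caller, not to this helper.
    warnings.warn("implicit coercion of %r" % (value,),
                  CoercionWarning, stacklevel=2)


def count_reported(values, action="always"):
    # Apply one filter action to our category only and count what gets through.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, CoercionWarning)
        for value in values:
            report_coercion(value)
    return len(caught)
```

With action="ignore" the category is silenced without touching any other
warnings, and action="error" would turn each coercion into an exception,
which is handy when hunting one particular call site.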

John
=:->

|
| As I don't plan to become a dedicated bzr developer in the near future,
| don't even speak Python fluently, and have invested more time already
| into bzr and bzr-svn than I can honestly afford, I can't take this idea
| much further all by myself. If someone else were working on this as
| well, I might be able to cooperate from time to time. I hope I can find
| somebody interested in taking this up.
|
| My plan would be to somehow achieve useful logs, dropping irrelevant
| stuff like when the string comes from a fixed literal in bzr code,
| grouping by leaf function that actually performs the conversion, sorting
| by number of times that conversion occurred. Then those could be tackled
| one at a time, replacing implicit coercions with explicit
| encoding/decoding, preferably with the correct encoding applicable to
| the string at hand. Some way to measure progress would be helpful as well.
|
| Greetings,
|  Martin von Gagern
|
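
The grouping and sorting described in the plan above could be sketched on top
of recorded warnings, roughly as follows. This is only an illustration under
assumptions: CoercionWarning, summarize, noisy and quiet are hypothetical
names, and the file and line of the warn call stand in for the "leaf function".

```python
import warnings
from collections import Counter


class CoercionWarning(UserWarning):
    """Hypothetical category for implicit str/unicode coercions."""


def summarize(records):
    # Group recorded warnings by the location that raised them and
    # return (location, count) pairs, most frequent first.
    counts = Counter(
        (r.filename, r.lineno)
        for r in records
        if issubclass(r.category, CoercionWarning)
    )
    return counts.most_common()


def noisy():
    # Stand-in for a leaf function that coerces often.
    warnings.warn("implicit coercion", CoercionWarning)


def quiet():
    # Stand-in for a leaf function that coerces rarely.
    warnings.warn("implicit coercion", CoercionWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for _ in range(3):
        noisy()
    quiet()

summary = summarize(caught)
```

Each entry in summary pairs a (filename, lineno) location with the number of
times it fired, so the most frequent coercion sites come first and the total
gives a rough measure of progress.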
