Rev 837: add specification documenting unicode policy. in file:///data/jelmer/bzr-svn/0.4/

Jelmer Vernooij jelmer at samba.org
Fri Jan 18 04:05:54 GMT 2008


At file:///data/jelmer/bzr-svn/0.4/

------------------------------------------------------------
revno: 837
revision-id:jelmer at samba.org-20080117222815-sq2efbyz39ajmxck
parent: jelmer at samba.org-20080117193839-j9q7d5clyv3qc91s
committer: Jelmer Vernooij <jelmer at samba.org>
branch nick: 0.4
timestamp: Thu 2008-01-17 23:28:15 +0100
message:
  add specification documenting unicode policy.
added:
  specs/                         specs-20080117192345-rqaekhndwdeg4p24-1
  specs/unicode.txt              unicode.txt-20080117193237-e9it2di8ed47svqo-1
=== added directory 'specs'
=== added file 'specs/unicode.txt'
--- a/specs/unicode.txt	1970-01-01 00:00:00 +0000
+++ b/specs/unicode.txt	2008-01-17 22:28:15 +0000
@@ -0,0 +1,46 @@
+Unicode policy
+==============
+
+.. contents:: :local:
+
+Motivation
+----------
+To reduce the number of conversions between unicode and regular strings,
+APIs should be consistent about the encoding in which they accept their
+arguments.
+
+Without a clear policy, it is possible to get the encoding wrong or to end up
+with indexing bugs, such as looking up utf-8 strings in a unicode-keyed dict.
+
+Policy
+------
+
+Revision and file ids
+~~~~~~~~~~~~~~~~~~~~~
+Revision and file ids are passed as regular strings, consistent with
+Bazaar itself.
+
+Subversion paths
+~~~~~~~~~~~~~~~~
+Subversion returns paths as regular strings in the utf-8 encoding. For that
+reason, all branch paths are also kept as utf-8 encoded regular strings and
+are not converted to unicode objects until they are added to the inventory.
+
+Subversion properties
+~~~~~~~~~~~~~~~~~~~~~
+Subversion returns property values as regular strings in the utf-8 encoding.
+Since the bzr:file-ids property contains metadata used to construct Bazaar
+inventories, the file id dictionaries returned by various APIs should use
+unicode objects for the keys (paths) and regular strings for the values
+(file ids).
+
+Branching schemes
+~~~~~~~~~~~~~~~~~
+Serialized branching schemes are part of revision ids, which are regular
+strings, so they are regular strings as well.
+
+SQLite cache
+~~~~~~~~~~~~~
+Since SQLite returns text values as unicode strings, most strings need to be
+encoded as utf-8 before they are returned to other parts of the code. The
+LogWalker object should take care of this conversion.
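
The "Subversion paths" rule in the spec above can be illustrated with a minimal sketch. It assumes Python 3, where bytes plays the role of the Python 2 regular (utf-8) strings the spec talks about and str plays the role of unicode objects. join_branch_path and inventory_path are hypothetical helpers, not bzr-svn functions. Paths stay as bytes while they are manipulated and are decoded in exactly one place, when they are turned into inventory paths.

```python
# Sketch of the "Subversion paths" rule, in Python 3 terms:
# bytes ~ Python 2 regular (utf-8) strings, str ~ unicode objects.
# join_branch_path and inventory_path are hypothetical helpers.


def join_branch_path(branch_path: bytes, relpath: bytes) -> bytes:
    """Join utf-8 encoded Subversion paths without ever decoding them."""
    if not isinstance(branch_path, bytes) or not isinstance(relpath, bytes):
        raise TypeError("Subversion paths must be utf-8 encoded bytes")
    branch_path = branch_path.strip(b"/")
    relpath = relpath.strip(b"/")
    if not branch_path:
        return relpath
    if not relpath:
        return branch_path
    return branch_path + b"/" + relpath


def inventory_path(branch_path: bytes, svn_path: bytes) -> str:
    """Turn a utf-8 Subversion path into a unicode path relative to the
    branch root; this is the single place where the path is decoded."""
    branch_path = branch_path.strip(b"/")
    svn_path = svn_path.strip(b"/")
    if branch_path:
        if svn_path == branch_path:
            return ""
        prefix = branch_path + b"/"
        if not svn_path.startswith(prefix):
            raise ValueError("%r is not inside branch %r" % (svn_path, branch_path))
        svn_path = svn_path[len(prefix):]
    return svn_path.decode("utf-8")


# Usage: the path stays bytes until it reaches the inventory.
path = join_branch_path(b"trunk", "caf\u00e9.txt".encode("utf-8"))
print(path)                    # b'trunk/caf\xc3\xa9.txt'
rel = inventory_path(b"trunk", path)
print(rel == "caf\u00e9.txt")  # True
print(len(rel), len(path))     # 8 15
```

The last line shows why mixing the two representations leads to indexing bugs: the same name is 8 characters long as unicode but occupies 9 bytes once utf-8 encoded.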
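
The "Subversion properties" rule in the spec above (unicode paths as keys, regular-string file ids as values) can be sketched as a parser and serializer for a file id dictionary. This sketch assumes Python 3, with bytes for regular strings and str for unicode objects. It also assumes a line-based "path<TAB>file id" layout for the bzr:file-ids value, chosen purely for illustration, so the property's real encoding in bzr-svn may differ. parse_file_ids and serialize_file_ids are hypothetical names.

```python
# Sketch of the "Subversion properties" rule.
# Assumption: the bzr:file-ids value is utf-8 bytes made of
# "<path>\t<file id>\n" lines; the real bzr-svn format may differ.
from typing import Dict


def parse_file_ids(value: bytes) -> Dict[str, bytes]:
    """Return {unicode path: file id as bytes} from a property value."""
    file_ids: Dict[str, bytes] = {}
    for line in value.splitlines():
        if not line:
            continue
        path, file_id = line.split(b"\t", 1)
        # Paths are decoded because they key inventory lookups; file ids
        # stay regular (byte) strings, as in Bazaar itself.
        file_ids[path.decode("utf-8")] = file_id
    return file_ids


def serialize_file_ids(file_ids: Dict[str, bytes]) -> bytes:
    """Inverse of parse_file_ids; paths are sorted for a stable value."""
    return b"".join(
        path.encode("utf-8") + b"\t" + file_id + b"\n"
        for path, file_id in sorted(file_ids.items())
    )


ids = parse_file_ids(b"README\tfile-id-2\ncaf\xc3\xa9.txt\tfile-id-1\n")
print(ids["caf\u00e9.txt"])    # b'file-id-1'
print(serialize_file_ids(ids))  # b'README\tfile-id-2\ncaf\xc3\xa9.txt\tfile-id-1\n'
```

Keeping the decode/encode at these two boundaries means every caller sees the same key and value types, whichever API returned the dictionary.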
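
The "SQLite cache" rule in the spec above can be sketched with Python's sqlite3 module. It stores changed paths as TEXT and encodes them back to utf-8 bytes before handing them to the rest of the code. The table layout and the names create_cache, add_changed_path and get_changed_paths are hypothetical. They are not the real LogWalker schema or API. The sketch assumes Python 3, with bytes for regular strings.

```python
# Sketch of the "SQLite cache" rule. The table layout and function names are
# hypothetical; they are not the real LogWalker schema or API.
import sqlite3
from typing import Dict


def create_cache() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE changed_path (rev INTEGER, path TEXT, action TEXT)")
    return db


def add_changed_path(db: sqlite3.Connection, revnum: int,
                     path: bytes, action: bytes) -> None:
    # Paths arrive from Subversion as utf-8 bytes; store them as TEXT.
    db.execute("INSERT INTO changed_path (rev, path, action) VALUES (?, ?, ?)",
               (revnum, path.decode("utf-8"), action.decode("utf-8")))


def get_changed_paths(db: sqlite3.Connection, revnum: int) -> Dict[bytes, bytes]:
    """Return {path: action} for revnum, encoded back to utf-8 bytes."""
    rows = db.execute("SELECT path, action FROM changed_path WHERE rev = ?",
                      (revnum,))
    # sqlite3 returns TEXT columns as str (unicode); encode them so callers
    # get the same utf-8 byte strings Subversion itself returns.
    return {path.encode("utf-8"): action.encode("utf-8") for path, action in rows}


db = create_cache()
add_changed_path(db, 5, b"trunk/caf\xc3\xa9.txt", b"A")
print(get_changed_paths(db, 5))  # {b'trunk/caf\xc3\xa9.txt': b'A'}
```

Python's sqlite3 module can also return TEXT values as bytes directly by setting Connection.text_factory to bytes. The explicit encode above simply keeps the conversion point visible inside the one object responsible for it.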




More information about the bazaar-commits mailing list