Rethinking intern() for python

Raindog raindog at
Wed Apr 8 03:37:38 BST 2009

Stephen J. Turnbull wrote:
> John Arbash Meinel writes:
>  > Going a bit further, I would bet that 90% of all strings are <32kB long.
>  > So you could change the ob_size member to be a 'short', and have the
>  > upper bit set indicate that there are extra bytes elsewhere to indicate
>  > the total length of the string. Or you could have everything but 0xFFFF
>  > be valid, and have that value indicate there is a different structure to
>  > check.
> Dunno if it applies here (our rationale is GC, which compacts small
> strings), but in XEmacs instead of having the logic in string, we have
> a Lisp_String interface which is implemented internally as small
> strings (malloc'd in pools) and big strings (malloc'ed individually).
The C++ STL std::string implementation in Visual Studio uses a fixed-size
character array embedded in the string object to optimize for small
strings, so that when a string is shorter than, say, 12 bytes, no extra
dynamic allocation is needed.
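
To make the idea concrete, here is a minimal sketch of that small-string
layout. It is not the actual Visual Studio code: the class name SmallString,
the constant kInlineCapacity, and the 12-byte threshold are all assumptions
chosen for illustration. Short strings are copied into a buffer inside the
object itself; only longer ones go to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration of the small-string optimization.
// This is NOT the real std::string layout from any vendor.
class SmallString {
public:
    // Assumed threshold, taken from the "say 12 bytes" figure above.
    static constexpr std::size_t kInlineCapacity = 12;

    explicit SmallString(const char* s) : size_(std::strlen(s)) {
        assert(s != nullptr);
        if (size_ < kInlineCapacity) {
            // Fits in the embedded buffer (including the trailing NUL):
            // no dynamic allocation at all.
            std::memcpy(inline_buf_, s, size_ + 1);
        } else {
            // Too long: fall back to one heap allocation.
            heap_ptr_ = new char[size_ + 1];
            std::memcpy(heap_ptr_, s, size_ + 1);
        }
    }

    ~SmallString() {
        if (!is_inline()) delete[] heap_ptr_;
    }

    // Copying is left out to keep the sketch short.
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    bool is_inline() const { return size_ < kInlineCapacity; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return is_inline() ? inline_buf_ : heap_ptr_; }

private:
    std::size_t size_;
    // The inline buffer and the heap pointer share storage; size_ tells
    // us which one is currently in use.
    union {
        char inline_buf_[kInlineCapacity];
        char* heap_ptr_;
    };
};
```

With this layout, SmallString("bazaar") never touches the allocator, while
a 30-byte string costs exactly one allocation. The trade-off is that every
string object carries the inline buffer, even when it goes unused.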
