[MERGE] DirState pyrex helpers
John Arbash Meinel
john at arbash-meinel.com
Fri Jul 13 18:54:39 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jan 'RedBully' Seiffert wrote:
> John Arbash Meinel wrote:
> [snip snip]
>
>> +
>> +
>> +cdef extern from *:
>> + ctypedef int size_t
>> +
>> +
>> +
>
> 1) size_t is unsigned
> 2) size_t has a platform-specific size; you just broke most 64-bit archs
>
> Is it possible to get it from <stddef.h> or <sys/types.h> or something
> like that?
>
> Or maybe I don't understand Pyrex.
This is mostly just a hint to Pyrex for how it should translate Python objects
into C objects. But yes, I should switch it to unsigned.
These definitions are only for the translator; the actual C code uses the C
definitions for all structs, typedefs, etc.
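To make Jan's two points concrete, here is a small C sketch (both helper names are hypothetical). The first function shows that size_t is unsigned. The second shows that a length squeezed through an int, which is how a "ctypedef int size_t" hint would treat it, only survives if it fits in INT_MAX:

```c
#include <limits.h>
#include <stddef.h>

/* size_t is unsigned: converting -1 to it wraps around to its maximum
 * value instead of staying negative. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}

/* A length treated as int only survives if it fits in INT_MAX; on LP64
 * and LLP64 targets size_t is 64 bits wide while int stays 32 bits. */
int length_fits_in_int(size_t len)
{
    return len <= (size_t)INT_MAX;
}
```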
>
> [snip memrchr by repeatedly calling memchr]
> Is this wise?
> Can #ifdef be used with Pyrex?
I haven't seen #ifdef as a possibility.
> Is a simple loop slower?
Well, a simple loop is slower than memchr on most platforms, but it certainly
could be implemented that way.
> Don't forget: you have the length, so you don't need to search from the
> start to the end; you could start your search at the end instead.
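Jan's suggestion of a backwards search could look roughly like the following minimal sketch: a plain loop that starts at the known end of the buffer. The name portable_memrchr() is hypothetical. It only mirrors the behaviour of the GNU memrchr() extension and is not part of the patch:

```c
#include <stddef.h>

/* Hypothetical portable stand-in for GNU memrchr(): walk backwards from
 * the end of the buffer (we already know its length n) and return a
 * pointer to the last byte equal to c, or NULL if there is none. */
const void *portable_memrchr(const void *s, int c, size_t n)
{
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *p = start + n;
    const unsigned char target = (unsigned char)c;

    while (p > start) {
        --p;
        if (*p == target)
            return p;
    }
    return NULL;
}
```

A single backwards pass touches each byte at most once. Repeated forward memchr() calls have to walk past every earlier match before finding the last one.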
>
> [snip]
>
>> + # Use 32-bit comparisons for the matching portion of the string.
>> + # Almost all CPUs are faster at loading and comparing 32-bit integers
>> + # than they are at 8-bit integers.
>> + # TODO: jam 2007-05-07 Do we need to change this so we always start at an
>> + # integer offset in memory? I seem to remember that being done in
>> + # some C libraries for strcmp()
>
> ouch...
>
> On x86-like architectures, unaligned data access only costs you some
> performance.
> On other architectures you get either:
> 1) A SIGBUS (program termination)
> 2) A deadly performance hit, because you jump to a kernel processor
> exception handler, which emulates the load (but first needs to decode
> why you ended up there).
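One portable way to keep word-sized comparisons without ever issuing an unaligned load is to copy each word into a local variable with memcpy(), which is legal at any alignment. This is a swapped-in technique, not what either patch does. Compilers typically turn such a memcpy() into a single load on targets that allow unaligned access. The following is a minimal sketch, and the helper name first_mismatch() is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first byte where a and b differ within the
 * first n bytes, or n if they are equal. Neither pointer needs to be
 * aligned: whole words are copied out with memcpy() before comparing. */
size_t first_mismatch(const char *a, const char *b, size_t n)
{
    size_t i = 0;

    /* word-at-a-time while a full word remains in both ranges */
    while (n - i >= sizeof(unsigned)) {
        unsigned wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            break;              /* the difference is inside this word */
        i += sizeof(unsigned);
    }
    /* finish the tail, or pinpoint the differing byte, bytewise */
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```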
>
> Maybe the following little program will be of use to you.
> The cmp_by_dirs routine is derived from a "do str*n*len fast" example
> (strnlen is also a GNU extension) and a little hackery I once did for a
> memxor function (it's a little harder to handle two possibly unaligned
> pointers). It could be reordered in many ways; this is just one example
> (but I think "goto" is seen as evil...).
>
>> #include <stdint.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> /* word size for the fast path, kept signed so it mixes safely with minlen */
>> #define SOUI ((int)sizeof(unsigned))
>> #define IS_ALIGNED(p, n) (!(((intptr_t)(p)) & ((n) - 1L)))
>> #define ALIGN(p, n) ((intptr_t)((p)+(n) - 1L) & ~((intptr_t)(n) - 1L))
>>
>>
>> int cmp_by_dirs(const char *path1, int size1, const char *path2, int size2)
>> {
>>     int minlen = size1 < size2 ? size1 : size2;
>>     const char *p1 = path1;
>>     const char *p2 = path2;
>>
>>     if(p1 == p2 && size1 == size2)
>>         return 0;
>>
>>     minlen++;
>>     p1--;
>>     p2--;
>>
>>     /* is alignment possible (both have to be aligned!)? */
>>     if(p1 - p2 == (char *)ALIGN(p1, SOUI) - (char *)ALIGN(p2, SOUI))
>>     {
>>         do
>>         {
>>             p1++;
>>             p2++;
>>             minlen--;
>>
>>             /* are both aligned and enough bytes left? */
>>             if(IS_ALIGNED(p1, SOUI) && IS_ALIGNED(p2, SOUI) &&
>>                minlen >= 2 * SOUI)
>>             {
>>                 /* do it with a bigger type */
>>                 const unsigned *p1_u = ((const unsigned *)p1)-1;
>>                 const unsigned *p2_u = ((const unsigned *)p2)-1;
>>                 minlen += SOUI;
>>                 /* only step to the next word if it still lies entirely
>>                  * inside both buffers, so we never read past the end */
>>                 do
>>                 {
>>                     p1_u++;
>>                     p2_u++;
>>                     minlen -= SOUI;
>>                 } while(*p1_u == *p2_u && minlen >= 2 * SOUI);
>>                 p1 = (const char *) p1_u;
>>                 p2 = (const char *) p2_u;
>>             }
>>         } while(minlen && *p1 == *p2);
>>     }
>>     else
>>     {
>>         /* nope, do it bytewise */
>>         do
>>         {
>>             p1++;
>>             p2++;
>>             minlen--;
>>         } while(minlen && *p1 == *p2);
>>     }
>>
>>     /* one path ran out: it is a prefix of the other, so it sorts first */
>>     if(p1 == path1 + size1 || p2 == path2 + size2)
>>     {
>>         if(size1 == size2)
>>             return 0;
>>         return size1 < size2 ? -1 : 1;
>>     }
>>
>>     /* we fell out of the loop because the chars differed */
>>     if('/' == *p1)
>>         return -1; /* end of path1 segment first */
>>     if('/' == *p2)
>>         return 1; /* end of path2 segment first */
>>     /* compare as unsigned bytes, like memcmp() */
>>     if((unsigned char)*p1 < (unsigned char)*p2)
>>         return -1;
>>     return 1;
>> }
>>
>> int main(int argc, char *argv[])
>> {
>>     if(argc < 3) {
>>         printf("usage: %s PATH1 PATH2\n", argv[0]);
>>         return 1;
>>     }
>>
>>     printf("%d\n", cmp_by_dirs(argv[1], (int)strlen(argv[1]),
>>                                argv[2], (int)strlen(argv[2])));
>>
>>     return 0;
>> }
>
> Greetings
> Jan
>
> who doesn't get along with Python, but at least knows a little C ;)
>
Thanks, I'll look into incorporating that. On the flip side, we are probably
guaranteed that everything is aligned, just because we are using Python strings
that were allocated on the heap (and we are always comparing the whole thing).
But either way, it seems worthwhile to do it 'correctly'.
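Part of doing it correctly is having something simple to test a word-at-a-time version against, for example over random paths at every byte offset. A plain bytewise reference might look like the following minimal sketch. It assumes the ordering is "compare path.split('/') element by element", so '/' sorts before every other byte and a prefix sorts first. The name cmp_by_dirs_ref() is hypothetical:

```c
#include <stddef.h>

/* Hypothetical bytewise reference for the by-dirs ordering.
 * Returns -1, 0 or 1. */
int cmp_by_dirs_ref(const char *path1, size_t size1,
                    const char *path2, size_t size2)
{
    size_t minlen = size1 < size2 ? size1 : size2;
    size_t i;

    for (i = 0; i < minlen; i++) {
        unsigned char c1 = (unsigned char)path1[i];
        unsigned char c2 = (unsigned char)path2[i];
        if (c1 == c2)
            continue;
        if (c1 == '/')
            return -1;  /* path1's segment ends first */
        if (c2 == '/')
            return 1;   /* path2's segment ends first */
        return c1 < c2 ? -1 : 1;
    }
    /* one path is a prefix of the other: the shorter sorts first */
    if (size1 == size2)
        return 0;
    return size1 < size2 ? -1 : 1;
}
```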
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGl7xeJdeBCYSNAAMRAqinAJ94ksY183v+0OCWtxUuSRrG42n61ACgj3Wy
a/EkuTgFNySIwyanlXAu7CU=
=x0O6
-----END PGP SIGNATURE-----
More information about the bazaar mailing list