[MERGE] DirState pyrex helpers

Jan 'RedBully' Seiffert redbully at cc.fh-luh.de
Fri Jul 13 17:40:41 BST 2007


John Arbash Meinel wrote:
[snip snip]

> +
> +
> +cdef extern from *:
> +    ctypedef int size_t
> +
> +
> +

1) size_t is unsigned.
2) size_t has a platform-specific size; declaring it as int just broke most 64-bit archs.

Is it possible to get it from <stddef.h> or <sys/types.h> or something
like that?

Or maybe I don't understand Pyrex.
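
To make the two points above concrete, here is a small plain-C sketch (not Pyrex). The helper names size_t_is_unsigned and fits_in_int are made up for illustration; only size_t, SIZE_MAX and INT_MAX come from the standard headers.

```c
#include <limits.h>  /* INT_MAX */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* SIZE_MAX */

/* Hypothetical helper: returns 1 if size_t behaves as an unsigned type,
 * i.e. converting -1 wraps around to the largest representable value. */
int size_t_is_unsigned(void)
{
	return (size_t)-1 > 0 && (size_t)-1 == SIZE_MAX;
}

/* Hypothetical helper: returns 1 if the length n can be stored in an int
 * without loss. On typical 64-bit platforms size_t is 8 bytes and int is
 * 4 bytes, so large lengths do not fit. */
int fits_in_int(size_t n)
{
	return n <= (size_t)INT_MAX;
}
```

A length such as SIZE_MAX never fits in an int, which is why treating size_t as a plain int is only safe for small buffers.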

[snip memrchr by repeated memchr]
Is this wise?
Can #ifdef be used with Pyrex?
Is a simple loop slower?
Don't forget, you have the length, so you do not need to search from the
start to the end; you could start your search at the end and work backwards.
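
The "simple loop" idea could look roughly like this in plain C. It is only a sketch: memrchr_backward is a made-up name (the real memrchr is a GNU extension), and whether it beats repeated memchr calls would have to be measured.

```c
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical stand-in for memrchr: scan the n bytes at s from the END
 * towards the start and return a pointer to the last byte equal to c,
 * or NULL if no such byte exists. */
const void *memrchr_backward(const void *s, int c, size_t n)
{
	const unsigned char *start = s;
	const unsigned char *p = start + n;  /* one past the last byte */

	while (p != start) {
		--p;
		if (*p == (unsigned char)c)
			return p;
	}
	return NULL;
}
```

Because the length is known, the scan touches only the bytes after the last match, instead of walking the whole buffer from the front.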

[snip]

> +    # Use 32-bit comparisons for the matching portion of the string.
> +    # Almost all CPU's are faster at loading and comparing 32-bit integers,
> +    # than they are at 8-bit integers.
> +    # TODO: jam 2007-05-07 Do we need to change this so we always start at an
> +    #       integer offset in memory? I seem to remember that being done in
> +    #       some C libraries for strcmp()

ouch...

On x86-like architectures, unaligned data access only costs you a
performance hit.
On other architectures you get either:
1) A SIGBUS (program termination), or
2) A deadly performance hit, because you trap into a kernel exception
handler that emulates the load (but first has to decode why you ended
up there).
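
For reference, two common ways to stay out of that trouble in plain C are sketched here: copy the bytes into a properly aligned variable with memcpy, or work out how many leading bytes to handle bytewise before a pointer reaches a word boundary. The names load_u32, is_aligned and bytes_to_alignment are invented for this sketch, and n is assumed to be a power of two.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t, uintptr_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: read 4 bytes from ANY address. memcpy has no
 * alignment requirement, so this never performs a misaligned load. */
uint32_t load_u32(const void *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Hypothetical helper: nonzero if p is a multiple of n (n a power of two). */
int is_aligned(const void *p, size_t n)
{
	return ((uintptr_t)p & (n - 1)) == 0;
}

/* Hypothetical helper: number of bytes to step over before p reaches the
 * next n-byte boundary (0 if p is already aligned). */
size_t bytes_to_alignment(const void *p, size_t n)
{
	return (size_t)(-(uintptr_t)p & (n - 1));
}
```

Note that when two pointers are compared wordwise, both must reach a boundary after the same number of bytes, otherwise only the memcpy approach (or a bytewise loop) is safe.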

Maybe the following little program will be of use to you.
The cmp_by_dirs routine is derived from a "do str*n*len fast" example
(strnlen is also a GNU extension) and a little hackery I once did for a
memxor function; it is a little harder to handle two possibly unaligned
pointers.
It could be reordered in many ways; this is just one example (but I
think "goto" is seen as evil...).

> #include <stdint.h>
> #include <stdio.h>   /* printf */
> #include <string.h>  /* strlen */
> 
> #define SOUI	(sizeof(unsigned))
> #define IS_ALIGNED(p, n)  (!(((intptr_t)(p)) & ((n) - 1L)))
> #define ALIGN(p, n) ((intptr_t)((p)+(n) - 1L) & ~((intptr_t)(n) - 1L))
> 
> 
> int cmp_by_dirs(char *path1, int size1, char *path2, int size2)
> {
> 	int minlen = size1 < size2 ? size1 : size2;
> 	const char *p1 = (const char *)path1;
> 	const char *p2 = (const char *)path2;
> 
> 	if(p1 == p2)
> 		return 0;
> 
> 	minlen++;
> 	p1--;
> 	p2--;
> 	
> 	/* is alignment possible (both have to be aligned!)? */
> 	if(p1 - p2 == (char *)ALIGN(p1, SOUI) - (char *)ALIGN(p2, SOUI))
> 	{
> 		do
> 		{
> 			p1++;
> 			p2++;
> 			minlen--;
> 
> 			/* are both aligned and enough bytes left? */
> 			if(IS_ALIGNED(p1, SOUI) && IS_ALIGNED(p2, SOUI) &&
> 				(SOUI-1L) < (minlen-SOUI))
> 			{
> 				/* do it with a bigger type */
> 				register const unsigned *p1_u = ((const unsigned *)p1)-1;
> 				register const unsigned *p2_u = ((const unsigned *)p2)-1;
> 				minlen += SOUI;
> 				do
> 				{
> 					p1_u++;
> 					p2_u++;
> 					minlen -= SOUI;
> 				} while(*p1_u == *p2_u && (SOUI-1L) < minlen);
> 				p1 = (const char *) p1_u;
> 				p2 = (const char *) p2_u;
> 			}
> 		} while(*p1 == *p2 && minlen);
> 	}
> 	else
> 	{
> 		/* nope, do it bytewise */
> 		do
> 		{
> 			p1++;
> 			p2++;
> 			minlen--;
> 		} while(minlen && *p1 == *p2);
> 	}
> 
> 	/* we fall out of the loop, because the char differed? */
> 	if(*p1 != *p2)	{
> 		if('/' == *p1)
> 			return -1; /* end of path1 segment first */
> 		if('/' == *p2)
> 			return 1;  /* end of path2 segment first */
> 		if(*p1 < *p2)
> 			return -1;
> 		else
> 			return 1;
> 	}
> 
> 	/* chars are equal, are we at the end of both paths? */
> 	if(p1 < (path1 + size1))
> 		return 1;
> 	if(p2 < (path2 + size2))
> 		return -1;
> 	/* seems so */
> 	return 0;
> }
> 
> int main(int argc, char *argv[])
> {
> 	if(argc < 3)
> 		return printf("More arguments\n");
> 
> 	printf("%d\n", cmp_by_dirs(argv[1], strlen(argv[1]), argv[2], strlen(argv[2])));
> 
> 	return 0;
> }
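
A plain bytewise version can serve as a reference to cross-check the word-at-a-time routine. The sketch below is not part of the program above: cmp_by_dirs_ref is a made-up name, it takes size_t lengths, and it compares bytes as unsigned char (an assumption; the routine above compares plain char). It applies the same ordering rule: at the first differing byte a '/' sorts first, and if one path is a prefix of the other, the shorter one sorts first.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical reference implementation: returns -1, 0 or 1. */
int cmp_by_dirs_ref(const char *path1, size_t size1,
                    const char *path2, size_t size2)
{
	size_t minlen = size1 < size2 ? size1 : size2;
	size_t i;

	for (i = 0; i < minlen; i++) {
		unsigned char c1 = (unsigned char)path1[i];
		unsigned char c2 = (unsigned char)path2[i];

		if (c1 != c2) {
			if (c1 == '/')
				return -1;  /* end of path1 segment first */
			if (c2 == '/')
				return 1;   /* end of path2 segment first */
			return c1 < c2 ? -1 : 1;
		}
	}

	/* common prefix is identical: the longer path sorts later */
	if (size1 > size2)
		return 1;
	if (size1 < size2)
		return -1;
	return 0;
}
```

It never reads outside the given lengths, so it also works on buffers that are not NUL-terminated.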

Greetings
	Jan

who doesn't get along with python, but at least knows a little C ;)


