[PATCH][NIH] nih_parse parsing framework

Scott James Remnant scott at netsplit.com
Wed Feb 10 17:02:43 GMT 2010


On Wed, 2010-02-10 at 01:18 -0500, Casey Dahlin wrote:

> Well, after an exahusting series of rewrites and redesigns, the nih_parse
> framework is finally here!
> 
> Basically this consists of a generic PEG
> (http://en.wikipedia.org/wiki/Parsing_expression_grammar) parser that accepts a
> grammar specification as input. As of right now that specification is given by
> defining a few static struct constants (see the testcase for examples). I'll be
> writing a utility to generate parsers from a DSL shortly.
> 
Could you give some example as to its use?

I'm assuming that you're aiming this to replace the existing parser in
nih_config?

What benefit(s) does it have over the current recursive-descent code?

What is missing?  At first glance, the error reporting seems rather less
- reducing all errors to just a "parse error"?

> One aside: trying to get this on launchpad I get:
> 
> bzr: ERROR:
> CHKInventoryRepository('lp-64613584:///~scott/libnih/trunk/.bzr/repository')
> is not compatible with
> KnitPackRepository('lp-64613584:///~cjdahlin/libnih/nih-parse/.bzr/repository')
> different serializers
> 
> What's the fix for that?
> 
What version of bzr are you using?  What version is your branch?


Your patch doesn't apply to current trunk:

patching file Makefile.am
Hunk #1 FAILED at 25.
Hunk #2 FAILED at 58.
Hunk #3 FAILED at 95.
3 out of 3 hunks FAILED -- saving rejects to file Makefile.am.rej
patching file errors.h
Hunk #1 FAILED at 42.
Hunk #2 FAILED at 62.
2 out of 2 hunks FAILED -- saving rejects to file errors.h.rej
patching file parse.c
patching file parse.h
patching file tests/test_parse.c


Please clean this up ;-)  Otherwise I can't give a full review.


Have you ensured that all of the added code is covered by test cases?


> +test_parse_SOURCES = tests/test_parse.c
> +test_parse_LDFLAGS = -static
> +test_parse_LDADD = libnih.la
> +
This is out of order, the rest of the tests are in the same order as
declared in SOURCES above.

> + * XOR:
> + * @a: first condition,
> + * @b: second condition.
> + *
> + * True if EXACTLY ONE of @a and @b is true.
> + **/
> +#define XOR(a,b) ((!(a) && (b)) || (!(b) && (a)))
> +
> 
There are much better ways to express XOR in C; and there are ways to
write the macro which don't rely on multiple expansion of the
parameters.  This smells like the kind of thing that should go in
nih/macros.h as nih_xor()

> +{
> +#define RECURSIVE_LOCALS			\
> +	size_t                  iter;		\
> +	NihParseSymbolChildRef *child;		\
> +	NihParseSymbolChildRef *parent_ref;	\
> +	NihParseNode           *ret;		\
> +	const char             *old_pos;	\
> +	size_t                  old_len;	\
> +	size_t                  times;		\
> +	int                     alt_matched;	\
> +	size_t                  nth_time;       \
> +	int                     success;	\
> +	NihParseNode           *tail;		\
> +	NihParseNode           *parent
> +
> =+	RECURSIVE_LOCALS;
> 
UGH.

I really don't like this.

Btw coding style says the "*" belongs with the type, not the variable
name.

> +	struct frame {
> +		struct frame           *prev;
> +		void                   *retpoint;
> +
> +		NihParseSymbol         *symbol;
> +
> +		RECURSIVE_LOCALS;
> +	} *frame = NULL;
> +
This is a stack frame on the stack?!  What's the point?

> +#define DO_OOM_ABORT() ({				\
> +	while (frame) {					\
> +		struct frame *to_free = frame;		\
> +		frame = frame->prev;			\
> +		if (to_free->ret)			\
> +			nih_free (to_free->ret);	\
> +		nih_free (to_free);			\
> +	}						\
> +							\
> +	nih_error_raise_no_memory ();			\
> +	if (ret)					\
> +		nih_free (ret);				\
> +							\
> +	return NULL;					\
> +})
> +
Also UGH.

> +#define DO_CALL(parent_new, symbol_new, ret_new, nth_new, parent_ref_new) ({ \
> +	__label__ retpoint;					\
> +	struct frame *next = nih_new (NULL, struct frame);	\
> +	struct frame *my_prev;					\
> +								\
> +	if (! next)						\
> +		DO_OOM_ABORT ();				\
> +								\
> +	next->prev = frame;					\
> +	frame = next;						\
> +								\
> +	frame->iter = iter;					\
> +	frame->child = child;					\
> +	frame->ret = ret;					\
> +	frame->old_pos = old_pos;				\
> +	frame->old_len = old_len;				\
> +	frame->times = times;					\
> +	frame->parent = parent;					\
> +	frame->parent_ref = parent_ref;				\
> +	frame->symbol = symbol;					\
> +	frame->success = success;				\
> +	frame->tail = tail;					\
> +	frame->alt_matched = alt_matched;			\
> +	frame->retpoint = &&retpoint;				\
> +								\
> +	parent = (parent_new);					\
> +	parent_ref = (parent_ref_new);				\
> +	symbol = (symbol_new);					\
> +	ret = (ret_new);					\
> +	nth_time = (nth_new);					\
> +								\
> +	goto begin;						\
> +								\
> +retpoint:							\
> +	my_prev = frame->prev;					\
> +	nih_free (frame);					\
> +	frame = my_prev;					\
> +								\
> +	(got_returned);						\
> +})
> +
UGH.

> +#define DO_RETURN(x) ({							\
> +	NihParseNode *my_ret = (x);					\
> +									\
> +	if (! frame) {							\
> +		if (! my_ret)						\
> +			nih_error_raise_printf (NIH_PARSE_FAILED,	\
> +					 _(NIH_PARSE_FAILED_STR),	\
> +					 "symbol");			\
> +		else if (alloc_parent)					\
> +			nih_ref (my_ret, alloc_parent);			\
> +									\
> +		return my_ret;						\
> +	}								\
> +									\
> +	got_returned = my_ret;						\
> +	iter = frame->iter;						\
> +	child = frame->child;						\
> +	ret = frame->ret;						\
> +	old_pos = frame->old_pos;					\
> +	old_len = frame->old_len;					\
> +	times = frame->times;						\
> +	parent = frame->parent;						\
> +	parent_ref = frame->parent_ref;					\
> +	symbol = frame->symbol;						\
> +	tail = frame->tail;						\
> +	success = frame->success;					\
> +	alt_matched = frame->alt_matched;				\
> +	goto *frame->retpoint;						\
> +})
> +
UGH.


Sorry, I really don't like these code constructs as macros.  It makes
your code nearly impossible to read.

> +begin:
> 
Is this a loop?  Then use a loop, not goto!

> +		NihParseNode *term =
> +			nih_parse_terminal_symbol_matches (parent, symbol,
> +							   text_pos,
> +							   text_end - text_pos,
> +							   &oom_flag,
> +							   nth_time,
> +							   parent_ref);
> +
Mid-block declarations are forbidden.

> +		if (term)
> +			text_pos += symbol->length;
> +		else if (oom_flag)
> +			DO_OOM_ABORT ();
> +
Braces around if/else.  Always.

> +		if (child->flags & NIH_PARSE_NOT ||
> +		    times < child->min_times ||
> +		    child->flags & NIH_PARSE_LOOKAHEAD)
> +			text_pos = old_pos;
> +
> 
Parens.

> +backtrack:
> 
No.

I found this code really hard to follow; it needs a massive amount of
comments to be understood.  Frankly, it needs restructuring.

> +/**
> + * nih_parse_terminal_symbol_matches:
> + * @parent: Alloc parent,
> + * @symbol: symbol to check,
> + * @text_pos: position in text to check,
> + * @text_len: length of text after @text_pos,
> + * @oom_flag: int to set to 1 if an OOM error occurs,
> + * @nth_time: this is the nth time this symbol has been matched,
> + * @parent_ref: reference from parent node to the node to be created.
> + *
> + * Check to see if @symbol matches at @text_pos, where @symbol is a terminal
> + * symbol.
> + *
> + * Returns: A new parse node or NULL on no match or OOM.
> + **/
> 
NULL on "no match" *OR* OOM just isn't acceptable.

Scott
-- 
Have you ever, ever felt like this?
Had strange things happen?  Are you going round the twist?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/upstart-devel/attachments/20100210/217314d1/attachment.pgp 


More information about the upstart-devel mailing list