[PATCH][NIH] nih_parse parsing framework
Scott James Remnant
scott at netsplit.com
Wed Feb 10 17:02:43 GMT 2010
On Wed, 2010-02-10 at 01:18 -0500, Casey Dahlin wrote:
> Well, after an exahusting series of rewrites and redesigns, the nih_parse
> framework is finally here!
>
> Basically this consists of a generic PEG
> (http://en.wikipedia.org/wiki/Parsing_expression_grammar) parser that accepts a
> grammar specification as input. As of right now that specification is given by
> defining a few static struct constants (see the testcase for examples). I'll be
> writing a utility to generate parsers from a DSL shortly.
>
Could you give some example as to its use?
I'm assuming that you're aiming this to replace the existing parser in
nih_config?
What benefit(s) does it have over the current recursive-descent code?
What is missing? At first glance, the error reporting seems rather less
- reducing all errors to just a "parse error"?
> One aside: trying to get this on launchpad I get:
>
> bzr: ERROR:
> CHKInventoryRepository('lp-64613584:///~scott/libnih/trunk/.bzr/repository')
> is not compatible with
> KnitPackRepository('lp-64613584:///~cjdahlin/libnih/nih-parse/.bzr/repository')
> different serializers
>
> What's the fix for that?
>
What version of bzr are you using? What version is your branch?
Your patch doesn't apply to current trunk:
patching file Makefile.am
Hunk #1 FAILED at 25.
Hunk #2 FAILED at 58.
Hunk #3 FAILED at 95.
3 out of 3 hunks FAILED -- saving rejects to file Makefile.am.rej
patching file errors.h
Hunk #1 FAILED at 42.
Hunk #2 FAILED at 62.
2 out of 2 hunks FAILED -- saving rejects to file errors.h.rej
patching file parse.c
patching file parse.h
patching file tests/test_parse.c
Please clean this up ;-) Otherwise I can't give a full review.
Have you ensured that all of the added code is covered by test cases?
> +test_parse_SOURCES = tests/test_parse.c
> +test_parse_LDFLAGS = -static
> +test_parse_LDADD = libnih.la
> +
This is out of order, the rest of the tests are in the same order as
declared in SOURCES above.
> + * XOR:
> + * @a: first condition,
> + * @b: second condition.
> + *
> + * True if EXACTLY ONE of @a and @b is true.
> + **/
> +#define XOR(a,b) ((!(a) && (b)) || (!(b) && (a)))
> +
>
There are much better ways to express XOR in C; and there are ways to
write the macro which don't rely on multiple expansion of the
parameters. This smells like the kind of thing that should go in
nih/macros.h as nih_xor()
> +{
> +#define RECURSIVE_LOCALS \
> + size_t iter; \
> + NihParseSymbolChildRef *child; \
> + NihParseSymbolChildRef *parent_ref; \
> + NihParseNode *ret; \
> + const char *old_pos; \
> + size_t old_len; \
> + size_t times; \
> + int alt_matched; \
> + size_t nth_time; \
> + int success; \
> + NihParseNode *tail; \
> + NihParseNode *parent
> +
> =+ RECURSIVE_LOCALS;
>
UGH.
I really don't like this.
Btw coding style says the "*" belongs with the type, not the variable
name.
> + struct frame {
> + struct frame *prev;
> + void *retpoint;
> +
> + NihParseSymbol *symbol;
> +
> + RECURSIVE_LOCALS;
> + } *frame = NULL;
> +
This is a stack frame on the stack?! What's the point?
> +#define DO_OOM_ABORT() ({ \
> + while (frame) { \
> + struct frame *to_free = frame; \
> + frame = frame->prev; \
> + if (to_free->ret) \
> + nih_free (to_free->ret); \
> + nih_free (to_free); \
> + } \
> + \
> + nih_error_raise_no_memory (); \
> + if (ret) \
> + nih_free (ret); \
> + \
> + return NULL; \
> +})
> +
Also UGH.
> +#define DO_CALL(parent_new, symbol_new, ret_new, nth_new, parent_ref_new) ({ \
> + __label__ retpoint; \
> + struct frame *next = nih_new (NULL, struct frame); \
> + struct frame *my_prev; \
> + \
> + if (! next) \
> + DO_OOM_ABORT (); \
> + \
> + next->prev = frame; \
> + frame = next; \
> + \
> + frame->iter = iter; \
> + frame->child = child; \
> + frame->ret = ret; \
> + frame->old_pos = old_pos; \
> + frame->old_len = old_len; \
> + frame->times = times; \
> + frame->parent = parent; \
> + frame->parent_ref = parent_ref; \
> + frame->symbol = symbol; \
> + frame->success = success; \
> + frame->tail = tail; \
> + frame->alt_matched = alt_matched; \
> + frame->retpoint = &&retpoint; \
> + \
> + parent = (parent_new); \
> + parent_ref = (parent_ref_new); \
> + symbol = (symbol_new); \
> + ret = (ret_new); \
> + nth_time = (nth_new); \
> + \
> + goto begin; \
> + \
> +retpoint: \
> + my_prev = frame->prev; \
> + nih_free (frame); \
> + frame = my_prev; \
> + \
> + (got_returned); \
> +})
> +
UGH.
> +#define DO_RETURN(x) ({ \
> + NihParseNode *my_ret = (x); \
> + \
> + if (! frame) { \
> + if (! my_ret) \
> + nih_error_raise_printf (NIH_PARSE_FAILED, \
> + _(NIH_PARSE_FAILED_STR), \
> + "symbol"); \
> + else if (alloc_parent) \
> + nih_ref (my_ret, alloc_parent); \
> + \
> + return my_ret; \
> + } \
> + \
> + got_returned = my_ret; \
> + iter = frame->iter; \
> + child = frame->child; \
> + ret = frame->ret; \
> + old_pos = frame->old_pos; \
> + old_len = frame->old_len; \
> + times = frame->times; \
> + parent = frame->parent; \
> + parent_ref = frame->parent_ref; \
> + symbol = frame->symbol; \
> + tail = frame->tail; \
> + success = frame->success; \
> + alt_matched = frame->alt_matched; \
> + goto *frame->retpoint; \
> +})
> +
UGH.
Sorry, I really don't like these code constructs as macros. It makes
your code nearly impossible to read.
> +begin:
>
Is this a loop? Then use a loop, not goto!
> + NihParseNode *term =
> + nih_parse_terminal_symbol_matches (parent, symbol,
> + text_pos,
> + text_end - text_pos,
> + &oom_flag,
> + nth_time,
> + parent_ref);
> +
Mid-block declarations are forbidden.
> + if (term)
> + text_pos += symbol->length;
> + else if (oom_flag)
> + DO_OOM_ABORT ();
> +
Braces around if/else. Always.
> + if (child->flags & NIH_PARSE_NOT ||
> + times < child->min_times ||
> + child->flags & NIH_PARSE_LOOKAHEAD)
> + text_pos = old_pos;
> +
>
Parens.
> +backtrack:
>
No.
I found this code really hard to follow; it needs a massive amount of
comments to be understood. Frankly, it needs restructuring.
> +/**
> + * nih_parse_terminal_symbol_matches:
> + * @parent: Alloc parent,
> + * @symbol: symbol to check,
> + * @text_pos: position in text to check,
> + * @text_len: length of text after @text_pos,
> + * @oom_flag: int to set to 1 if an OOM error occurs,
> + * @nth_time: this is the nth time this symbol has been matched,
> + * @parent_ref: reference from parent node to the node to be created.
> + *
> + * Check to see if @symbol matches at @text_pos, where @symbol is a terminal
> + * symbol.
> + *
> + * Returns: A new parse node or NULL on no match or OOM.
> + **/
>
NULL on "no match" *OR* OOM just isn't acceptable.
Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/upstart-devel/attachments/20100210/217314d1/attachment.pgp
More information about the upstart-devel
mailing list