[apparmor] [Patch] Fix date time log parsing for 2.8.1

Tue Jan 8 19:39:18 UTC 2013

On 01/08/2013 10:56 AM, Seth Arnold wrote:
> On Tue, Jan 08, 2013 at 04:18:31AM -0800, John Johansen wrote:
>> The following patch extends the libraries log parsing to support more date
>> time formats.
> 
> I haven't tested the code but it reads very clearly. One slight concern
> lower:
> 
>>  /* syslog tokens */
>>  syslog_kernel		kernel{colon}
>> +syslog_yyyymmdd		{digit}{4}{minus}{digit}{2}{minus}{digit}{2}
>> +syslog_date		{syslog_yyyymmdd}
>>  syslog_month 		Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?
>> -syslog_time 		{digits}{digits}{colon}{digits}{digits}{colon}{digits}{digits}
>> +hhmmss			{digit}{2}{colon}{digit}{2}{colon}{digit}{2}
>> +timezone		({plus}|{minus}){digit}{2}{colon}{digit}{2}
>> +syslog_time 		{hhmmss}({period}{digits})?{timezone}?
>>  syslog_hostname		[[:alnum:]_-]+
>>  dmesg_timestamp		\[[[:digit:] ]{5,}\.[[:digit:]]{6,}\]
> 
>>  {syslog_kernel}		{ BEGIN(dmesg_timestamp); return(TOK_SYSLOG_KERNEL); }
>>  {syslog_month}		{ yylval->t_str = strdup(yytext); return(TOK_DATE_MONTH); }
>> -{syslog_time}		{ yylval->t_str = strdup(yytext); BEGIN(hostname); return(TOK_DATE_TIME); }
>> +{syslog_date}		{ yylval->t_str = strdup(yytext); return(TOK_DATE); }
>> +{syslog_date}T/{syslog_time}	{ yylval->t_str = strndup(yytext, strlen(yytext)-1); return(TOK_DATE); }
> 
> This introduces a trailing context with variable length; I couldn't
> verify from the flex docs if the performance problem with variable
> trailing context comes from not knowing the length of the leading and
> trailing portions or just not knowing the length of the trailing
> portion.
> 
> It's probably not a large concern, since parsing these correctly is more
> important than parsing them at high speed :) but if it does turn into a
> slow point, keep in mind that we can always re-write these rules to
> remove the variable length.
> 
Variable-length trailing context is a bit of a concern, but not much. It
should only introduce a slowdown when the leading context is matched, and
the amount of slowdown really depends on input caching. Basically there
are two parts to the slowdown:
  1. the trailing context gets walked twice: once when the trailing
     context is matched, and then again on the next match, after the
     input head has been rewound.
  2. the input head needs to be moved back, which means flex has to do
     extra work on input buffering, dynamic allocations, etc.
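
To make item 1 concrete, here is a minimal sketch in C. It is a hand-written
matcher, not flex's generated scanner, and the names chars_examined,
match_shape, scan_date and scan_time are hypothetical. It only models the
rescan of the trailing context, not the buffer management in item 2.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical counter: how many input characters have been inspected. */
static size_t chars_examined;

/* Compare input against a shape: 'd' means one digit, any other shape
 * character must match literally. Counts every character looked at. */
static int match_shape(const char *in, const char *shape)
{
    for (; *shape; shape++, in++) {
        if (*in == '\0')
            return 0;
        chars_examined++;
        if (*shape == 'd' ? !isdigit((unsigned char)*in) : *in != *shape)
            return 0;
    }
    return 1;
}

/* Models {syslog_date}T/{syslog_time}: the time is checked as lookahead
 * but not consumed. Returns the characters consumed (date plus 'T'). */
static size_t scan_date(const char *in)
{
    if (!match_shape(in, "dddd-dd-ddT"))
        return 0;
    if (!match_shape(in + 11, "dd:dd:dd"))  /* trailing context only */
        return 0;
    return 11;
}

/* Models the following time rule, which walks the same characters again. */
static size_t scan_time(const char *in)
{
    return match_shape(in, "dd:dd:dd") ? 8 : 0;
}
```

Calling scan_date() and then scan_time() on "2013-01-08T19:39:18" consumes
19 characters but inspects 27: the 8 characters of hh:mm:ss are looked at
once as trailing context and once more as the time token.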

As you noted, our trailing context could be expressed as a static pattern,
and doing so would probably speed us up a little bit at the cost of a few
more rules.
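
One possible reading of "static pattern", sketched in C rather than as flex
rules: because the fractional seconds and the timezone are both optional,
confirming the fixed 8-character hh:mm:ss after the 'T' is enough to decide
that a time follows. The helper names below are hypothetical, and this is an
assumption about how such a rewrite could look, not what the patch does.

```c
#include <ctype.h>
#include <stddef.h>

/* Fixed-length shapes: 'd' is one digit, other characters are literal. */
#define DATE_T_SHAPE "dddd-dd-ddT"  /* leading part, always 11 characters */
#define HHMMSS_SHAPE "dd:dd:dd"     /* lookahead, always 8 characters */

/* Returns 1 if the input starts with the given shape. A NUL in the input
 * never matches, so the loop cannot run past the end of the string. */
static int has_shape(const char *in, const char *shape)
{
    for (; *shape; shape++, in++) {
        if (*shape == 'd' ? !isdigit((unsigned char)*in) : *in != *shape)
            return 0;
    }
    return 1;
}

/* Returns the length of the date text (10, without the 'T') when the
 * input starts with a date, 'T' and hh:mm:ss, and 0 otherwise. The
 * optional fraction and timezone are deliberately not inspected. */
static size_t iso_date_len(const char *in)
{
    if (has_shape(in, DATE_T_SHAPE) && has_shape(in + 11, HHMMSS_SHAPE))
        return 10;
    return 0;
}
```

With this check the lookahead is always 8 characters, whether or not a
fraction or timezone follows, so its length is known up front.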

I am not really worried, though, as the variable-length trailing context does
have a finite maximum length that is fairly short, so even if flex has to
do some allocations for buffer management, it should not be significant.