Grok apache access log filter fails for non-ascii characters.

Description

When parsing a request such as http://pastebin.com/9Tj2dRJt, parsing fails because of what were essentially non-ASCII characters before the request was handled by the browser/Apache.

Another example at http://pastebin.com/tp1yRq72

Activity

Tommy
February 9, 2012, 6:44 PM

Two fixes were pretty simple: add '\' and '|' to URIPATH, making it look like:
URIPATH (?:/[A-Za-z0-9$.!|\\'(),~:#%_-])

This solves ~60% of the parse errors I get. However, some strings look like this:

127.0.0.1 - - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 "http://host/?search=bageri?&f=a" "SomeAgent" 0

Grok fails to parse this because of the second '?' in the referrer. This breaks the URI syntax (http://www.ietf.org/rfc/rfc2396), but since the referrer is supplied by the browser, it can contain basically any string. In my case, quite a few referrer headers are malformed.
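
To illustrate the failure described above, here is a minimal Python sketch. The regexes are simplified stand-ins for grok's URI and QS patterns, not the definitions shipped with logstash, and the names URI_PATH, URI_PARAM, strict_referrer and loose_referrer are hypothetical. The point is only that a query-string character class without '?' cannot reach the closing quote once a second '?' appears, while a plain quoted-string match does not care.

```python
import re

# Simplified stand-ins for the grok patterns (assumptions, not logstash's own).
URI_PATH = r"(?:/[A-Za-z0-9$.+!*'(),~:#%_-]*)+"
URI_PARAM = r"\?[A-Za-z0-9$.+!*'(),~#%&/=:;_-]*"  # '?' is not in the class
STRICT_URI = r'[A-Za-z]+://[^/\s"?]+' + "(?:" + URI_PATH + ")?(?:" + URI_PARAM + ")?"

# Referrer as it appears in the log line: wrapped in double quotes.
strict_referrer = re.compile('"' + STRICT_URI + '"')
# Quoted string: anything between quotes, allowing backslash escapes.
loose_referrer = re.compile(r'"(?:[^"\\]|\\.)*"')

well_formed = '"http://host/?search=bageri&f=a"'
malformed = '"http://host/?search=bageri?&f=a"'

strict_ok = strict_referrer.fullmatch(well_formed) is not None
strict_on_malformed = strict_referrer.fullmatch(malformed) is not None
loose_on_malformed = loose_referrer.fullmatch(malformed) is not None
```

Under these assumptions the strict pattern accepts the well-formed referrer and rejects the malformed one, while the quoted-string pattern accepts both.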

Tommy
February 9, 2012, 7:00 PM

In 1.1.0 and 1.0.17 the

But according to Jordan, it should be:

19:48 < whack> TommyBotten: the pattern already does, it'll try URI or quoted string if URI fails
19:48 < whack> oh wait, I swear I fixed that
19:49 < whack> "(?:%{URI:referrer}|-)" should really be (?:%{URI:referrer}|%{QS:referrer})

That, however, does not work for:

127.0.0.1 - - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 "http://host/?q=test?&f" "Agent" 0

But using only %{QS:referrer} works, and would be sufficient given Apache's combined log format.
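
As a rough illustration of why a quoted string is enough for the referrer, the sketch below parses the sample line above with a plain Python regex. This is an approximation written for this example, not the grok pattern itself; the group names and the optional trailing field (called extra here) are assumptions based on the sample line.

```python
import re

# Hypothetical approximation of a combined-log pattern; referrer and agent
# are matched as quoted strings rather than as URIs.
LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
    r'(?: (?P<extra>\S+))?'  # trailing field seen in the sample, assumed optional
)

sample = ('127.0.0.1 - - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 '
          '"http://host/?q=test?&f" "Agent" 0')

match = LINE.fullmatch(sample)
fields = match.groupdict() if match else {}
```

Because the referrer is only required to be text between double quotes, the second '?' ends up inside fields["referrer"] instead of breaking the match.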

Tommy
February 9, 2012, 7:14 PM

Another one that does not follow the standard, but should perhaps be considered:

127.0.0.1 - - [09/Feb/2012:19:09:14 +0100] "GET /vhostwebtype=s HTTP/1.1" 404 2339 "http://evilspurv.net/" "Agent" 0

The client sends GET /somestring=foobar, whereas it should have been GET /?somestring=foobar. I have seen this elsewhere as well.

louis z
January 3, 2013, 5:37 AM

I think the grok patterns have been changed to allow for nonconforming strings where URIs should be... NOTSPACE for the resource, and QS for the referrer.
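
A small Python sketch of the difference for the resource field, assuming a strict path pattern modelled on the URIPATH character class quoted earlier in this issue (where '=' is not a path character) and using \S+ as a stand-in for NOTSPACE. The names STRICT_REQUEST, NOTSPACE and request_target are made up for this example.

```python
import re

# Stand-in for a strict path-plus-query pattern (an assumption, not logstash's).
STRICT_REQUEST = re.compile(
    r"(?:/[A-Za-z0-9$.+!*'(),~:#%_-]*)+"
    r"(?:\?[A-Za-z0-9$.+!*'(),~#%&/=:;_-]*)?"
)
# Stand-in for grok's NOTSPACE: any run of non-whitespace characters.
NOTSPACE = re.compile(r"\S+")


def request_target(request_line, pattern):
    """Return the target of 'METHOD target PROTOCOL' if pattern accepts all of it."""
    method, target, protocol = request_line.split(" ")
    return target if pattern.fullmatch(target) else None


nonconforming = "GET /vhostwebtype=s HTTP/1.1"
conforming = "GET /?somestring=foobar HTTP/1.1"

strict_result = request_target(nonconforming, STRICT_REQUEST)
loose_result = request_target(nonconforming, NOTSPACE)
strict_conforming = request_target(conforming, STRICT_REQUEST)
```

With these stand-ins the strict pattern rejects /vhostwebtype=s because of the '=' in the path, while the non-space match returns it unchanged. The trade-off is that the looser pattern no longer validates the resource at all.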

Assignee

Logstash Developers

Reporter

Tommy
