Grok Apache access log filter fails for non-ASCII characters.


When parsing a request such as , parsing fails because of characters that were essentially non-ASCII before being handled by the browser/Apache.

Another example at


February 9, 2012, 6:44 PM

Two fixes were fairly simple: include '\' and '|' in URIPATH, making it look like:
URIPATH (?:/[A-Za-z0-9$.!|
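Because the URIPATH line above is cut off, here is a minimal Python sketch of the idea using hypothetical, simplified path character classes. They are not the real grok URIPATH, and Python's `re` module stands in for grok's own regex engine. The sketch only shows that adding `\` and `|` to the class lets such paths match in full:

```python
import re

# Hypothetical, simplified stand-ins for grok's URIPATH (the real pattern is
# longer). Only the character class differs: NEW also allows '\' and '|'.
URIPATH_OLD = re.compile(r"(?:/[A-Za-z0-9$.+!*'(),~:;=@%_-]*)+")
URIPATH_NEW = re.compile(r"(?:/[A-Za-z0-9$.+!*'(),~:;=@%_\\|-]*)+")


def full_match(pattern: re.Pattern, path: str) -> bool:
    """True if the pattern consumes the entire path."""
    return pattern.fullmatch(path) is not None


for path in ["/index.html", "/a|b", "/dir\\file"]:
    print(path, full_match(URIPATH_OLD, path), full_match(URIPATH_NEW, path))
```

With the old class, a path containing `|` or `\` fails to match completely, which is what surfaces as a parse failure for the whole line.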

This solves ~60% of the parse errors I get. However, some lines look like this:
- - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 "http://host/?search=bageri?&f=a" "SomeAgent" 0

Grok fails to parse this because of the second '?' in the referrer. Now, this breaks the URI syntax (, but since the referrer is set by the browser, it can contain basically any string. In my case, quite a few referrer headers are malformed.
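To illustrate why the second '?' trips a URI-style pattern while a quoted-string pattern does not, here is a minimal Python sketch. Both regexes are hypothetical simplifications: `STRICT_URI` is not grok's URI pattern, and `QUOTED` only approximates QS (grok's QUOTEDSTRING, which also accepts single quotes and backticks).

```python
import re

# Hypothetical strict, URI-like rule: scheme://host, an optional path, and at
# most one '?' introducing a query that may not contain another '?'.
# This is an illustration only, not grok's actual URI pattern.
STRICT_URI = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://[^/?\s"]+(?:/[^?\s"]*)?(?:\?[^?\s"]*)?')

# Simplified QS-like rule: any double-quoted string, backslash escapes allowed.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def strict_ok(referrer: str) -> bool:
    """True if the bare referrer matches the strict URI-like rule in full."""
    return STRICT_URI.fullmatch(referrer) is not None


def quoted_ok(field: str) -> bool:
    """True if the field, including its quotes, is a double-quoted string."""
    return QUOTED.fullmatch(field) is not None


malformed = "http://host/?search=bageri?&f=a"
print(strict_ok(malformed))              # rejected: second '?' in the query
print(quoted_ok('"' + malformed + '"'))  # accepted: only the quotes matter
```

The quoted-string rule never looks inside the value, so whatever the browser sends as a referrer still parses.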

February 9, 2012, 7:00 PM

In 1.1.0 and 1.0.17 the

But according to Jordan, it should be:

19:48 < whack> TommyBotten: the pattern already does, it'll try URI or quoted string if URI fails
19:48 < whack> oh wait, I swear I fixed that
19:49 < whack> "(?:%{URI:referrer}|-)" should really be (?:%{URI:referrer}|%{QS:referrer})

That, however, does not work for:
- - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 "http://host/?q=test?&f" "Agent" 0

Using only %{QS:referrer} does work, and it would be sufficient given Apache's combined_log format.

February 9, 2012, 7:14 PM

Another one that does not follow the standard, but might be worth considering:
- - [09/Feb/2012:19:09:14 +0100] "GET /vhostwebtype=s HTTP/1.1" 404 2339 "" "Agent" 0

It requests GET /somestring=foobar, whereas it should have been GET /?somestring=foobar. I have seen this elsewhere as well.

louis z
January 3, 2013, 5:37 AM

I think the grok patterns have been changed to allow nonconforming strings where URIs are expected: NOTSPACE for the request resource and QS for the referrer.
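A minimal Python sketch of that approach, applied to the sample lines from this thread. The regex is a hypothetical approximation of a combined-log grok pattern, not the shipped one: a NOTSPACE-like `\S+` for the request, a QS-like double-quoted string for referrer and agent, and an optional trailing field for the extra `0`. The client IP 192.0.2.1 is a placeholder because the original lines omit it.

```python
import re
from typing import Optional

# Hypothetical approximation of a combined-log grok pattern, written for
# Python's re module. "request" is NOTSPACE-like (\S+); "referrer" and
# "agent" are QS-like double-quoted strings (quotes included, as with QS).
LINE = re.compile(
    r'(?P<client>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[0-9.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'(?P<referrer>"(?:[^"\\]|\\.)*") (?P<agent>"(?:[^"\\]|\\.)*")'
    r'(?: .*)?'  # optional trailing field(s), e.g. the extra "0" in the samples
)


def parse(line: str) -> Optional[dict]:
    """Return the named fields if the whole line matches, else None."""
    m = LINE.fullmatch(line)
    return m.groupdict() if m else None


# Sample lines from this thread; 192.0.2.1 is a placeholder client IP.
SAMPLES = [
    '192.0.2.1 - - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 '
    '"http://host/?search=bageri?&f=a" "SomeAgent" 0',
    '192.0.2.1 - - [09/Feb/2012:17:32:31 +0100] "GET / HTTP/1.1" 505 548 '
    '"http://host/?q=test?&f" "Agent" 0',
    '192.0.2.1 - - [09/Feb/2012:19:09:14 +0100] "GET /vhostwebtype=s HTTP/1.1" '
    '404 2339 "" "Agent" 0',
]

for sample in SAMPLES:
    fields = parse(sample)
    print(fields["request"], fields["referrer"])
```

All three problem lines parse, including the doubled '?' referrers and the `/vhostwebtype=s` resource, because neither field is validated as a URI.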


Logstash Developers



