DATE_EU grok pattern fails to always match 4 digit year

Description

Hi and thanks for reading. Using the grok debugger, I've found what I think is a bug with the DATE_EU regex. Let's consider the following input:

Using the %{YEAR} pattern, the output is "2013" as expected, but when using %{DATE_EU} or %{DATESTAMP}, the year placeholder in the DATE_EU output is not "2013" as expected but instead "13", and the YEAR output is "15". See below:

If you look carefully, you'll notice that DATE_EU's output has the year in the correct place but with only two digits instead of four, while the YEAR output is 15, which is actually the MONTHDAY, and the MONTHDAY output is 13.

But oddly, the output is correct when using the expanded version of %{DATE_EU}, which is:

Output:

This could be a bug in the grok debugger, too. I'm not sure. Thanks for taking a look into this.

Activity

Tom Kapanka
December 27, 2013, 6:12 PM

Hi and happy holidays, thanks for the quick response.
Sorry, but I beg to differ since my pattern is YEAR-MONTHNUM-MONTHDAY (or YYYY-MM-DD). See:

Which is exactly what DATE_EU promises:

and not what you suggest above:

I double-checked against the repository since I was working out of my local puppet repo (I wanted to be sure I wasn't making an idiot out of myself!), but sure enough, master on github shows the YYYY-MM-DD pattern:

https://github.com/logstash/grok-patterns/blob/master/grok-patterns#L71

And YEAR promises to pick up 4 digits:

and this is exactly how YEAR works:

You may be correct that "most" Europeans might write out by hand the date as 26/12/13, which I am familiar with having lived and worked overseas, but if that was the case, the pattern promised by DATE_EU *should* be:

But that is not the pattern, and it's only when used in DATE_EU that the year is not four digits and the month/year values are incorrectly flipped:

Maybe DATE_EU should become:

and a new date pattern should be added:

...for dealing with the format of date most commonly found in logs.

Also, I agree with you that "TIMESTAMP_ISO8601" looks pretty good, but unfortunately Pylons wants to output its logs with an added ".###" after the milliseconds place, essentially doubling the milliseconds value for some reason.

So, the pattern I'm using now is:

Again, thanks for looking at this, it puzzles me as well, and really, it could be a bug in the way the Grok Debugger is parsing (since my guess is it's being done with JavaScript?), rather than the way the pattern is written. I'm going to check myself locally today to verify this hypothesis.

Jordan Sissel
December 28, 2013, 12:46 AM

Sorry, you've been misled.

The description for the 'logstash/grok-patterns' repo reads: "Grok patterns used by logstash (this is not maintained, the main repo is in logstash itself)"

I forgot that repo even existed until now. As of now, I have deleted this repo to avoid future confusion. The definition of DATE_EU in logstash 1.3.2 (and since 1.2.0) has been what I said - day/month/year - you can see the definition here: https://github.com/logstash/logstash/blob/master/patterns/grok-patterns#L64

Replace 'master' with v1.3.2 if you wish to view that file at that specific logstash version.
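For anyone following along, here is a rough Python sketch of why a day/month/year DATE_EU produces the swapped values described in the report. The sub-patterns are approximations of the logstash definitions (the real YEAR uses an atomic group, which is written here as a plain group), and the sample line is hypothetical since the original input is not shown in this ticket.

```python
import re

# Approximations of the logstash sub-patterns (assumption: the real YEAR is
# (?>\d\d){1,2}; a non-capturing group is used here instead of an atomic one).
YEAR = r"(?:\d\d){1,2}"
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"

# DATE_EU as day/month/year, each part captured under its own name.
DATE_EU = re.compile(
    rf"(?P<monthday>{MONTHDAY})[./-](?P<monthnum>{MONTHNUM})[./-](?P<year>{YEAR})"
)

# Hypothetical input in YYYY-MM-DD form; not the reporter's actual log line.
sample = "2013-12-15 10:00:00"

# YEAR on its own is free to take all four digits.
year_alone = re.search(YEAR, sample).group(0)

# DATE_EU is unanchored, so the first position where day/month/year fits is
# inside the four-digit year: "13" is read as the day and "15" as the year.
date_eu = DATE_EU.search(sample).groupdict()

print(year_alone)
print(date_eu)
```

Nothing in the pattern stops the match from starting in the middle of a number, which is why a YYYY-MM-DD string still "matches" with the fields shifted.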

Regarding grok debugger, it is implemented in Ruby and uses the same library as logstash to perform matches so in most cases things should line up. However, sometimes we are lax in updating the patterns list that grok-debug has.

Jordan Sissel
December 28, 2013, 4:55 AM

Again very sorry for this weird confusion. Can we get your more immediate problem solved? What about TIMESTAMP_ISO8601 doesn't work? I saw you mention that your time format seems to have milliseconds written twice, if so, that seems OK to me since you can just ignore the last part with regexp?

Tom Kapanka
December 30, 2013, 6:52 PM

Ah, well, that makes sense! Thanks for clearing that up.

As for my date, I can use: (?<pylonstime>%{TIMESTAMP_ISO8601}.\d+)
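A quick way to sanity-check that idea outside of grok is a plain-regex sketch like the one below. The `ISO8601` string is a simplified stand-in for TIMESTAMP_ISO8601 (no timezone handling), the log line is made up, and Python needs `(?P<name>...)` where grok accepts `(?<name>...)`. The dot is escaped here so that it only matches a literal ".".

```python
import re

# Simplified stand-in for TIMESTAMP_ISO8601 (assumption: no timezone part).
# The optional "[:.,]\d+" tail is the fractional-seconds portion.
ISO8601 = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:?\d{2}(?::?\d{2}(?:[:.,]\d+)?)?"

# Equivalent of (?<pylonstime>%{TIMESTAMP_ISO8601}.\d+), with the dot escaped.
PYLONS_TIME = re.compile(rf"(?P<pylonstime>(?P<iso>{ISO8601})\.\d+)")

# Hypothetical Pylons-style line with the extra ".###" after the milliseconds.
line = "2013-12-30 18:52:01,123.123 INFO [app] request handled"

match = PYLONS_TIME.search(line)
pylonstime = match.group("pylonstime")
iso_part = match.group("iso")

print(pylonstime)
print(iso_part)
```

The inner `iso` group holds the part a date filter could parse, and the outer group keeps the full text that Pylons wrote.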

Good to know about the Grok Debugger; it's totally priceless. One nice-to-have would be the ability to define your own patterns and then use them. For example, we have:

But I can't test it as-is because it doesn't understand my custom patterns, so I have to maintain a separate, expanded pattern to test it with:

It would be awesome to have an additional input field where you could put your own definitions to be later used in the pattern field.
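In the meantime, the expansion I do by hand could be scripted. This is only a sketch: the `expand` helper and every pattern name in `definitions` are made up for illustration, it only understands `%{NAME}` and `%{NAME:field}`, and it emits Python-style named groups rather than whatever grok does internally.

```python
import re

# Matches %{NAME} or %{NAME:field}; other grok syntax is not handled.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, definitions, depth=0):
    """Recursively replace %{NAME} references with their definitions."""
    if depth > 20:
        raise ValueError("pattern references nest too deeply (possible cycle)")

    def substitute(match):
        name, field = match.group(1), match.group(2)
        if name not in definitions:
            raise KeyError(f"undefined pattern: {name}")
        body = expand(definitions[name], definitions, depth + 1)
        # Named references become named groups; the rest are only grouped.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return GROK_REF.sub(substitute, pattern)


# Hypothetical custom definitions, standing in for a local patterns file.
definitions = {
    "DIGITS2": r"\d\d",
    "MYTIME": r"%{DIGITS2}:%{DIGITS2}",
    "LEVEL": r"INFO|WARN|ERROR",
    "MYLINE": r"%{MYTIME:when} %{LEVEL:level}",
}

expanded = expand("%{MYLINE}", definitions)
fields = re.search(expanded, "18:52 WARN disk almost full").groupdict()

print(expanded)
print(fields)
```

The expanded string can then be pasted into the debugger's pattern field, so only the short definitions need to be maintained.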

Anyhow, I've taken enough of your time. Thanks for clearing that up.

Tom Kapanka
December 30, 2013, 6:53 PM

This was from an unmaintained repository. Sorry for the confusion.

Assignee

Logstash Developers

Reporter

Tom Kapanka
