Create a grok patterns collection repository

Description

I found logstash a month ago and have started using it in my infrastructure. However, I think it lacks tooling for analyzing the log files of common programs and extracting useful information from them. For instance, I need to analyze some Postfix SMTP logs and I found almost nothing to start with.

I found this repository and this folder, which contain some generic patterns, but IMHO logstash should provide a pattern repository that covers all major log file formats and, most importantly, includes specs that prove the patterns behave correctly.

That way a new user can simply configure logstash with a pattern type and immediately search their logs for useful data (for the SMTP logs above, one could query ES with something like to:"user@example.com").
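To make the idea concrete, here is a minimal sketch of how a grok-style pattern could turn a Postfix delivery line into searchable fields. It is written in Python purely for illustration and is not how logstash implements grok. The pattern names (QUEUEID, EMAIL), the field names, and the sample log line are all assumptions made up for this example, not patterns taken from any existing repository.

```python
import re

# Hypothetical mini pattern library (illustrative only; real grok
# pattern files are far more thorough than these two entries).
PATTERNS = {
    "QUEUEID": r"[0-9A-F]{6,}",
    "EMAIL": r"[^<>\s@]+@[^<>\s@]+",
}


def expand(grok: str) -> str:
    """Replace each %{NAME:field} reference with a named regex group."""
    def repl(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", repl, grok)


# Assumed grok-style expression for the recipient part of a delivery line.
POSTFIX_TO = re.compile(expand(r"%{QUEUEID:queue_id}: to=<%{EMAIL:to}>"))


def extract(line: str) -> dict:
    """Return the captured fields, or an empty dict if the line does not match."""
    match = POSTFIX_TO.search(line)
    return match.groupdict() if match else {}


# Made-up sample line in the usual Postfix smtp delivery format.
sample = (
    "Oct 16 15:34:01 mail postfix/smtp[1234]: 4F9D1A0E2B: "
    "to=<user@example.com>, relay=mx.example.com[192.0.2.1]:25, status=sent"
)
fields = extract(sample)
```

Once a line is split into fields like `to` and `queue_id`, a query such as the one above only has to match on the `to` field instead of on raw text.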

I forked an existing GitHub repository that contains a similar attempt and added my Postfix experiments to it. The result is here:

https://github.com/fabn/logstash-patterns

I hope this idea will be considered: with community support and a testing tool like RSpec, such a repository can grow quickly.

Let me know your opinion on this subject.

Activity

Fabio Napoleoni
October 16, 2012, 3:34 PM

Possibly related

Jordan Sissel
October 16, 2012, 6:25 PM

"all major logfiles formats covered"

I think you gravely underestimate the massive and overwhelming number of different and insane log formats.

That said, there's no need to fork a repository just to maintain a separate grok patterns repo. If you have patterns that are useful to others, I very much welcome your contributions and patches - send patches, not forks!

On testing, grok already has a fleet of tests, but they're mainly for individual small patterns like IP and NUMBER. The grok test (spec/filters/grok.rb) is the right place for these tests for now, so if you want to contribute tests, that's the place to work and send patches.

Aaron Mildenstein
October 16, 2012, 8:49 PM

Or just google for them. I sometimes post stuff on my blog, and I know others do too.

Fabio Napoleoni
October 17, 2012, 10:56 AM

@Jordan about the high number of different log formats: that's true, but it would be possible to start with categories of popular software, such as

Web servers: apache, nginx
Mail servers: postfix, sendmail, qmail
Database: MySQL, PostgreSQL
... and so on

Obviously the list is not exhaustive; it can grow over time, and everyone can contribute if everything is kept in a central place (i.e. the logstash GH repository).

About the repository comment: I suggested a separate repository because it's easier to get just a folder of patterns, improve them, and run specs only for them without having all the logstash dependencies installed. Take a look at

https://github.com/logstash/logstash/blob/master/logstash.gemspec#L19

and then at

https://github.com/fabn/logstash-patterns/blob/develop/Gemfile

Someone who wants to contribute patterns then needs only a couple of gems, not the whole set of logstash dependencies.

@Aaron I posted this feature request because I think there should be a centralized place for all this stuff. A blog post with patterns is not very useful if it is not referenced anywhere and has no tests. What if I find a mistake in your post: should I report it as a comment? A casual visitor might then use the pattern without seeing my comment, and it won't work.

And, for the record, I googled for Postfix patterns and found almost nothing, just one Gist with a proof of concept that had a subtle issue. That's why I created my repository with specs, which allow anyone to take a pattern, test it, and eventually improve it with no risk of breaking existing functionality.
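As a rough sketch of what "specs for a pattern" can mean in practice, the snippet below checks a pattern against a small table of sample lines, including one that must not match. It uses plain Python rather than RSpec, and the regex, the field names, and the sample lines are assumptions invented for illustration; they are not the patterns or specs from the repository linked above.

```python
import re

# Hypothetical pattern under test: recipient and delivery status.
STATUS_RE = re.compile(r"to=<(?P<to>[^>]+)>.*\bstatus=(?P<status>\w+)")

# Each case pairs a made-up sample line with the fields we expect,
# or None when the line must not match at all.
CASES = [
    (
        "4F9D1A0E2B: to=<user@example.com>, relay=none, status=deferred (timeout)",
        {"to": "user@example.com", "status": "deferred"},
    ),
    (
        "4F9D1A0E2B: to=<a@example.org>, relay=mx.example.org[192.0.2.1]:25, "
        "status=sent (250 OK)",
        {"to": "a@example.org", "status": "sent"},
    ),
    ("4F9D1A0E2B: removed", None),
]


def run_cases(pattern, cases):
    """Return a list of (line, expected, actual) for every failing case."""
    failures = []
    for line, expected in cases:
        match = pattern.search(line)
        actual = match.groupdict() if match else None
        if actual != expected:
            failures.append((line, expected, actual))
    return failures


failures = run_cases(STATUS_RE, CASES)
```

Anyone changing the pattern can rerun the table: an empty `failures` list means the existing sample lines still parse as before, and a newly reported log line can be added as one more case.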

Jordan Sissel
April 20, 2013, 6:33 AM

There are a few places to contribute:

Specific grok patterns: I welcome contributions to the logstash repo itself in the 'patterns' directory.

Log formats are generally more than just grok, so configs for specific log formats are welcome contributions to the logstash cookbook (cookbook.logstash.net).

Assignee

Jordan Sissel

Reporter

Fabio Napoleoni
