Grok field type conversion is slow

Description

On my Logstash instance, when I use grok field type conversion (to integer), performance drops to an unusable rate.

I ran a benchmark on HAProxy logs, using the HAPROXYHTTP pattern plus type conversion to integer (12 integer fields).
On my computer, without type conversion it processes 10,000 lines in less than 3 seconds; with type conversion the same lines take about 24 seconds.

I understand that type conversion adds some overhead, but here the conversion takes more time than the actual field extraction!
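
For reference, a minimal sketch of the two configurations being compared. The field names in the converting variant are illustrative, not the full HAPROXYHTTP pattern; the point is the ":int" suffix, which asks grok to coerce the captured text to an integer as part of the match:

    filter {
      # Baseline: the stock pattern captures every field as a string.
      grok {
        match => { "message" => "%{HAPROXYHTTP}" }
      }
    }

    filter {
      # Converting variant (illustrative excerpt, not the real pattern):
      # each ":int" suffix coerces the captured text to an integer during
      # the match. With twelve such coercions, the slowdown above appears.
      grok {
        match => { "message" => "%{IP:client_ip}:%{INT:client_port:int} %{INT:bytes_read:int} %{INT:time_duration:int}" }
      }
    }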

Activity

Aaron Mildenstein
October 22, 2013, 1:16 PM

This is fascinating and bears further research, even if only to understand how this affects performance and to improve Logstash.

I have to ask, though: why aren't you using an Elasticsearch template/mapping to do the "conversion" from string to integer? It's pretty straightforward, and the conversion is then handled outside Logstash. Besides integer, you can also get Elasticsearch to recognize float and IP types (which is really useful for being able to search through IP ranges).
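
For illustration, a minimal sketch of such a template (the field names are hypothetical, and the syntax matches the Elasticsearch versions of that era, with a _default_ mapping type). Applied at index time, it lets grok's string captures land as integer and ip fields without any conversion inside Logstash:

    curl -XPUT 'localhost:9200/_template/haproxy' -d '{
      "template": "logstash-*",
      "mappings": {
        "_default_": {
          "properties": {
            "bytes_read":    { "type": "integer" },
            "time_duration": { "type": "integer" },
            "client_ip":     { "type": "ip" }
          }
        }
      }
    }'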

David Rousselie
October 22, 2013, 1:47 PM

That is exactly what I did to fix this issue in my case.

I used Logstash type conversion because it gave me a single place to declare the extracted fields and their types. With an Elasticsearch template, I have to declare the field names in the Logstash config file and then redeclare those fields along with their types in an ES index template; it is just a little less convenient.

Aaron Mildenstein
February 6, 2015, 11:10 PM

Grok performance has been dramatically improved:
https://github.com/elasticsearch/logstash/pull/1657

If you have other grok issues, please open an issue at https://github.com/logstash-plugins/logstash-filter-grok

Assignee

Logstash Developers

Reporter

David Rousselie
