On my Logstash instance, when I use Grok field type conversion (to integer), performance drops to an unusable rate.
I ran some benchmarks on HAProxy logs, using the HAPROXYHTTP pattern plus type conversion to integer (12 integer fields).
On my computer, without type conversion, it processes 10 000 lines in under 3 seconds; with type conversion, the same lines take about 24 seconds.
I understand that type conversion may add some overhead, but here the conversion takes more time than the actual field extraction!
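For context, grok's per-field conversion is requested by appending `:int` to a capture. A minimal sketch of the kind of filter being benchmarked (whether the stock HAPROXYHTTP pattern already includes the `:int` suffixes depends on your logstash-patterns-core version; the standalone pattern below is illustrative, not the real HAProxy format):

```
filter {
  grok {
    # The HAPROXYHTTP pattern extracts the HAProxy fields; captures are
    # strings unless the pattern requests conversion with ":int".
    match => { "message" => "%{HAPROXYHTTP}" }
  }
  grok {
    # Conversion syntax shown on a simple custom pattern:
    # %{PATTERN:field_name:int} converts the capture to an integer in place.
    match => { "message" => "bytes=%{INT:bytes_read:int}" }
  }
}
```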
This is fascinating and bears further research, even if only to understand how that's affecting performance and to improve logstash.
I have to ask, though, why aren't you using an elasticsearch template/mapping to do the "conversion" from string to integer? It's pretty straightforward, and the conversion is then handled external to logstash. Besides integer, you can also get elasticsearch to recognize float and IP types (which is really useful for being able to search through IP ranges).
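A minimal sketch of that approach: an index template that maps the fields at index time, so Logstash keeps emitting strings and Elasticsearch does the typing. The template name, index pattern, and field names here are placeholders, and the exact mapping syntax (e.g. whether a mapping type level is required) varies across Elasticsearch versions:

```
PUT _template/haproxy-logs
{
  "template": "logstash-*",
  "mappings": {
    "log": {
      "properties": {
        "bytes_read":   { "type": "integer" },
        "time_request": { "type": "integer" },
        "client_ip":    { "type": "ip" }
      }
    }
  }
}
```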
That is exactly what I did to fix this issue in my case.
I had used Logstash type conversion because it gave me a single place to declare the extracted fields and their types. With an Elasticsearch template, I have to declare the field names in the Logstash config file and then redeclare those fields along with their types in an ES index template, which is just a little less convenient.
Grok performance has since been dramatically improved.
If you have other grok issues, please open an issue at https://github.com/logstash-plugins/logstash-filter-grok