Filters using too much CPU

Description

I'm using a Logstash shipper that sends logs to a Redis instance as follows:

output {
  redis {
    host => [ "X.X.X.X" ]
    data_type => 'list'
    key => 'haproxy'
    batch => true
  }
}
}

And a Logstash indexer pulling logs from Redis as follows:
input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    type => "haproxy"
    key => "haproxy"
    threads => 10
    batch_count => 100
  }
}
}

The problem is that logs are not pulled fast enough: the timestamps of the indexed events fall further behind as time goes by (i.e. after 10 minutes, I'm only getting logs from 7 minutes ago).
It seems to come from the input, because when I run "redis-cli monitor" the logs RPUSHed by the shipper carry the latest (current) timestamp.
Not sure if I'm clear enough.
In short, the delay between data pushed and data pulled keeps growing.
For information, the volume is around 4k logs per second.
Any help would be much appreciated.
Thanks
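
One way to confirm that the indexer is falling behind is to watch the length of the Redis list itself. The following is only a sketch: it assumes redis-cli is on the PATH of the indexer host, that Redis listens on the default local port, and that the list key is "haproxy" as in the config above. The function names growth_rate and sample_backlog are made up for illustration.

```shell
# growth_rate FIRST_LEN SECOND_LEN INTERVAL_SECONDS
# Prints the net growth of the list in entries per second (integer division).
growth_rate() {
  echo $(( ($2 - $1) / $3 ))
}

# sample_backlog KEY INTERVAL_SECONDS
# Reads the list length twice, INTERVAL_SECONDS apart, using the Redis LLEN
# command, then prints the growth rate between the two samples.
sample_backlog() {
  first=$(redis-cli llen "$1")
  sleep "$2"
  second=$(redis-cli llen "$1")
  growth_rate "$first" "$second" "$2"
}

# Usage (not run here, needs a reachable Redis):
#   sample_backlog haproxy 10
```

A result that stays positive means the shipper pushes entries faster than the indexer pops them, which would match the growing delay described above. A result around zero or below means the indexer is keeping up or catching up.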

Activity

delahaye julien September 30, 2013 at 3:16 PM
Edited

It is far better, but I'm still seeing some lag. I guess I'll have to optimize my filters and maybe add another Logstash instance pulling from Redis.

delahaye julien September 30, 2013 at 8:58 AM

This looks like exactly what I needed! It works well; I still need to check over the longer term, but for now it works like a charm.
I hadn't noticed that very useful flag.
Thanks for the support (and this awesome tool)!

Jordan Sissel September 27, 2013 at 9:25 PM

A first step could be to run multiple filter workers with the '-w' flag. Try '-w 4' to run 4 filter workers (the default is 1), which should increase your throughput a bit (it also consumes more CPU).
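
As a rough sketch of what that could look like on the indexer: the jar name and config file name below are placeholders, and the "agent" subcommand assumes a 1.x-style flatjar install, so adjust all three to your setup. Only the '-w' flag comes from the suggestion above.

```shell
# Placeholder names (assumptions); change them to match your install.
LOGSTASH_JAR="logstash-flatjar.jar"
INDEXER_CONF="indexer.conf"
FILTER_WORKERS=4

# Builds the indexer command line. '-w' sets the number of filter workers
# (the default is 1, as noted in the comment above).
indexer_cmd() {
  echo "java -jar $LOGSTASH_JAR agent -f $INDEXER_CONF -w $FILTER_WORKERS"
}

indexer_cmd          # print the command for review
# $(indexer_cmd)     # uncomment to actually start the indexer
```

Printing the command first makes it easy to check the paths before starting the process. Since each extra worker uses more CPU, it is probably worth raising FILTER_WORKERS step by step while watching the load.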

delahaye julien September 27, 2013 at 8:26 AM
Edited

After investigating a bit more, it seems the problem comes from Logstash filters eating too much CPU:
"You can try to figure out what's using the most cpu with the 'top' utility. If you find the logstash pid, run 'top -Hp logstash_pid' - the '-H' flag shows threads, and logstash labels most of its threads with names so you can see exactly what's doing what"
(Thanks, Logstash users Google group!)
What advice would you have for optimising grok? Is there a type of pattern that uses more CPU than others (GREEDYDATA, etc.)?

delahaye julien September 25, 2013 at 8:05 AM

Hi,
First, thanks for taking the time to answer me.

Yes, the output is Elasticsearch, but when I add the file output, the timestamp is the same as it is in Elasticsearch (getting older and older).

Cheers

Details

Created September 24, 2013 at 3:08 PM
Updated September 30, 2013 at 3:16 PM