Watchdog no longer functions in logstash 1.2.1 (and 1.2.0)

Description

The Logstash filter worker stops processing messages even when messages are available in the queue.

Logstash is getting input from a RabbitMQ queue using SSL, filtering the content, and outputting back to another RabbitMQ queue, again with SSL.

There are no errors visible in the output, even with the -vv command-line switch.

Once the pipeline has hung, it cannot be shut down with Ctrl-C; the Logstash process must be killed to stop it.

What's the next best step to identify the cause of the problem?

Activity

Chris Koutsos October 15, 2013 at 8:43 PM

Thanks for the pointer. I've found my misbehaving pattern and modified it, and all seems stable.

Jordan Sissel October 10, 2013 at 10:50 PM

After some discussion in #logstash, we identified that the grok filter is what gets stuck. It looks like the filter watchdog was accidentally disabled in 1.2.0, which is why Logstash hangs instead of crashing.

For clarity, the "hang" is due to bugs in the Ruby regexp engine where you get exponential execution time in certain conditions. You can fix this by modifying your pattern to avoid this behavior, but the fix can sometimes be tricky.
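To illustrate the kind of pattern change described above, here is a minimal sketch. It uses Python's `re` module as a stand-in, not the Ruby regexp engine, and the two patterns (`SLOW`, `FAST`) and the helper `time_match` are hypothetical examples, not the patterns from the config in this ticket. It only shows the general shape of the problem: nested quantifiers that can split the same text in many ways, and a rewrite that accepts the same strings without that ambiguity.

```python
import re
import time

# Hypothetical patterns for illustration only; they are NOT the patterns
# from the reporter's config, and Python's `re` stands in for the Ruby
# regexp engine mentioned above.

# Nested quantifiers: a run of word characters can be split between
# iterations of the outer group in exponentially many ways.
SLOW = re.compile(r"^(\w+\s?)+$")

# Same set of matching strings, but a run of word characters can only be
# consumed one way, so a failed match backtracks linearly.
FAST = re.compile(r"^\w+(?:\s\w+)*\s?$")


def time_match(pattern, text):
    """Return (matched?, seconds taken) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # The trailing "!" forces the match to fail, which is what triggers
    # the exhaustive backtracking in SLOW.
    for n in (10, 14, 18):
        text = "a" * n + "!"
        _, slow_s = time_match(SLOW, text)
        _, fast_s = time_match(FAST, text)
        print(f"n={n:2d}  slow={slow_s:.4f}s  fast={fast_s:.6f}s")
```

With the slow pattern, the time per failed match should grow sharply as `n` increases, while the rewritten pattern stays roughly flat. In a grok config, the equivalent change is usually to anchor the pattern and avoid a repeated group whose body itself ends in an optional or repeated element.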

Chris Koutsos October 8, 2013 at 10:29 PM

OK, I've now run this for 24 hours without the postfix part of the config with no issues at all. Is there anything in there which looks like it could cause an issue, or is the problem with grok or one of the other filters?

Chris Koutsos October 7, 2013 at 11:57 PM

I think I've narrowed this down to a problem with grok + my config. If I remove the postfix section of the config and let the messages pass without processing, I don't get any hanging.

Any more ideas?

Chris Koutsos October 1, 2013 at 12:19 AM

Hi there,

I ran with only the stdout output configured. It ran for about an hour and a half and then hung, again on a postfix message (all the hangs have been on postfix messages; not sure if this is significant).

logstash.conf: http://pastebin.com/DWxL0iJb
Patterns: http://pastebin.com/xQtAtCXJ

I'm checking queue sizes with the default 'rabbit.rb', which ships as a Graphital sample script. It just runs 'rabbitmqctl list_queues' periodically and sends the results to Graphite.

This problem has only occurred since upgrading to 1.2+

Cheers,
CK

Details

Created September 25, 2013 at 1:35 AM
Updated April 18, 2014 at 1:45 PM