Watchdog no longer functions in logstash 1.2.1 (and 1.2.0)

Description

The Logstash filter worker stops processing messages even when messages are available.

Logstash is getting input from a RabbitMQ queue using SSL, filtering the content, and outputting back to another RabbitMQ queue, again with SSL.

No errors are visible in the output, even with the -vv command-line switch.

Once processing has hung, the pipeline cannot be shut down with Ctrl-C; the Logstash process must be killed to stop it.

What is the best next step to identify the cause of the problem?

Activity

Chris Koutsos
October 1, 2013, 12:19 AM

Hi there,

I ran with only the stdout output configured. It worked for about an hour and a half and then hung, again on a postfix message (all the hangs have been on postfix messages; I'm not sure whether this is significant).

logstash.conf: http://pastebin.com/DWxL0iJb
Patterns: http://pastebin.com/xQtAtCXJ

I'm checking queue sizes with the default 'rabbit.rb' script, which ships as a Graphital sample. It just runs 'rabbitmqctl list_queues' periodically and sends the results to Graphite.
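
A minimal Python sketch of the queue-size check described above; it is not the 'rabbit.rb' script itself. It assumes 'rabbitmqctl list_queues' prints tab-separated name and message-count lines, and that a Carbon plaintext listener is reachable on localhost:2003. The metric prefix, queue names, and sample output are hypothetical.

```python
import socket


def parse_list_queues(output):
    """Parse tab-separated `name<TAB>messages` lines into a dict.

    Banner lines such as "Listing queues ..." carry no tab-separated
    integer count and are skipped.
    """
    sizes = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            sizes[parts[0]] = int(parts[1])
    return sizes


def to_graphite_lines(sizes, timestamp, prefix="rabbitmq.queues"):
    """Format queue sizes as Graphite plaintext lines: `path value timestamp`."""
    return [
        f"{prefix}.{name}.messages {count} {timestamp}"
        for name, count in sorted(sizes.items())
    ]


def send_to_graphite(lines, host="localhost", port=2003):
    """Send lines to a Carbon plaintext listener (host and port are assumptions)."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Hypothetical sample output; real output varies by RabbitMQ version.
sample = "Listing queues ...\nlogstash-raw\t1200\nlogstash-filtered\t0\n...done.\n"
lines = to_graphite_lines(parse_list_queues(sample), timestamp=1380586740)
```

With a graph like this, an input queue whose message count keeps growing while Logstash is still running is the symptom reported here.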

This problem has only occurred since upgrading to 1.2+.

Cheers,
CK

Chris Koutsos
October 7, 2013, 11:57 PM

I think I've narrowed this down to a problem with grok + my config. If I remove the postfix section of the config and let the messages pass without processing, I don't get any hanging.

Any more ideas?

Chris Koutsos
October 8, 2013, 10:29 PM

OK, I've now run this for 24 hours without the postfix part of the config with no issues at all. Is there anything in there which looks like it could cause an issue, or is the problem with grok or one of the other filters?

Jordan Sissel
October 10, 2013, 10:50 PM

After some discussion in #logstash, we identified that the grok filter is what gets stuck. It looks like the filter watchdog was accidentally disabled in 1.2.0, which is why Logstash hangs instead of crashing.
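
To illustrate the general idea of a filter watchdog, here is a minimal Python sketch. It is not Logstash's implementation: the class name, method names, worker name, and timeout are all hypothetical. Each worker records when it starts an event, and a monitor asks which workers have been busy with the same event for too long.

```python
import time


class FilterWatchdog:
    """Tracks when each worker started its current event and flags stalls."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.started = {}  # worker name -> (start time, event)

    def begin(self, worker, event):
        """Record that `worker` has started processing `event`."""
        self.started[worker] = (self.clock(), event)

    def end(self, worker):
        """Record that `worker` has finished its current event."""
        self.started.pop(worker, None)

    def stalled(self):
        """Return (worker, event) pairs that have exceeded the timeout."""
        now = self.clock()
        return [
            (worker, event)
            for worker, (start, event) in self.started.items()
            if now - start > self.timeout_seconds
        ]


# Usage with a fake clock so the example does not need to wait.
fake_now = [0.0]
watchdog = FilterWatchdog(timeout_seconds=10, clock=lambda: fake_now[0])
watchdog.begin("filterworker.0", "example event")
fake_now[0] = 11.0  # pretend 11 seconds have passed without end() being called
stuck = watchdog.stalled()
```

A monitor that polls `stalled()` and aborts the process when it returns anything would turn a silent hang into a visible crash that also names the offending event.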

For clarity, the "hang" is due to bugs in the Ruby regexp engine that cause exponential execution time under certain conditions. You can fix this by modifying your pattern to avoid this behavior, but the fix can sometimes be tricky.
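
The exponential behavior can be sketched with a textbook pattern. This example uses Python's `re` module, which is also a backtracking engine, as a stand-in for the Ruby engine; the patterns are illustrative and are not taken from the reporter's config. A nested quantifier gives the engine many ways to split the same input, and when the overall match fails it tries them all.

```python
import re
import time

# Nested quantifiers: a run of "a" can be split between the inner and
# outer "+" in many ways, and a failing match explores every split.
SLOW = re.compile(r"^(a+)+$")
# Matches the same strings with a single repetition, so there is only
# one way to consume the run.
FAST = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, seconds) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18, 20):
        text = "a" * n + "!"  # the trailing "!" forces the match to fail
        _, slow = time_match(SLOW, text)
        _, fast = time_match(FAST, text)
        print(f"n={n:2d} slow={slow:.4f}s fast={fast:.6f}s")
```

The slow pattern's time grows sharply with each extra character while the rewritten one stays flat. Removing nested or overlapping repetition in this way is the kind of pattern change described above.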

Chris Koutsos
October 15, 2013, 8:43 PM

Thanks for the pointer. I've found my misbehaving pattern and modified it, and all seems stable.

Assignee

Logstash Developers

Reporter

Chris Koutsos

Labels

None