ElasticSearch and (perhaps) Logstash became unresponsive after a few days of 500k+ daily log entries

Description

I have one dedicated machine running RabbitMQ and another running logstash with ElasticSearch (ES) embedded. I stream about 500-800k messages per day. I've only been running the system for a few days, but this morning the ES web site (:9292) was unresponsive: I got the HTML title but not the full page, and that was just on the /search URL. When I checked RabbitMQ, it had 560k messages in the queue and counting. Unfortunately I didn't know what else to check, so I restarted logstash (and ES), after which it worked through the RabbitMQ queue.

Config: http://pastebin.com/FseHjyft
Partial top: http://pastebin.com/Lmvmg23J

Activity

Richard Pijnenburg December 17, 2012 at 4:26 AM

Very old ticket.
If the issue still persists with the current version, please open a new ticket.

Hakan Lindestaf September 16, 2011 at 8:40 PM

I found some errors in the log that indicated too many open files. So I found this link: http://www.elasticsearch.org/tutorials/2011/04/06/too-many-open-files.html

My system was set to the default limit of 1024 open files. I changed it to 32000 and verified with ulimit; however, when ES was running in embedded mode under logstash I still got the same behavior (I don't know a way to verify the file limit with the embedded ES).
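
One possible way to check the limit that an already-running process actually has (for example the JVM hosting logstash with embedded ES) is to read /proc/<pid>/limits. This matters because a limit raised with ulimit only applies to that shell and the processes started from it afterwards. The sketch below assumes Linux with /proc mounted and enough permission to read the target process's entries; the function names are hypothetical helpers, not part of logstash or ES.

```python
import os
import sys


def parse_open_files_limit(limits_text):
    """Extract the (soft, hard) 'Max open files' limits from the text of
    /proc/<pid>/limits. A value of 'unlimited' is returned as None.
    Returns None if the row is not present."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit, hard_limit) for a running process.
    Assumes Linux; reading another user's process usually requires root."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_open_files_limit(fh.read())
    if limits is None:
        raise RuntimeError(f"no 'Max open files' row found for pid {pid}")
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limits[0], limits[1]


if __name__ == "__main__":
    # Pass the pid of the java process; defaults to this script's own pid.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used, soft, hard = fd_usage(pid)
    print(f"pid {pid}: {used} open fds, soft limit {soft}, hard limit {hard}")
```

Run it with the pid of the java process as the only argument. If the soft limit still shows 1024 there, the new ulimit value did not reach the process that runs the embedded ES.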

I moved ES to run standalone, with logstash just piping messages to it. Now it's working! But I still think this needs to be addressed in logstash.

Hakan Lindestaf September 14, 2011 at 10:00 PM

Now that I have the ElasticSearch log turned on, I see some exceptions: http://pastebin.com/3KsUy9bh

Hakan Lindestaf September 13, 2011 at 11:06 PM

Got this again today. Here's the top output: http://pastebin.com/fQYvEAyT

I noticed it's actually still processing: after waiting a long time I get the full HTML of the search page, but it is extremely slow. Memory seems to be OK, but it is using a lot of CPU. I can see that it's processing the RabbitMQ queue, but it can't keep up. So perhaps if I just let it sit it would freeze completely (as originally reported). I'm not sure what else I can add to this ticket as far as troubleshooting goes.

Fixed

Details

Created September 12, 2011 at 6:18 PM
Updated April 19, 2013 at 8:19 PM
Resolved December 17, 2012 at 4:26 AM