logstash indexer hangs and eventually goes OOM on "malformed utf-8" error


Upon hitting the malformed utf-8 error, the logstash agent hangs, heap memory starts to rise, and the process eventually goes OOM.

The message below is output in an endless loop:
{:timestamp=>"2013-09-22T07:59:39.690000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>100, :exception=>#<JSON::GeneratorError: source sequence is illegal/malformed utf-8>, :backtrace=>["json/ext/GeneratorMethods.java:71:in `to_json'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/event.rb:169:in `to_json'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/outputs/elasticsearch.rb:163:in `flush'", "org/jruby/RubyArray.java:1617:in `each'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/outputs/elasticsearch.rb:158:in `flush'", "jar:file:/opt/logstash-1.2.1-monolithic.jar!/gems/stud-0.0.17/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1332:in `each'", "jar:file:/opt/logstash-1.2.1-monolithic.jar!/gems/stud-0.0.17/lib/stud/buffer.rb:216:in `buffer_flush'", "jar:file:/opt/logstash-1.2.1-monolithic.jar!/gems/stud-0.0.17/lib/stud/buffer.rb:193:in `buffer_flush'", "jar:file:/opt/logstash-1.2.1-monolithic.jar!/gems/stud-0.0.17/lib/stud/buffer.rb:159:in `buffer_receive'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/outputs/elasticsearch.rb:153:in `receive'", "(eval):115:in `initialize'", "org/jruby/RubyProc.java:255:in `call'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/pipeline.rb:247:in `output'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/pipeline.rb:212:in `outputworker'", "file:/opt/logstash-1.2.1-monolithic.jar!/logstash/pipeline.rb:140:in `start_outputs'"], :level=>:warn}
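The failure can be reproduced outside Logstash: serializing a string that is not valid UTF-8 makes the JSON generator raise, and the endlessly repeated warning above suggests the output buffer retries the flush on every exception, so the worker loops while unflushed events accumulate on the heap. A minimal sketch in plain Ruby (the byte sequence is an arbitrary example, not taken from the actual logs):

```ruby
require 'json'

# "\xE9" is a lone Latin-1 byte, so this string is tagged as UTF-8 but is
# not valid UTF-8 -- the same condition the indexer's events are in.
bad = "caf\xE9"

error = begin
  JSON.generate([bad])
  nil
rescue JSON::GeneratorError => e
  e
end

puts bad.valid_encoding?  # false
puts error.class          # JSON::GeneratorError
```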


Sunny Kim
September 22, 2013, 4:53 PM

Is there a way for Logstash to force-remove the offending characters?

I have tried adding ' codec => plain { charset => "US-ASCII" } ' but it doesn't seem to help. Then again, I may have failed to add it on some of the machines. Is this the correct fix to prevent Logstash from failing?

Sunny Kim
September 23, 2013, 7:28 AM

Just to provide a bit more info, here is my current setup:

Shipping agents (input and output only) are set up on hundreds of machines.

All of the inputs generally look like this:
input {
  file {
    type => "apache-access"
    path => "/var/log/apache2/access.log"
    codec => plain { charset => "US-ASCII" }
  }
}
All of the outputs generally look like this:
output {
  rabbitmq {
    host => "rabbitmq-lb.myhost.com"
    user => "admin"
    password => "mypassword"
    exchange => "logstash-exchange"
    exchange_type => "direct"
    key => "logstash-routing-key"
    durable => true
    persistent => true
  }
}
All shipping agents ship logs to RabbitMQ.
I have a cluster of logstash indexers whose input is RabbitMQ; they run a bunch of filters (grok, mutate, etc.) and output to Elasticsearch. All of these indexers hang on malformed utf-8 errors. Please note that this never happened on 1.1.9.
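For reference, an indexer in this setup would look roughly like the sketch below; the queue name, filter body, and Elasticsearch host are hypothetical placeholders, not taken from the actual configs:

```
input {
  rabbitmq {
    host => "rabbitmq-lb.myhost.com"   # same broker the shippers publish to
    user => "admin"
    password => "mypassword"
    exchange => "logstash-exchange"
    key => "logstash-routing-key"
    queue => "logstash-queue"          # hypothetical queue name
    durable => true
  }
}
filter {
  grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] }  # example filter
}
output {
  elasticsearch {
    host => "es.myhost.com"            # hypothetical host
  }
}
```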

Personally, the best resolution for me would be to simply drop all offending characters. I understand this may be undesirable for some people, but the worst-case scenario is what I'm seeing now, where Logstash just hangs and eventually goes OOM.
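Dropping the offending bytes is straightforward at the Ruby level. Logstash 1.2 has no built-in option for it, but a hypothetical sketch of the idea, using String#scrub (Ruby 2.1+), looks like this:

```ruby
require 'json'

# Hypothetical sketch: strip byte sequences that are not valid UTF-8
# before events are serialized, so JSON generation cannot fail on them.
# String#scrub replaces invalid sequences; passing '' drops them outright.
def drop_invalid_utf8(str)
  str.scrub('')
end

raw = "apache log line caf\xE9"  # \xE9 is not valid UTF-8
clean = drop_invalid_utf8(raw)

puts clean                  # "apache log line caf"
puts clean.valid_encoding?  # true
puts JSON.generate([clean]) # serializes without error
```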

Thanks! All of this aside, I'm still happy with the 1.2.1 upgrade; it's super fast. But I do need a workaround for this problem.


Sunny Kim
September 24, 2013, 2:13 AM

If anyone else out there experiences a similar problem, here's a little Perl hack I wrote to restart the logstash agent when it's on the verge of hanging. It uses jstat to monitor garbage collection and restarts the agent when usage of Survivor0 or Survivor1 stays above 99% for 10 consecutive seconds: http://pastebin.com/ra89kbrR
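The core of that heuristic can be sketched in a few lines of Ruby. This is a simplified illustration of the check, not the actual pastebin script; it omits the 10-second persistence loop and the restart itself:

```ruby
# Hypothetical sketch of the restart heuristic: parse one data line from
# `jstat -gc <pid>` and report whether either survivor space is more than
# 99% utilized. The first four columns are S0C, S1C, S0U, S1U (KB).
def survivors_full?(jstat_line, threshold: 99.0)
  s0c, s1c, s0u, s1u = jstat_line.split.first(4).map(&:to_f)
  s0 = s0c > 0 ? s0u / s0c * 100 : 0.0
  s1 = s1c > 0 ? s1u / s1c * 100 : 0.0
  s0 > threshold || s1 > threshold
end

# Sample lines (only the first four columns matter here):
puts survivors_full?("1024.0 1024.0 1020.0 0.0")  # true  (S0 at ~99.6%)
puts survivors_full?("1024.0 1024.0 0.0 512.0")   # false (S1 at 50%)
```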

Bernd Ahlers
October 3, 2013, 5:05 PM

I think you have to find out which charset your apache logs are encoded in. I was able to reproduce your error with an "ISO-8859-1" encoded file. The following config made it work.
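The config itself did not survive in this export. Based on the surrounding text, it presumably set the input charset to match the file's actual encoding, along these lines:

```
file {
  type => "apache-access"
  path => "/var/log/apache2/access.log"
  codec => plain { charset => "ISO-8859-1" }
}
```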

