Illegal/malformed utf-8 exception with input file encoded as utf-8

Description

Greetings,

I am new to Logstash but have been working with it on a project for the last month or so. We encountered no issues in our setup during development and testing, but once we put it into production we started seeing this error stall our Logstash instance. The file it is reading (attached gzipped example) is encoded as UTF-8 as far as I can tell, yet we get this error daily. When the error occurs, the Logstash process hangs because the sole output thread gets stuck trying to flush the events. It retries the flush every second, with the same failed result each time, until Logstash is restarted.
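To check whether the input file really is valid UTF-8, something like the following could be run against it. This is only a minimal sketch in Python, not part of Logstash; the function names `invalid_utf8_lines` and `scan_file` are made up for illustration, and it assumes the log is newline-delimited and either plain or gzipped.

```python
import gzip
import sys


def invalid_utf8_lines(lines):
    """Yield (line_number, byte_offset, bad_bytes) for each line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, e.g. a file opened in binary mode.
    Only the first invalid sequence on each line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start / err.end delimit the undecodable bytes within this line
            yield lineno, err.start, raw[err.start:err.end]


def scan_file(path):
    """Print every invalid line found in a plain or gzipped log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        for lineno, offset, bad in invalid_utf8_lines(handle):
            print(f"line {lineno}, byte {offset}: {bad!r}")


# Hypothetical usage: python scan_utf8.py /path/to/logfile.gz
if __name__ == "__main__" and len(sys.argv) > 1:
    scan_file(sys.argv[1])
```

If this prints nothing for the attached file, the invalid bytes are presumably introduced somewhere after the file is read rather than being present in the file itself.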

Let me know if there is anything else you need or if I can help out with this. I have no experience with Ruby but am not averse to diving in if it will help.

Environment
Virtual server with 4 CPUs, 16 GB RAM, 160 GB disk
CentOS 5 x64
Tomcat 6 running a Grails server app (no UI) generating the log files
Java 1.6.0_06
Supervisord running the process (here is the command):

command=java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=XXXX -Dcom.sun.management.jmxremote.password.file=XXXX -Dcom.sun.management.jmxremote.access.file=XXXX -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Xmx256M -Xms256M -jar /data/app/logstash/logstash.jar agent --log /data/app/logstash/logs/logstash.log -f /data/app/logstash/conf.d/

Error
The specific error is:

{:timestamp=>"2014-02-06T21:18:00.777000-0500", :message=>"Failed to flush outgoing items", :outgoing_count=>100, :exception=>#<JSON::GeneratorError: source sequence is illegal/malformed utf-8>, :backtrace=>["json/ext/GeneratorMethods.java:71:in `to_json'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/event.rb:168:in `to_json'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/outputs/elasticsearch.rb:322:in `flush'", "org/jruby/RubyArray.java:1613:in `each'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/outputs/elasticsearch.rb:310:in `flush'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1338:in `each'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/stud/buffer.rb:216:in `buffer_flush'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/stud/buffer.rb:193:in `buffer_flush'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/stud/buffer.rb:159:in `buffer_receive'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/outputs/elasticsearch.rb:305:in `receive'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/outputs/base.rb:86:in `handle'", "(eval):82:in `initialize'", "org/jruby/RubyProc.java:271:in `call'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/pipeline.rb:259:in `output'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/pipeline.rb:218:in `outputworker'", "file:/data/app/logstash/logstash-1.3.2-flatjar.jar!/logstash/pipeline.rb:145:in `start_outputs'"], :level=>:warn}
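The backtrace shows `to_json` raising from inside the Elasticsearch output's `flush`, so it appears that one event carrying invalid bytes is enough to fail the flush of the whole batch of 100. As a possible stopgap outside Logstash, the raw lines could be scrubbed before they are serialized. The sketch below is Python, not Logstash code, and `scrub_utf8` and `to_json_line` are hypothetical helper names; it replaces each undecodable sequence with U+FFFD so that JSON serialization cannot fail on encoding.

```python
import json


def scrub_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for any invalid sequence."""
    return raw.decode("utf-8", errors="replace")


def to_json_line(raw: bytes) -> str:
    """Serialize one raw log line as a JSON object with a single 'message' field."""
    return json.dumps({"message": scrub_utf8(raw)})
```

Replacing bytes loses the original data at that position, so this only makes sense if a visibly damaged character in the indexed event is acceptable.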

Configuration
In our original configuration we were pushing events to Redis (for operational logs) and separate events (custom server call log) directly to Elasticsearch. We first saw the UTF-8 error in connection with the Redis output, so we removed that output. We then started seeing the error for the call log, which is the main log we were interested in to begin with.

Our configuration is broken up into two files (input/filter and output), though we are only using one of the created 'types' in the output:

Input and Filter:
http://pastebin.com/GE64N6hp

Output:
http://pastebin.com/fzTfdTVL

Assignee

Logstash Developers

Reporter

Lon Pilot
