Source sequence is illegal/malformed utf-8
Activity

Miral Popat February 10, 2014 at 10:20 PM
Where should I add this? I am using the file input and Elasticsearch...

Geoff Meakin October 9, 2013 at 5:06 PM
Simply wrapping a begin/rescue/end around to_json in logstash/event.rb fixed it for me. For bonus points I tried something like this:
public
def to_json(*args)
  begin
    return @data.to_json(*args)
  rescue GeneratorError
    return JSON.parse("{ \"unrecognised_encoding\": \"#{@data.bytes.to_a.collect { |char| char.to_s(16) } }\" }").to_json(*args)
  end
end # def to_json
I didn't get the output I expected, but at least the Logstash server doesn't crash any more. Maybe this helps someone.
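For anyone trying the same workaround, here is a minimal standalone sketch of the rescue-and-fall-back idea. SketchEvent is a hypothetical stand-in, not the real LogStash::Event, and the hex-dump fallback is only an assumption about what the snippet above was aiming for. It rescues JSON::GeneratorError (the fully qualified error class of Ruby's json library) and emits the raw bytes of each field as hex instead of crashing.

```ruby
require "json"

# Hypothetical stand-in for the event class; NOT the real LogStash::Event.
class SketchEvent
  def initialize(data)
    @data = data
  end

  def to_json(*args)
    @data.to_json(*args)
  rescue JSON::GeneratorError, EncodingError
    # Serialization failed (for example on invalid UTF-8), so emit the raw
    # bytes of every field as a hex string rather than raising.
    hex = @data.map { |key, value| [key.to_s, value.to_s.unpack1("H*")] }.to_h
    { "unrecognised_encoding" => hex }.to_json(*args)
  end
end

# Sample field containing the bytes reported in this issue.
event = SketchEvent.new("message" => "H=(user-\xCF\xCA)")
puts event.to_json
```

The raw bytes survive in hex form, so nothing is lost, but downstream consumers would have to know to look for the hypothetical unrecognised_encoding key.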

Geoff Meakin October 9, 2013 at 4:36 PM
I noticed that I could crash Logstash just by telnetting to it.
It's due to Logstash not being able to recognise the encoding, as in this bug.
I think it's clearly more desirable to drop the event (or perhaps log the raw bytes as a message to prevent data loss), but whatever the case, a Logstash service should not be that brittle. I'll have to downgrade to logstash <1.2 until this is fixed.

Jordan Sissel September 9, 2013 at 2:48 PM
Ideally logstash shouldn't crash, but I don't know what the alternative is. Some options:
logstash could drop the event, but that's not ideal (data loss).
logstash could try and force conversion to UTF-8 (causing data loss).
This error occurs because logstash receives data it expects to be UTF-8 but it is not. JSON is required to be valid UTF-8, and so when logstash tries to output the events, it gets this error.
If you know the character encoding of your text, you can set the 'charset' setting in the codec of your input plugin.
Based on your log, I wonder if your log is actually encoded using ISO-8859-1 (Latin-1) and not UTF-8?
For example, if you are reading from a file:
Try this and see if it helps.
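In a Logstash config, the charset setting mentioned above is normally written on the input's codec, along the lines of codec => plain { charset => "ISO-8859-1" } inside the file input (the exact snippet for this setup is an assumption). The Ruby sketch below illustrates, outside Logstash, the two conversions discussed here: reinterpreting the bytes as Latin-1 and transcoding them, versus forcing UTF-8 and replacing whatever is invalid. The variable names are illustrative only.

```ruby
# The two suspicious bytes from the exim log line in this issue, as raw bytes.
raw = "user-\xCF\xCA".b

# Treated as UTF-8, the sequence is invalid; this is what trips JSON output.
as_utf8 = raw.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?

# Option A: declare the real charset, then transcode to UTF-8.
# Nothing is lost as long as the ISO-8859-1 guess is right.
latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts latin1.valid_encoding?

# Option B: force UTF-8 and replace invalid bytes with "?".
# The result is always valid, but the original bytes are gone.
scrubbed = as_utf8.scrub("?")
puts scrubbed.valid_encoding?
```

Option A is only lossless if the charset guess is correct; a wrong guess still yields valid UTF-8, just with the wrong characters.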

Valentino Gagliardi September 7, 2013 at 6:51 PM
I've found another case that makes Logstash crash.
It fails with "source sequence is illegal/malformed utf-8" when Logstash encounters messages like:
@data={"message"=>"Sep 7 20:08:26 server exim_rejectlog 2013-09-01 09:27:00 H=(user-\xCF\xCA) [xx.xx.xx.xx]
Perhaps the malformed sequence in this log is \xCF\xCA.
Regards.
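One way to check that guess is to ask Ruby which bytes in the line are not valid UTF-8. This is a minimal sketch using a shortened copy of the message above; the variable names are illustrative.

```ruby
# Shortened copy of the log line from this comment.
line = "H=(user-\xCF\xCA) [xx.xx.xx.xx]"

# false means the line cannot be serialized to JSON as-is.
puts line.valid_encoding?

# Collect every character that is not valid UTF-8, shown as hex.
bad_bytes = line.each_char.reject(&:valid_encoding?).map { |c| c.unpack1("H*") }
puts bad_bytes.inspect
```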

Hello,
Sometimes the Logstash agent dies with the error "source sequence is illegal/malformed utf-8".
I turned on debug mode and managed to find which characters make Logstash crash. The log is attached.
Regards.