Source sequence is illegal/malformed utf-8

Description

Hello,
sometimes the Logstash agent dies with the error "source sequence is illegal/malformed utf-8".

I turned on debug mode and managed to find which characters make Logstash crash. Here is an attachment.

Regards.

Activity

Valentino Gagliardi
September 7, 2013, 6:51 PM

I've found another case that makes Logstash crash.

Logstash fails with "source sequence is illegal/malformed utf-8" when it encounters messages like:

@data={"message"=>"Sep 7 20:08:26 server exim_rejectlog 2013-09-01 09:27:00 H=(user-\xCF\xCA) [xx.xx.xx.xx]

In this log the malformed sequence is perhaps \xCF\xCA

Regards.

Jordan Sissel
September 9, 2013, 2:48 PM

Ideally logstash shouldn't crash, but I don't know what the alternative is. Some options:

  • logstash could drop the event, but that's not ideal (data loss).

  • logstash could try and force conversion to UTF-8 (causing data loss).
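The second option can be sketched in plain Ruby. This is only an illustration under stated assumptions: `force_utf8` is a hypothetical helper, not part of logstash, and it relies on `String#scrub` (Ruby 2.1 or later). It shows why forced conversion loses data: the invalid bytes are replaced and cannot be recovered afterwards.

```ruby
# Hypothetical helper, not part of Logstash: coerce arbitrary bytes to valid UTF-8.
def force_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # String#scrub (Ruby 2.1+) swaps each invalid sequence for U+FFFD,
  # so the original bytes can no longer be recovered.
  text.scrub
end

line = "H=(user-\xCF\xCA) [xx.xx.xx.xx]"
clean = force_utf8(line)
puts clean.valid_encoding? # prints true
```

Dropping the event instead would amount to skipping any line for which `valid_encoding?` returns false.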

This error occurs because logstash receives data it expects to be UTF-8 but it is not. JSON is required to be valid UTF-8, and so when logstash tries to output the events, it gets this error.

If you know the character encoding of your text, you can set the 'charset' setting in the codec of your input plugin.

Based on your log, I wonder if your log is actually encoded using ISO-8859-1 (Latin-1) and not UTF-8?

For example, if you are reading from a file:
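A minimal sketch of such a configuration, assuming the `plain` codec and the ISO-8859-1 guess above. The path is a placeholder, not taken from the report:

```
input {
  file {
    # Placeholder path; substitute the file you are actually reading.
    path => "/path/to/your.log"
    # Tell the codec which encoding the file really uses.
    codec => plain { charset => "ISO-8859-1" }
  }
}
```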

Try this and see if it helps.
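The diagnosis above can be checked outside logstash with a few lines of Ruby. This is a sketch, assuming the bytes \xCF\xCA from the rejected log line. Note that decoding as ISO-8859-1 succeeds for any byte sequence, so it shows only that the conversion yields valid UTF-8, not that the guess about the encoding is right.

```ruby
# The bytes from the rejected log line, tagged as binary.
raw = "user-\xCF\xCA".b

# Interpreted as UTF-8 the bytes are malformed: \xCF starts a two-byte
# sequence, but \xCA is not a valid continuation byte.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # false

# Interpreted as ISO-8859-1 and transcoded, the result is valid UTF-8.
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts as_latin1.valid_encoding? # true
puts as_latin1                 # user-ÏÊ
```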

Geoff Meakin
October 9, 2013, 4:36 PM

I noticed that I could crash logstash by just telnetting to it.

As in this bug, it happens because logstash cannot recognise the encoding of the bytes it receives.

I think it's clearly more desirable to drop the event (or perhaps log the raw bytes as a message to prevent data loss), but whatever the case, a logstash service should not be that brittle. I'll have to downgrade to a logstash version below 1.2 until this is fixed.

Geoff Meakin
October 9, 2013, 5:06 PM

Simply wrapping a begin/rescue/end around to_json in logstash/event.rb fixed it for me. For bonus points I tried something like this:

public
def to_json(*args)
  begin
    return @data.to_json(*args)
  rescue GeneratorError
    return JSON.parse("{ \"unrecognised_encoding\": \"#{@data.bytes.to_a.collect { |char| char.to_s(16) } }\" }").to_json(*args)
  end
end # def to_json

I didn't get the output I expected, but at least the logstash server doesn't crash any more. Maybe this helps someone.
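A standalone sketch of the same rescue idea, for anyone who wants the fallback to carry the raw bytes. It assumes the event data is a Hash of strings, as the debug output earlier in the thread suggests. `safe_event_json` is a hypothetical name and this is not logstash's own code. Instead of calling bytes on the Hash, it hex-dumps each value, and it builds the fallback as a Hash rather than parsing a hand-written JSON string.

```ruby
require "json"

# Hypothetical helper, not Logstash's Event#to_json.
# Assumes `data` is a Hash whose keys are valid UTF-8.
def safe_event_json(data)
  data.to_json
rescue JSON::GeneratorError, EncodingError
  # Serialisation failed, so emit every value as a hex string instead.
  # unpack("H*") hex-encodes the raw bytes, so nothing is thrown away.
  hex = data.map { |key, value| [key.to_s, value.to_s.b.unpack("H*").first] }.to_h
  { "unrecognised_encoding" => hex }.to_json
end

puts safe_event_json({ "message" => "user-\xCF\xCA" })
```

The original bytes can be recovered from the hex string with `[hex].pack("H*")`, which avoids the data loss discussed earlier in the thread.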

Miral Popat
February 10, 2014, 10:20 PM

Where should I add this? I am using the file input and Elasticsearch...

Assignee

Jordan Sissel

Reporter

Valentino Gagliardi
