The Big UTF-8 Ticket

Description

Users continue to run into UTF-8 encoding problems.

Current solutions:

  • users can set the correct charset for their data at input time

Current problems:

  • Logstash sometimes crashes when users submit data that they declare (through their Logstash configuration) to be UTF-8 but that is not valid UTF-8.
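The failure can be reproduced in a few lines of plain Ruby. This is a minimal sketch, not Logstash code. It assumes the crash comes from string operations on mislabeled bytes (Logstash runs on JRuby, whose `String` aims to behave the same way). Labeling bytes as UTF-8 with `force_encoding` does not validate them. A later regexp match on those bytes then raises the same `invalid byte sequence in UTF-8` message that appears in the reports on this ticket. The sample bytes are made up for illustration.

```ruby
# Raw bytes as they might arrive from a Latin-1 (ISO-8859-1) log file.
# 0xE9 is "é" in Latin-1, but a lone 0xE9 byte is not valid UTF-8.
raw = "caf\xE9 error".b

# Declaring the data as UTF-8 only relabels the bytes; nothing is checked.
line = raw.dup.force_encoding(Encoding::UTF_8)

puts line.encoding        # => UTF-8
puts line.valid_encoding? # => false

# Regexp matching refuses strings whose bytes are invalid for their encoding.
error_message =
  begin
    line =~ /error/
    nil
  rescue ArgumentError => e
    e.message
  end

puts error_message # => invalid byte sequence in UTF-8
```

The configuration only changes the label on the bytes. The error surfaces later, wherever the first encoding-aware operation happens to run.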

Before discussing options, I want to be clear that any solution should have the following two properties:

  • we should avoid default configurations that permit data loss or data corruption

  • we should avoid solutions to this encoding problem that cause performance problems by default

Possible solutions:

  • Validate all input as UTF-8 and, if validation fails, try alternate charsets. This has a performance cost, and it corrupts data if we pick the wrong charset.

  • Log the invalid data but drop the event. This is also bad, because it loses data.

  • Keep crashing. This is bad because a crash interrupts processing.
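A sketch of the first option in plain Ruby. It uses a hypothetical `to_utf8` helper and an assumed fallback charset of ISO-8859-1, and is not how Logstash implements anything. It shows both costs named in that option. First, `valid_encoding?` scans every event. Second, a wrong guess silently corrupts data: here, a Windows-1252 euro sign (byte 0x80) becomes the control character U+0080, and no error is raised.

```ruby
# Hypothetical fallback charset (an assumption, not a Logstash setting).
# ISO-8859-1 maps every byte to a character, so the conversion never raises.
FALLBACK_CHARSET = Encoding::ISO_8859_1

# Hypothetical helper: trust UTF-8 when the bytes validate,
# otherwise reinterpret them as the fallback charset and transcode.
def to_utf8(bytes, fallback = FALLBACK_CHARSET)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  # valid_encoding? walks the whole string: an O(n) cost on every event.
  return candidate if candidate.valid_encoding?

  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

utf8_input   = "caf\u00E9".b # genuine UTF-8 bytes: ... C3 A9
latin1_input = "caf\xE9".b   # Latin-1 byte E9
cp1252_input = "5 \x80".b    # Windows-1252 byte 80 is the euro sign

puts to_utf8(utf8_input)   # => café
# The fallback guess happens to be right for Latin-1 input.
puts to_utf8(latin1_input) # => café

# Wrong guess: Windows-1252 input decoded as ISO-8859-1.
wrong = to_utf8(cp1252_input)
right = cp1252_input.dup.force_encoding(Encoding.find("Windows-1252")).encode(Encoding::UTF_8)
puts wrong == right     # => false
p wrong.codepoints.last # => 128  (U+0080, a C1 control character)
p right.codepoints.last # => 8364 (U+20AC, the euro sign)
```

Because ISO-8859-1 accepts every byte, the fallback never fails. That is exactly why the corruption goes unnoticed: the pipeline keeps running, but the stored text is wrong.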

Are there any other options? All three solutions I can think of drop or corrupt data.

Activity

Ryan Bellows April 24, 2014 at 10:37 PM

This is in 1.4.0.

Jordan Sissel April 24, 2014 at 10:32 PM

Most of the known UTF-8 problems were fixed in 1.4.0. I recommend filing a separate bug any time you find an issue, so we can better assess and respond to it.

Ryan Bellows April 24, 2014 at 8:35 PM

Any update on this issue? I have a simple file input that crashes until I wipe out the sincedb file.

{:timestamp=>"2014-04-24T20:33:52.729000+0000", :message=>"A plugin had an unrecoverable error. Will restart this plugin.\n Plugin: <LogStash::Inputs::File type=>\"sfo1-php-errors\", path=>[\"/mnt/codebase/log/php_errors.log\"], sincedb_path=>\"/var/tmp/sfo1-logstash-php-errors.sincedb\", start_position=>\"end\">\n Error: invalid byte sequence in UTF-8", :level=>:error}

Miral Popat February 10, 2014 at 10:33 PM

I get the error "invalid byte sequence in UTF-8" from Logstash.

Miral Popat February 10, 2014 at 10:30 PM

When is this issue likely to be fixed? I am using 1.3.3.

Details

Created October 8, 2013 at 12:04 AM
Updated March 20, 2015 at 7:11 AM