Failing to decode base64 string

Description

Trying this on daily build 1.4.0-dev downloaded on Jan 16.

Configuration: awesant -> redis -> logstash -> es

I'm parsing mail logs, and we also log the mail subject to make an email easier to find. Sometimes the subject is not encoded (plain Latin text), and it gets filtered to mail_subject. When the subject is encoded (it looks like =?UTF-8?B?base64_string?=), I put the base64_string in mail_subject_b64 and then try to decode it.

I haven't found any easy way to decode a base64 field. The 'cipher' filter kind of does it, but only if you're decoding/encoding something. So I ended up using a ruby filter like this:

if [mail_subject_b64] {
  ruby {
    init => "require 'base64'"
    code => "event['mail_subject'] = Base64.decode64(event['mail_subject_b64']) if event.include?('mail_subject_b64')"
  }
}

Unfortunately, it doesn't work and gives warnings:
{:timestamp=>"2014-01-16T13:05:35.953000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>127, :exception=>#<Encoding::UndefinedConversionError: ""\xE2"" from ASCII-8BIT to UTF-8>, :backtrace=>["org/jruby/RubyString.java:7571:in `encode'", "json/ext/GeneratorMethods.java:71:in `to_json'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/logstash/event.rb:168:in `to_json'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/logstash/outputs/elasticsearch.rb:322:in `flush'", "org/jruby/RubyArray.java:1613:in `each'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/logstash/outputs/elasticsearch.rb:310:in `flush'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1338:in `each'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/stud/buffer.rb:216:in `buffer_flush'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/stud/buffer.rb:193:in `buffer_flush'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/stud/buffer.rb:112:in `buffer_initialize'", "org/jruby/RubyKernel.java:1519:in `loop'", "file:/home/logstash/logstash-1.4.0.dev-flatjar.jar!/stud/buffer.rb:110:in `buffer_initialize'"], :level=>:warn}

I also tried adding .encode('UTF-8') and .force_encoding('UTF-8') to the decode64() call, but that doesn't work either (it gives other errors).

Tried using the i18n/transliterate filter, but also no luck.

Any ideas on how to fix this / how to make it work nicely? I'm out of ideas already...

Activity

Jordan Sissel
February 25, 2014, 1:01 AM

The problem here is that your ruby code is storing a non-UTF-8-encoded Ruby String in a field. This is not valid.

You will probably want to use String#force_encoding; for example:
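A minimal sketch of how that could look, assuming the Base64 payload really is UTF-8 (as the =?UTF-8?B?...?= prefix suggests). The helper name decode_utf8_base64 and the sample subject are hypothetical, and String#scrub needs Ruby 2.1 or later, so it may not be available on the JRuby bundled with older Logstash builds:

```ruby
# encoding: utf-8
require 'base64'

# Hypothetical helper: decode a Base64 payload and label the result as UTF-8.
def decode_utf8_base64(b64)
  decoded = Base64.decode64(b64)     # comes back tagged ASCII-8BIT
  decoded.force_encoding('UTF-8')    # relabels in place; the bytes are unchanged
  # If the payload was not really UTF-8, replace the invalid bytes with '?'
  decoded.valid_encoding? ? decoded : decoded.scrub('?')
end

subject = decode_utf8_base64(Base64.strict_encode64('Grüße'))
puts subject.encoding   # UTF-8
puts subject            # Grüße
```

Inside the ruby filter, the same force_encoding call would be chained onto the Base64.decode64(...) result before it is assigned to the field.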

Jordan Sissel
February 25, 2014, 1:03 AM

I don't think this is a bug, so I'm closing this. Here are some details which should be useful, I hope!

Using the ruby filter you have to be careful about how you use created strings.

Base64.decode64 always returns strings with encoding ASCII-8BIT, even if the original string's encoding was something else. Encoding is a property of the string object, not of the content of the string itself! It's tricky.

Here's another example showing what I mean. Let's encode and decode the Unicode frowny face ☹. I will run this through the irb shell; >> means I typed text, => means result.
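As a rough sketch of that round trip (written as a plain script rather than an irb session, with variable names chosen for illustration), the decoded string holds the same bytes as the original but carries a different encoding tag:

```ruby
# encoding: utf-8
require 'base64'

frowny = "☹"                         # a UTF-8 string literal
encoded = Base64.encode64(frowny)    # "4pi5\n"
round_trip = Base64.decode64(encoded)

# Same bytes, but the decoded string is tagged as binary (ASCII-8BIT)
puts round_trip.bytes == frowny.bytes               # true
puts round_trip.encoding == Encoding::ASCII_8BIT    # true
puts round_trip == frowny                           # false: the encodings differ

# force_encoding only changes the tag, so the comparison now succeeds
round_trip.force_encoding('UTF-8')
puts round_trip == frowny                           # true
```

The first byte of ☹ in UTF-8 is \xE2, which is the byte named in the UndefinedConversionError from the report.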

Hope this helps!

Assignee

Jordan Sissel

Reporter

Aleksandr Stankevic
