Support more than one charset per input/codec
Description
Activity
Jason Kendall April 17, 2014 at 6:29 PM
I understand the thought process. The correct fix isn't even changing the systems, but using multiple ports, one for each type of system. This has the added bonus of letting you set the type per port and parse the strings with only the grok patterns required, which avoids unnecessary churn/load on the system.
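A minimal sketch of the multi-port approach as a Logstash config, for illustration only. The port numbers and type names are assumptions made up for this example, not values from the ticket; `charset` is the existing option on the `line` codec.

```
input {
  # Hypothetical port for systems that send UTF-8 syslog
  tcp {
    port  => 5514
    type  => "syslog-utf8"
    codec => line { charset => "UTF-8" }
  }
  # Hypothetical port for systems that send ISO-8859-15 syslog
  tcp {
    port  => 5515
    type  => "syslog-latin9"
    codec => line { charset => "ISO-8859-15" }
  }
}
```

Because each input sets its own type, later filter blocks can be wrapped in conditionals on that type so that only the relevant grok patterns run for each stream.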
Details
Assignee
Logstash Developers
Reporter
Stefan Förster
Created April 7, 2014 at 4:46 AM
Updated April 17, 2014 at 6:30 PM
Sometimes a single input stream contains logs in more than one encoding. Scenarios I've seen so far include a TCP input channel for syslog that is used by systems with different locale settings, and applications that dump user input into their logfiles without properly re-encoding it.
In these cases, while the correct[tm] thing to do would be to fix the systems or applications in question, it would be useful if we could specify an array of valid encodings (such as encoding => ["UTF-8", "ISO8859-15"]) which are then tried sequentially.
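To illustrate what "tried sequentially" could mean, here is a rough Python sketch of the fallback logic. It is not Logstash code: the function name `decode_with_fallback` and the replacement-character fallback for undecodable input are assumptions made for this example.

```python
def decode_with_fallback(raw: bytes, charsets=("UTF-8", "ISO-8859-15")):
    """Try each charset in order; return (text, charset) for the first clean decode."""
    for charset in charsets:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue  # this charset rejected the bytes, try the next one
    # Assumption: if nothing matches, keep the event and mark bad bytes with U+FFFD.
    return raw.decode(charsets[0], errors="replace"), None


# The same log line as two differently configured senders would emit it.
utf8_line = "Förster logged in".encode("UTF-8")
latin9_line = "Förster logged in".encode("ISO-8859-15")

print(decode_with_fallback(utf8_line))
print(decode_with_fallback(latin9_line))
```

The order of the list matters in a scheme like this. A single-byte charset such as ISO-8859-15 accepts essentially any byte sequence, so a strict encoding like UTF-8 has to come first or it would never be reached.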
The "date" filter is another place where this would be useful: a setting like 'locale => ["C", "de"]' might be handy.