fingerprint punctuation doesn't just leave punctuation

Description

The PUNCTUATION method of the fingerprint filter isn't described in the fingerprint docs, but the behaviour matches the old punct filter (which is understandable).

That's described as "Strip everything but punctuation from a field and store the remainder in the a separate field" but that's not true, because the logic used actually only strips out US-ASCII letters and digits, plus space and tab.

So it doesn't do what it describes for non-ASCII inputs such as letters with accents.
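A minimal Ruby sketch of that behaviour, assuming the filter does something equivalent to `String#tr` over ASCII letters, digits, space and tab. The helper name `ascii_strip` and the sample strings are hypothetical and not taken from the plugin source:

```ruby
# Hypothetical stand-in for the current logic: delete ASCII letters,
# digits, space and tab, and keep every other character.
def ascii_strip(value)
  value.tr("A-Za-z0-9 \t", "")
end

puts ascii_strip("Cafe: deja vu!")   # prints ":!"
puts ascii_strip("Café: déjà vu!")   # prints "é:éà!" (accented letters survive)
```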

I'm not familiar with Ruby but the fix might be to use the [^[:punct:]] regex, assuming that behaves properly for Unicode strings and uses the Unicode punctuation property.

You'd need to give this a new method name, perhaps "PUNCT".
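A rough sketch of what the regex-based approach could look like in Ruby, assuming the field value is a UTF-8 string and that `[[:punct:]]` is matched by Unicode property for such strings. The helper name `unicode_punct_only` is hypothetical, and this has not been checked against the plugin or against JRuby specifically:

```ruby
# Hypothetical helper: delete every character that is not punctuation
# according to the POSIX bracket class [[:punct:]].
def unicode_punct_only(value)
  value.gsub(/[^[:punct:]]/, "")
end

puts unicode_punct_only("Café: déjà vu!")   # prints ":!"
```

Unlike an ASCII-only strip, the accented letters are removed here because they are not punctuation.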

Would also be nice to give an example output in the docs for each of the methods.

Activity

Tim Bunce
June 26, 2014, 8:12 PM

You could also document that

make an effective punct filter.

Tim Bunce
October 21, 2014, 3:07 PM

FYI, I'm currently using

Suyog Rao
February 6, 2015, 10:50 PM

Assignee

Logstash Developers

Reporter

Tim Bunce

Labels

None
