fingerprint PUNCTUATION method leaves more than just punctuation

Description

The PUNCTUATION method of the fingerprint filter isn't described in the fingerprint docs, but its behaviour matches the old punct filter (which is understandable).

The punct filter is described as "Strip everything but punctuation from a field and store the remainder in the a separate field", but that's not true, because the logic actually only strips out US-ASCII letters and digits, plus space and tab.

So it doesn't do what the description says for non-ASCII input: characters such as accented letters are left in the output alongside the punctuation.
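A minimal Ruby sketch of the behaviour described above, assuming the PUNCTUATION logic is equivalent to a String#tr call that deletes A-Z, a-z, 0-9, space and tab (the exact code should be checked against the plugin source). The method name ascii_only_punctuation is hypothetical.

```ruby
# Hypothetical reproduction of the current PUNCTUATION behaviour.
# String#tr with an empty replacement deletes the listed characters:
# ASCII letters, ASCII digits, space and tab. Everything else survives,
# including non-ASCII letters such as "é" or "ï".
def ascii_only_punctuation(value)
  value.tr("A-Za-z0-9 \t", "")
end

puts ascii_only_punctuation("Hello, world!")  # prints ",!"
puts ascii_only_punctuation("Café, naïve!")   # prints "é,ï!"
```

The second call shows the problem: the accented letters are not stripped, so the result is not "just punctuation".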

I'm not familiar with Ruby, but the fix might be to use the [^[:punct:]] regex, assuming that behaves properly for Unicode strings and uses the Unicode punctuation property.
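A minimal Ruby sketch of the suggested regex, assuming the field value is a UTF-8 string. For such strings Ruby's regex engine treats the POSIX [:punct:] class as Unicode-aware, so accented letters are removed too. The method name unicode_punctuation is hypothetical.

```ruby
# Hypothetical Unicode-aware variant: delete every character that is
# NOT in the [:punct:] class, keeping only punctuation.
def unicode_punctuation(value)
  value.gsub(/[^[:punct:]]/, "")
end

puts unicode_punctuation("Hello, world!")  # prints ",!"
puts unicode_punctuation("Café, naïve!")   # prints ",!"
puts unicode_punctuation("«Ça va?»")       # prints "«?»"
```

One caveat worth testing: ASCII characters such as $, +, < and | are Unicode symbols rather than punctuation, and whether [:punct:] matches them has varied between Ruby versions, whereas the current tr-based logic always keeps them.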

You'd need to give this a new method name, perhaps "PUNCT", so that existing configurations relying on the current PUNCTUATION output aren't affected.
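A hypothetical sketch of how a new PUNCT method could sit alongside the existing one: PUNCTUATION keeps its current ASCII-only output, and PUNCT uses the [^[:punct:]] regex. The PUNCTUATION_METHODS table and the apply_method helper are illustrative only, not the plugin's actual structure.

```ruby
# Hypothetical method table: the old name keeps the old logic,
# the new name gets the Unicode-aware logic.
PUNCTUATION_METHODS = {
  # Existing behaviour: delete only ASCII letters, digits, space and tab.
  "PUNCTUATION" => ->(value) { value.tr("A-Za-z0-9 \t", "") },
  # Proposed new method: keep only characters in the [:punct:] class.
  "PUNCT"       => ->(value) { value.gsub(/[^[:punct:]]/, "") },
}.freeze

# Look up the method by name; Hash#fetch raises KeyError for unknown names.
def apply_method(name, value)
  PUNCTUATION_METHODS.fetch(name).call(value)
end

puts apply_method("PUNCTUATION", "Café!")  # prints "é!"
puts apply_method("PUNCT", "Café!")        # prints "!"
```

Leaving PUNCTUATION pointed at the old logic means values already computed with it stay stable; only configurations that opt in to PUNCT see the new output.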

It would also be nice to give example output in the docs for each of the methods.

Environment

None

Status

Assignee

Logstash Developers

Reporter

Tim Bunce

Labels

None

Affects versions

Priority