  1. LOGSTASH-2251

fingerprint punctuation doesn't just leave punctuation

    Details

    • Type: Bug/Feature
    • Status: Resolved
    • Resolution: Duplicate
    • Affects versions: 1.4.0
    • Fix versions: None
    • Labels:
      None

      Description

      The PUNCTUATION method of the fingerprint filter isn't described in the fingerprint docs, but its behaviour matches the old punct filter (which is understandable).

      That filter is described as "Strip everything but punctuation from a field and store the remainder in the a separate field", but that's not accurate: the logic actually strips out only US-ASCII letters and digits, plus space and tab.

      So it doesn't do what it describes for non-ASCII inputs such as letters with accents.

      I'm not familiar with Ruby, but the fix might be to use the [^[:punct:]] regex, assuming that behaves properly for Unicode strings and uses the Unicode punctuation property.
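
      To make the difference concrete, here is a minimal Ruby sketch. It is not the filter's actual source: strip_ascii_alnum is a hypothetical helper that reproduces the behaviour described in this report (dropping US-ASCII letters, digits, space and tab), and keep_punct is a hypothetical helper that applies the suggested [^[:punct:]] regex. It assumes the input is a UTF-8 string.

```ruby
# Assumed current behaviour, as described in this report (not copied from
# the filter): delete US-ASCII letters and digits, plus space and tab.
def strip_ascii_alnum(value)
  value.tr("A-Za-z0-9 \t", "")
end

# Hypothetical alternative: delete every character that is not matched by
# the POSIX bracket expression [[:punct:]].
def keep_punct(value)
  value.gsub(/[^[:punct:]]/, "")
end

sample = "café, naïve!"
puts strip_ascii_alnum(sample)  # accented letters survive: é,ï!
puts keep_punct(sample)         # only punctuation remains: ,!
```

      Exactly which characters [[:punct:]] covers (for example symbols such as $ or +) may depend on the Ruby version, so that would need checking before relying on it.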

      You'd need to give this a new method name, perhaps "PUNCT".

      Would also be nice to give an example output in the docs for each of the methods.

              People

              • Assignee:
                logstash-dev Logstash Developers (Inactive)
                Reporter:
                tim.bunce Tim Bunce
              • Votes:
                0
                Watchers:
                2
