Proposal: new logstash event schema

Description

The current logstash json schema has a few problems:

  • It uses two namespacing techniques when only one is needed: "@" prefixing (like "@source") and a separate "@fields" object.

  • @source_host and @source_path duplicate @source.

  • Not all events have all '@named' fields.

  • The known '@named' fields are not well documented.

I always describe events as "timestamp plus data" so let's start there, and make it versioned just because that's smarter.

The most minimal schema has two required fields: timestamp and version. All other values are optional.

Here's my proposal, built on that minimal schema:

  • Removes all other '@-named' fields: @source_host, @source, @source_path, @type, @tags, @message.

  • The previous '@fields' namespace is gone; all "event fields" are now top-level.
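
To make the shape concrete, here is a rough Python sketch of a minimal event and one with extra fields. The key names "@timestamp" and "@version" are an assumption, inferred from the "all other '@-named' fields" wording above. The "message" and "foo" fields, the timestamp value, and the helper is_valid_v1 are hypothetical illustrations, not part of the proposal.

```python
# Assumption: the two required fields keep the "@" prefix.
REQUIRED = ("@timestamp", "@version")


def is_valid_v1(event):
    """Return True if the event is a dict carrying both required fields."""
    return isinstance(event, dict) and all(key in event for key in REQUIRED)


# The most minimal event: a timestamp and a version, nothing else.
minimal = {"@timestamp": "2012-12-18T01:01:46.092538Z", "@version": 1}

# Event fields sit at the top level, not under "@fields".
with_fields = {
    "@timestamp": "2012-12-18T01:01:46.092538Z",
    "@version": 1,
    "message": "hello world",  # hypothetical field name
    "foo": "bar",              # hypothetical field name
}
```

Under this sketch, an event without a timestamp or version is rejected, while any number of additional top-level fields is accepted.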

Versioning:

  • The previous logstash schema shall be known as 'version 0'; the schema proposed here is 'version 1'.

  • 'json_event' should accept both versions. 'version 0' events must be converted to 'version 1' events.
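
One way 'json_event' could tell the two versions apart, sketched in Python. This assumes the version field is named "@version" and that 'version 0' events never carry it. schema_version and parse_json_event are hypothetical names, not logstash APIs.

```python
import json


def schema_version(event):
    """Return the schema version of a decoded event.

    Assumption: 'version 0' events predate the version field,
    so a missing "@version" key is read as version 0.
    """
    return event.get("@version", 0)


def parse_json_event(line):
    """Decode one JSON-encoded event and report which schema it uses."""
    event = json.loads(line)
    return schema_version(event), event
```

An input that sees version 0 here would then hand the event to whatever conversion step is eventually defined.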

Benefits:

  • kibana isn't polluted with "@" symbols everywhere

  • most relevant data is in 'event fields', which are now top-level, no longer "@fields.somefield"

  • fewer "required" event fields.

  • the 'json' input format can go away or generally mean the same thing as json_event

Transition and Backwards Compatibility notes:

  • A previous event with '@fields.foo = bar' will now have 'foo = bar'. Elasticsearch lets you search by leaf names, so "foo:bar" will find both old and new events. (victory)

  • Since @message is gone, we still need to figure out what to do about it.

  • Write an elasticsearch input plugin to allow conversion of old indexes to new schema.

TODO:

  • describe conversion of 'version 0' events to 'version 1' (@fields flattening, @source removal, etc)
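
Until that description is written, here is one possible shape for the conversion, as a Python sketch. Only '@fields' flattening and '@source' removal come from this proposal. The "@timestamp" and "@version" key names, keeping the other '@-named' fields as unprefixed top-level fields (including "message"), and letting flattened '@fields' entries win on a name clash are all assumptions. convert_v0_to_v1 is a hypothetical name.

```python
# Assumption: the required fields keep their "@" prefix.
RESERVED = ("@timestamp", "@version")

# Assumption: these version 0 fields survive as plain top-level fields.
# "@source" is deliberately absent, since it is removed.
RENAMED = ("source_host", "source_path", "type", "tags", "message")


def convert_v0_to_v1(old):
    """Return a new 'version 1' event built from a 'version 0' event."""
    if "@timestamp" not in old:
        raise ValueError("version 0 event has no @timestamp")

    new = {"@timestamp": old["@timestamp"], "@version": 1}

    # Flatten '@fields': '@fields.foo' becomes top-level 'foo'.
    for key, value in old.get("@fields", {}).items():
        if key not in RESERVED:
            new[key] = value

    # Carry the remaining '@-named' fields over without the prefix,
    # unless a flattened field already claimed that name.
    for name in RENAMED:
        if "@" + name in old and name not in new:
            new[name] = old["@" + name]

    return new
```

The input event is left untouched, so a caller can keep the original around for comparison while the conversion rules are still being settled.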

Assignee

Jordan Sissel

Reporter

Jordan Sissel
