The current logstash JSON schema has a few problems:
- It uses two namespacing techniques when only one is needed ("@" prefixing, like "@source", and an "@fields" object as a second namespace)
- @source_host and @source_path duplicate @source.
- Not all events have all '@named' fields.
- The known '@named' fields are not well documented
I always describe events as "timestamp plus data" so let's start there, and make it versioned just because that's smarter.
The most minimal schema will have two fields: timestamp and version. All other values are optional.
Here's my proposal for a minimal schema with only two required fields, version and timestamp:
- Removes all other '@-named' fields: @source_host, @source, @source_path, @type, @tags, @message.
- The previous '@fields' namespace is gone; all "event fields" are now top-level.
- The previous schema logstash used shall be known as 'version 0'
- 'json_event' should accept both versions. 'version 0' events must be converted to 'version 1' events
Benefits:
- kibana isn't polluted with "@" symbols everywhere
- most relevant data is in 'event fields', which are now top-level, no longer "@fields.somefield"
- fewer "required" event fields.
- the 'json' input format can go away or generally mean the same thing as json_event
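To make the proposal concrete, here is a minimal sketch in Python of what a 'version 1' event might look like and how an input could tell the two versions apart. The key names "@timestamp" and "@version", the timestamp format, and the helper names event_version and is_minimal_v1 are assumptions for illustration only; the proposal itself just says that version and timestamp are the two required fields.

```python
import json

# Assumed key names; the proposal only names "version" and "timestamp".
REQUIRED_V1_KEYS = ("@timestamp", "@version")


def event_version(event):
    """Guess the schema version of a decoded event (hypothetical helper)."""
    # An event with no version field is treated as the old schema, 'version 0'.
    return event.get("@version", 0)


def is_minimal_v1(event):
    """True if the event has both required fields and declares version 1."""
    return all(key in event for key in REQUIRED_V1_KEYS) and event["@version"] == 1


# The smallest valid 'version 1' event: timestamp plus version, nothing else.
minimal = json.loads('{"@timestamp": "2012-12-18T01:01:46.092Z", "@version": 1}')

# A 'version 1' event with optional event fields at the top level
# (no "@fields" wrapper).
with_fields = {
    "@timestamp": "2012-12-18T01:01:46.092Z",
    "@version": 1,
    "foo": "bar",
}
```

An input that accepts both versions could call event_version first and only run a conversion step when it returns 0.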
Transition and Backwards Compatibility notes:
- A previous event with '@fields.foo = bar' will now have 'foo = bar'. Elasticsearch lets you search by leaf names, so "foo:bar" will find both events. (victory)
- Since @message is gone, we need to figure out what to do about it.
- Write an elasticsearch input plugin to allow conversion of old indexes to new schema.
- Describe conversion of 'version 0' events to 'version 1' (@fields flattening, @source removal, etc)
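As a starting point for that description, the conversion could be sketched roughly as below. This is only one possible reading of the notes above, not a settled design: the function name convert_v0_to_v1 is hypothetical, the "@timestamp" and "@version" key names are assumed, and keeping the remaining '@named' values under their bare names (for example '@message' as 'message') is an assumption, since what to do about @message is still open.

```python
def convert_v0_to_v1(old):
    """Convert a 'version 0' event dict to a 'version 1' dict (sketch)."""
    new = {"@version": 1}
    if "@timestamp" in old:
        new["@timestamp"] = old["@timestamp"]

    # @fields flattening: every former '@fields.foo' becomes top-level 'foo'.
    for name, value in old.get("@fields", {}).items():
        new.setdefault(name, value)

    # Assumption: '@source' is dropped outright, and any other '@named'
    # value is kept under its bare name unless an event field already
    # uses that name.
    skipped = ("@timestamp", "@version", "@fields", "@source")
    for name, value in old.items():
        if name in skipped:
            continue
        bare = name[1:] if name.startswith("@") else name
        new.setdefault(bare, value)
    return new


old_event = {
    "@timestamp": "2012-12-18T01:01:46.092Z",
    "@source": "file://somehost/var/log/messages",
    "@source_host": "somehost",
    "@message": "hello world",
    "@fields": {"foo": "bar"},
}
new_event = convert_v0_to_v1(old_event)
```

Event fields win over renamed '@named' values on a name clash here; that choice is arbitrary and would need to be decided as part of the real conversion rules.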