The current logstash JSON schema has a few problems:
It uses two namespacing techniques when only one is needed: "@" prefixing (like "@source") and an "@fields" object as a second namespace.
@source_host and @source_path duplicate @source.
Not all events have all '@named' fields.
The known '@named' fields are not well documented.
I always describe events as "timestamp plus data", so let's start there, and make the schema versioned just because that's smarter.
The most minimal schema will have two fields: timestamp and version. All other values are optional.
Here's my proposal for a minimal schema with only two required fields, version and timestamp:
It removes all other '@-named' fields: @source_host, @source, @source_path, @type, @tags, @message.
The previous '@fields' namespace is gone; all "event fields" are now top-level.
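To make the shape concrete, here is a minimal sketch of what such an event could look like. The key names "@timestamp" and "@version" are assumptions (the proposal only calls the fields "timestamp" and "version"), the timestamp value is purely illustrative, and the helper `is_minimal_event` is hypothetical.

```python
import json

# Assumed key names; the proposal only names the fields "timestamp" and "version".
REQUIRED_FIELDS = ("@timestamp", "@version")


def is_minimal_event(event):
    """Return True if the event carries both required fields."""
    return all(name in event for name in REQUIRED_FIELDS)


# Smallest possible event: timestamp plus version, nothing else.
minimal = {"@timestamp": "2012-12-18T01:01:46.092Z", "@version": 1}

# Event fields such as "foo" sit at the top level, not under "@fields".
with_data = dict(minimal, foo="bar")

if __name__ == "__main__":
    print(json.dumps(with_data, sort_keys=True))
    print(is_minimal_event(minimal), is_minimal_event({"foo": "bar"}))
```

Everything other than the two required keys is optional, so a consumer only ever needs to check for those two.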
The previous schema logstash used shall be known as 'version 0'; the schema proposed here is 'version 1'.
'json_event' should accept both versions. 'version 0' events must be converted to 'version 1' events.
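One way an input could tell the two versions apart is sketched below. It assumes the version is stored under an "@version" key and that any event without that key is a 'version 0' event; both are assumptions, and `schema_version` and `decode` are hypothetical names, not existing logstash functions.

```python
import json


def schema_version(event):
    """Guess the schema version of a decoded event.

    Assumption: version 1 events carry an "@version" key; anything
    without one is treated as a version 0 event.
    """
    return event.get("@version", 0)


def decode(line):
    """Decode one JSON line and report which schema version it uses."""
    event = json.loads(line)
    return schema_version(event), event


if __name__ == "__main__":
    print(decode('{"@fields": {"foo": "bar"}}')[0])
    print(decode('{"@version": 1, "foo": "bar"}')[0])
```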
Benefits:
Kibana isn't polluted with "@" symbols everywhere.
The most relevant data is in 'event fields', which are now top-level instead of "@fields.somefield".
There are fewer "required" event fields.
The 'json' input format can go away, or generally mean the same thing as 'json_event'.
Transition and Backwards Compatibility notes:
Events that previously had '@fields.foo = bar' will now have 'foo = bar'. Elasticsearch lets you search by leaf names, so "foo:bar" will find both old and new events. (victory)
Since @message is gone, we need to figure out what to do about it.
Write an Elasticsearch input plugin to allow conversion of old indexes to the new schema.
Describe the conversion of 'version 0' events to 'version 1' (@fields flattening, @source removal, etc).
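As a starting point for that description, the conversion could be sketched roughly as follows. This is not a settled design: the "@timestamp" and "@version" key names are assumed, the function name `convert_v0_to_v1` is hypothetical, and carrying @type, @tags and @message over under un-prefixed names is only a placeholder, since the fate of @message is still open.

```python
def convert_v0_to_v1(old):
    """Convert a 'version 0' event dict into a 'version 1' event dict."""
    new = {}

    # @fields flattening: lift every event field to the top level.
    new.update(old.get("@fields", {}))

    # Placeholder assumption: keep the remaining '@-named' values under
    # un-prefixed names. An event field with the same name wins.
    for name in ("@type", "@tags", "@message"):
        if name in old:
            new.setdefault(name[1:], old[name])

    # @source removal: @source, @source_host and @source_path are simply
    # not copied across.

    # The two required fields (key names are assumptions).
    if "@timestamp" in old:
        new["@timestamp"] = old["@timestamp"]
    new["@version"] = 1
    return new


if __name__ == "__main__":
    old_event = {
        "@timestamp": "2012-12-18T01:01:46.092Z",  # illustrative value
        "@source": "file://host/var/log/messages",
        "@message": "hello",
        "@fields": {"foo": "bar"},
    }
    print(convert_v0_to_v1(old_event))
```

Building the new event from scratch, rather than deleting keys from the old one, means any '@-named' field not explicitly handled is dropped by default.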