Logstash doesn't change year var after 20131231.
Description
Hi,
I'm running logstash 1.2.1 to fill Elasticsearch indexes whose names rotate daily.
The problem is that the index that should be named 2014.01.01 is created as 2013.01.01, and so on. There is no problem with the months and days.
After restarting logstash, the index was created with the correct name.
Here is the config:
    input {
      file {
        path => ["/var/log/dns/*.log"]
        sincedb_path => "/usr/local/logstash/sincedb_logs_dns"
        start_position => "beginning"
        type => "dns_log"
      }
    }
    [...]
    output {
      elasticsearch {
        node_name => "Syslog DNS"
        index => "dns-%{+YYYY.MM.dd}"
        cluster => "syslog"
        bind_host => "elastic.server.com"
      }
    }
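For context, the elided [...] presumably contains a date filter along these lines (a hypothetical sketch; the reporter's actual filter block isn't shown). Syslog-style timestamps carry no year, so the Joda parser has to fill one in, and the assumed field name "timestamp" is an illustration only:
    filter {
      date {
        # "MMM dd HH:mm:ss" has no year token, so the parser falls back
        # to a default year that is fixed when the filter is initialized
        match => [ "timestamp", "MMM dd HH:mm:ss" ]
      }
    }
The index name dns-%{+YYYY.MM.dd} is rendered from the event's @timestamp, which is why a stale default year shows up directly in the index name.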
Activity
Aaron Mildenstein February 6, 2015 at 11:54 PM
Migrating this to https://github.com/logstash-plugins/logstash-filter-date/issues/3
Former user January 9, 2014 at 12:07 PM
I've had the same problem with Logstash 1.2.2 processing MongoDB logs, which don't have the year included in the timestamp. For example:
Thu Jan 9 06:27:53.624 [conn493116] command SCHEDULE.$cmd command: { getlasterror: 1, j: true } ntoreturn:1 keyUpdates:0 reslen:83 106ms
Starting Jan 1st, these logs started writing to Elasticsearch with 2013 in the date rather than the current year. As with the other examples above, restarting Logstash has fixed the problem for now, since a freshly started Logstash no longer has a stale view of the current year.
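For reference, a date filter matching that MongoDB timestamp might look like the following sketch (the commenter's actual config isn't shown, and the field name "timestamp" is assumed). The pattern covers day-of-week, month, day, and time, but has no year token:
    filter {
      date {
        # matches e.g. "Thu Jan 9 06:27:53.624"; with no year token the
        # parser's default year is used. Depending on how the day is
        # padded in the log, the "d" token may need adjusting.
        match => [ "timestamp", "EEE MMM d HH:mm:ss.SSS" ]
      }
    }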
Jon Tai January 6, 2014 at 8:27 AM
Agreed that handling clock skew correctly could be tricky. But I think as a first step, logstash needs to have a dynamic concept of "the current year" – without that, "the current year" will always mean the year in which the date filter was initialized. I put together a patch for this first step here: https://github.com/jtai/logstash/compare/logstash-1744
With this change, the date filter ensures that the Joda-Time parser is configured with the current year before parsing the date. This check happens each time a date is parsed. It improves the current behavior by shrinking the window of incorrect guesses to a few seconds or minutes (the exact duration depends on clock skew, network latency, queuing, etc.).
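The idea behind the patch, sketched roughly in Ruby below, is to rebuild the Joda parser whenever the wall-clock year no longer matches the year the parser was built with (this is an illustration of the approach, not the linked branch itself):
    # Illustration only, not the actual patch: refresh the parser's default
    # year on rollover, so year-less timestamps pick up the new year.
    def parse_date(pattern, value)
      now_year = Time.now.year
      if @parser.nil? || @parser_year != now_year
        @parser = org.joda.time.format.DateTimeFormat.forPattern(pattern)
                    .withDefaultYear(now_year)
        @parser_year = now_year
      end
      @parser.parseDateTime(value)
    end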
If the approach is agreeable I can make the branch a pull request.
Regarding the handling of clock skew – perhaps we could look specifically for the December to January boundary? So instead of just detecting "future" dates, only apply correction if the current month is December and the log event's month is January (or vice versa, to account for clock skew in the other direction). This doesn't help in the case where "clock skew" is greater than a month though, e.g., when processing historical logs months after the fact.
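That boundary check could be as small as this sketch (a hypothetical helper, with months numbered 1-12):
    # Hypothetical helper for the December/January boundary heuristic:
    # only adjust the guessed year when the event month and the current
    # month sit on opposite sides of a year boundary.
    def adjust_year(year, event_month, current_month)
      if current_month == 1 && event_month == 12
        year - 1   # event written just before rollover, clock already past it
      elsif current_month == 12 && event_month == 1
        year + 1   # agent clock slightly behind a source already in January
      else
        year
      end
    end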
Alternatively, we could parse the date with the previous, current, and next year, and take the resulting timestamp that is closest to the current timestamp, but that has obvious performance implications.
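The "closest candidate" alternative, sketched under the same assumptions (a Joda DateTimeFormatter called from JRuby); it parses each value three times, which is where the performance concern comes from:
    # Sketch of the three-candidate approach: try last year, this year,
    # and next year, keep the parse that lands closest to the wall clock.
    def closest_candidate(parser, value)
      now = org.joda.time.DateTime.new
      [now.year - 1, now.year, now.year + 1].map { |y|
        parser.withDefaultYear(y).parseDateTime(value)
      }.min_by { |t| (t.millis - now.millis).abs }
    end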
Jordan Sissel January 3, 2014 at 6:24 PM
Functionally, your times are missing important data so logstash has to guess, and in this case it guesses "the current year" and is wrong.
At a glance, I'm not sure how to best solve this. Open to solutions.
My first thought was 'future dates mean last year', but if you take 'future dates' to mean "last year" and your logstash agent's clock is behind your log timestamps, even by milliseconds, then you'll get 'current' logs recorded as 'last year', which is worse behavior than we have today.
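To make that failure mode concrete, here is the naive heuristic as a sketch (plain Ruby Time objects, for illustration): with the agent clock even a few milliseconds behind the source, a current event parses as "future" and gets pushed back a whole year.
    # Naive "future means last year" rule and the problem described above:
    # an event stamped 00:00:00.010 read by an agent whose clock still
    # says 23:59:59.995 looks like the future and loses a year.
    def naive_year(parsed_time, now)
      parsed_time > now ? parsed_time.year - 1 : parsed_time.year
    end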
Émile Plourde-Lavoie January 3, 2014 at 3:03 PM
I have the same issue and I can confirm the bug is present in 1.3.2.
Jon Tai is correct in his conclusion: the most likely cause of this bug is the statement in filters/date.rb at line 172.
Restarting logstash will probably solve the problem for a full year.