Duplicate Events at Midnight


Hi everyone,
I've been working through setting up Logstash, Elasticsearch, and Kibana for the past month.
I'm now at the point of experimenting with Logstash and logstash-forwarder, applying filters and sending the results to Elasticsearch. Looking through Kibana, it appears that duplicate logs are being sent to ES right at midnight.
I have done some googling and troubleshooting but still can't pinpoint the problem.

I am running Elasticsearch 1.1.0, Logstash 1.4.0, and Kibana 3.0.0.

Here is the situation: we have a central syslog server that stores a bunch of logs. The logs are rotated daily at midnight with the naming convention logname_YYYY-MM-DD_messages.log. A cleanup script zips anything older than 7 days and deletes anything older than 30 days.
Every night at midnight there is a spike in logs, and the data shows up duplicated when searching in Kibana. I verified this by searching in Kibana and ES for a string that combines a date-time stamp with other unique characters from the message portion of the log, for example: BARW-INET-FW01 AND "Apr 30 08:41:19" AND "75416". I typically get two hits: one at the original time the message was generated, and the other right around midnight.
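
The manual check above can be approximated in bulk. Here is a minimal Python sketch, assuming the events have been exported as (indexed time, raw message) pairs; that shape, the `find_duplicates` name, and the sample lines are hypothetical and not the actual Elasticsearch document layout:

```python
from collections import defaultdict

def find_duplicates(events):
    """Group events by raw message and return those seen more than once.

    `events` is an iterable of (indexed_at, message) pairs. This shape is
    an assumption for illustration only.
    """
    seen = defaultdict(list)
    for indexed_at, message in events:
        seen[message].append(indexed_at)
    # Keep only messages that were indexed at more than one time.
    return {msg: times for msg, times in seen.items() if len(times) > 1}

# Hypothetical sample: the same firewall line indexed twice,
# once at its original time and once just after midnight.
sample = [
    ("2014-04-30T08:41:19", "Apr 30 08:41:19 BARW-INET-FW01 id=75416"),
    ("2014-05-01T00:00:03", "Apr 30 08:41:19 BARW-INET-FW01 id=75416"),
    ("2014-04-30T09:00:00", "Apr 30 09:00:00 BARW-INET-FW01 id=75417"),
]
duplicates = find_duplicates(sample)
for message, times in duplicates.items():
    print(message, times)
```

If the second indexed time clusters around 00:00 for every duplicated message, that would support the theory that the re-send is tied to the midnight rotation.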

Attached are my rules for logstash-forwarder on the central log server and the configuration on the Elasticsearch server. They are very basic at this point, and I'm sure the logstash-forwarder configuration could be simpler.

Initially I thought I could add the actual date as a variable in the filename in the logstash-forwarder config file (https://logstash.jira.com/browse/LOGSTASH-608):
I changed "paths": [ "/var/log/syslog/OREM-BARW-INET-FW01*.log" ] to "paths": [ "/var/log/syslog/{YEAR}%{MONTHNUM}%{MONTHDAY}*.log" ], which did not work after restarting the logstash-forwarder service. Is there a way to use variable-style pattern matching to specify which files to watch?
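
One possible workaround, if the paths are only treated as plain globs, would be to generate the "paths" entry outside the forwarder with the date already filled in. Here is a minimal Python sketch of that idea; the `files_entry_for` helper, its defaults, and the assumption that the date sits somewhere after the log name (per the logname_YYYY-MM-DD_messages.log convention) are all illustrative, and the config would still need to be regenerated and the service restarted each day:

```python
import datetime
import json

def files_entry_for(day, log_dir="/var/log/syslog", prefix="OREM-BARW-INET-FW01"):
    """Build a "paths" entry that matches only the given day's log files.

    The directory and prefix defaults are taken from the path quoted above;
    the helper itself is hypothetical.
    """
    date_part = day.strftime("%Y-%m-%d")
    # Wildcards on both sides of the date keep the rest of the name flexible.
    pattern = f"{log_dir}/{prefix}*{date_part}*.log"
    return {"paths": [pattern]}

entry = files_entry_for(datetime.date(2014, 4, 30))
print(json.dumps(entry, indent=2))
```

In practice `datetime.date.today()` would replace the fixed date, which is used here only to keep the example reproducible.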

I have a feeling it has to do with the log rotation at midnight, where Logstash somehow sends the data again. I read a post about a similar issue (https://groups.google.com/forum/#!topic/logstash-users/4xaN6LlYyWA) where Jordan responded by suggesting setting '--window-size 1' on lumberjack. I could not find any documentation on where this is set. Can anyone lend a hand with this? Are there any strategies or best practices to fix this issue? Essentially, the logs are duplicated at midnight and take up twice the space they should.
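
One general strategy for the storage side of the problem, whatever causes the re-send, is to derive each event's ID from its content so that a second copy overwrites the first instead of creating a new document. Here is a minimal Python sketch of the idea, using an in-memory dict as a stand-in for the index; `event_id` and the sample lines are hypothetical, and wiring such an ID into the actual Logstash output is not shown:

```python
import hashlib

def event_id(message):
    """Return a deterministic ID for a raw log line (SHA-1 of its bytes)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()

# Stand-in for an index keyed by document ID (an assumption for illustration).
index = {}

incoming = [
    "Apr 30 08:41:19 BARW-INET-FW01 id=75416",
    "Apr 30 09:00:00 BARW-INET-FW01 id=75417",
    "Apr 30 08:41:19 BARW-INET-FW01 id=75416",  # re-sent around midnight
]
for line in incoming:
    # Writing under the same ID replaces the earlier copy.
    index[event_id(line)] = line

print(len(index))
```

This only helps when both copies land in the same index, and it would also collapse two legitimately identical lines, so the hashed content should include enough of the line (host, timestamp, message) to be unique.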

I'm not sure whether adding a date variable to the input section of the logstash-forwarder config would help, but I would really like to find a way to resolve this. Is it a matter of using logstash-forwarder versus just an agent?

Thank you for the help.





Logstash Developers


Dave Marcinowski

Affects versions