Restart of Logstash and file input behavior: found bug and possible resolution

Description

Using the file input:
On restart, Logstash uses sincedb to know where it left off in the file.
If I stop the shipper and add a log record to the file being watched, then when Logstash is restarted I get the added log message, which is great. But if I shut down the shipper and restart again, I get the added log message a second time. I can do this over and over and will get the added log message on every restart. This continues until a brand new log message comes into the watched file. The cause seems to be that sincedb is only updated with the file position when a brand new log message is received.

IMO, the fix: Logstash should start where sincedb tells it to, but if it then sees additional messages, it is clear they were not processed before Logstash went down, so sincedb should be updated for each additional record processed. Without digging through the code, it looks like these replayed messages do not go through the same code path that updates the sincedb cache after processing a new message; in the recovery case, Logstash processes the records with no sincedb update.
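To make the failure mode concrete, here is a toy Ruby model of the loop described above. It is not the filewatch implementation; the paths and the flat integer sincedb file are purely illustrative.

    # Toy model of the replay behavior described above -- NOT the filewatch
    # code. The sincedb here is just a single integer offset on disk.
    log_path     = "/var/log/app.log"     # illustrative path
    sincedb_path = "/tmp/toy_sincedb"     # illustrative path

    offset = File.exist?(sincedb_path) ? File.read(sincedb_path).to_i : 0

    File.open(log_path) do |f|
      offset = [offset, f.size].min       # if the file shrank, clamp to its size
      f.seek(offset)
      while (line = f.gets)
        puts "event: #{line.chomp}"       # "process" the record
        offset = f.pos
        # The bug as described: the real code only persists the new offset when
        # a brand new event arrives later, so records replayed here never update
        # sincedb and the next restart replays them again. Persisting after each
        # recovered record would fix that, e.g.:
        # File.write(sincedb_path, offset.to_s)
      end
    end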

A different situation: Logstash is down, the file had 10 messages before the shutdown, and someone truncates the file and adds 3 new records. Logstash appears to notice that the sincedb byte offset is higher than the file's byte count (since we went from 10 messages to 3) and replays all 3 messages, but yet again it does not update the sincedb cache. So if I restart, it replays all 3 messages again, and this can also be repeated over and over just by doing another restart. So, again, Logstash needs to make sure sincedb is maintained during recovery.

I hope the description helps.

Activity

Bruce Danziger
February 17, 2014, 4:13 AM

Further investigation shows this seems to be an issue with filewatch. Testing filewatch standalone, though, I cannot reproduce the problem, so it might have been fixed. It looks like the logstash/filewatch code is correct, but I still see the problem happening in Logstash. I don't see any recent fixes for this in filewatch, so I am not sure why it works standalone but not in Logstash.

Bruce Danziger
February 20, 2014, 3:32 PM

I am no longer certain that this is a bug. After looking at the code, I now see that it is designed to update sincedb only once per @sincedb_write_interval, and that is what is occurring.

On restart, old messages are replayed because sincedb is behind. But across N files, only the first one causes a sincedb write, so the other N-1 files replay without updating sincedb. Sincedb will stay that way until a new event comes in after the sincedb_write_interval has elapsed.
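Roughly, the gating I am describing looks like this (a paraphrase of the logic in _read_file, not the exact filewatch source):

    # Paraphrase of the interval gating in _read_file (not the exact source).
    # Runs after a chunk has been read and its events emitted.
    if changed
      now = Time.now.to_i
      if now - @sincedb_last_write >= @sincedb_write_interval
        _sincedb_write              # first change after the interval is persisted
        @sincedb_last_write = now
      end
      # Changes inside the interval are emitted but never persisted, which is
      # why replayed events can leave sincedb stale until a genuinely new event
      # shows up after the interval has elapsed.
    end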

What I think should happen, though, is an enhancement.
Example: if 3 log events happen within sincedb_write_interval, the first event triggers a sincedb write. The next two events cause a change but do not cause a write. There should be a trigger after sincedb_write_interval amount of time that says: if anything changed, write sincedb.

This will keep sincedb up to date within sincedb_write_interval amount of time.

I see others have suggested writing after EOF, but I am not sure I agree with that, because it could cause you to write many, many times even though your configured interval is higher.

Bruce Danziger
February 20, 2014, 4:27 PM

Here is how I would fix this. But I am not certain that filewatch is considered part of Logstash; I know they are separate projects, but they seem to be tightly coupled.

I don't feel this is a hack.

In filewatch/tail.rb, _read_file: the code checks for a change (something was read), then checks sincedb_write_interval. If we are still inside sincedb_write_interval, it should set an instance flag, sincedb_write_pending = true, because technically sincedb should be updated to stay in sync, and the only reason it isn't is that we are inside the write interval.
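A sketch of the shape of that change (the surrounding structure is approximated, not copied from filewatch):

    # Sketch of the _read_file change (surrounding structure approximated).
    if changed
      now = Time.now.to_i
      if now - @sincedb_last_write >= @sincedb_write_interval
        _sincedb_write
      else
        # Sincedb is now out of sync, but we are inside the write interval,
        # so remember that a write is still owed.
        @sincedb_write_pending = true
      end
    end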

I would change the each method in watch.rb so that if there was no change to a file, it yields (path, :something_new_indicating_no_change). That way we could do something in tail.rb if we want.
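Sketched against the each loop in watch.rb (:something_new_indicating_no_change is just the placeholder name from above, and the helper and variable names here are guesses, not the real filewatch internals):

    # Sketch of the watch.rb change (helper/variable names are guesses).
    def each
      @files.each do |path, state|
        if file_changed?(path, state)
          yield(path, :modify)
        else
          # New: report that the file was checked and nothing changed, so the
          # subscriber gets a chance to flush a pending sincedb write.
          yield(path, :something_new_indicating_no_change)
        end
      end
    end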

Back in tail.rb, in subscribe, I would add a case for :something_new_indicating_no_change, and under that case it should check the sincedb_write_pending flag. If the flag is true, it should call _sincedb_write.
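Which would look roughly like this in the subscribe case statement (the other branches are abbreviated, not quoted from the source):

    # Sketch of the new branch in Tail#subscribe (other branches abbreviated).
    case event
    when :create, :modify
      _read_file(path, &block)            # existing behavior (abbreviated)
    when :something_new_indicating_no_change
      # Nothing new was read, but an earlier change skipped its sincedb write
      # because of the interval -- flush it now so sincedb catches up.
      _sincedb_write if @sincedb_write_pending
    end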

And in _sincedb_write, you can then reset the sincedb_write_pending flag to false, since you are clearly no longer pending. Additionally, I would move @sincedb_last_write = now from _read_file into _sincedb_write, because there is a sincedb_write function that Logstash calls, which in turn calls _sincedb_write; even though the sincedb file gets written, @sincedb_last_write does not get updated. That assignment really belongs WITH the function that does the write, not with the calling function.
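So _sincedb_write becomes the one place that records when and whether a write happened. A sketch (the serialization details are approximated; whatever filewatch already writes stays as is):

    # Sketch of _sincedb_write owning the bookkeeping (serialization approximated).
    def _sincedb_write
      File.open(@sincedb_path, "w") do |db|
        @sincedb.each { |inode, pos| db.puts("#{inode} #{pos}") }
      end
      @sincedb_last_write    = Time.now.to_i   # moved here from _read_file
      @sincedb_write_pending = false           # the owed write has now happened
    end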

Assignee

Logstash Developers

Reporter

Bruce Danziger

Labels

Affects versions
