Logstash may emit duplicates when restarted

Description

Logstash bundles a custom version of the filewatch gem, which is buggy. It reads data in 16kB chunks and tries to sync the sincedb after each chunk. This is flawed because 1) mid-line file offsets are saved, and 2) when all chunks, including the last one, are processed in less than 10s, the last position (EOF) is not saved. When Logstash is killed and restarted, duplicate entries are emitted.

The sincedb should not be updated inside the read loop, only once at the end. https://github.com/jordansissel/ruby-filewatch has the correct code.
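
To illustrate the proposed behaviour, here is a minimal sketch in Python. It is not the filewatch code: the names `read_new_lines` and `save_offset` and the one-number sincedb format are assumptions made for this example. It reads in 16 kB chunks, only advances the offset past complete lines, and leaves persisting the offset to the caller, who does it once after the read loop.

```python
import os

CHUNK_SIZE = 16 * 1024  # 16 kB, the chunk size mentioned in the report


def read_new_lines(path, start_offset, chunk_size=CHUNK_SIZE):
    """Read from start_offset to EOF.

    Returns (complete_lines, offset), where offset points just past the
    last complete line. A trailing partial line is not returned and not
    counted, so it is re-read in full on the next call.
    """
    lines = []
    committed = start_offset  # never points into the middle of a line
    pending = b""             # bytes of a line whose newline has not arrived yet
    with open(path, "rb") as f:
        f.seek(start_offset)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # EOF reached
            pending += chunk
            # Everything before the last b"\n" is complete; the rest stays pending.
            *complete, pending = pending.split(b"\n")
            for line in complete:
                lines.append(line)
                committed += len(line) + 1  # +1 for the newline byte
    return lines, committed


def save_offset(sincedb_path, offset):
    """Persist the offset atomically (hypothetical one-number sincedb format)."""
    tmp_path = sincedb_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
    os.replace(tmp_path, sincedb_path)
```

A caller would run `lines, offset = read_new_lines(path, last_offset)`, emit the lines, and then call `save_offset(sincedb_path, offset)` exactly once. Because the saved offset always sits on a line boundary and is written after the final chunk, a restart resumes at the first line that was not yet emitted.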

Environment

None

Status

Assignee

Logstash Developers

Reporter

Zdenek Pavlas

Affects versions

Priority
