Messages Lost When Logstash Crashes
Messages can be lost with various inputs when the output fails to write. This appears to be because pipeline.rb uses a SizedQueue to buffer messages received from inputs. That queue lives only in memory, so if the output plugin fails to write, the buffered messages are not persisted and are lost when Logstash restarts or crashes.
Here's a sample test scenario with a Redis input and a file output. It creates a read-only directory to simulate the output failure and writes 100 messages to the "test" list in Redis. Logstash is started and then crashes. After the crash, the list is missing some messages.
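To make the suspected failure mode concrete, here is a minimal Ruby sketch. It is not the code from pipeline.rb; the names source, buffer, lost and remaining are made up for illustration, and the array only stands in for the Redis list. It just shows that messages already moved into an in-memory SizedQueue are gone once the process goes away, while the source no longer has them either.

```ruby
require "thread" # SizedQueue; a no-op on modern Ruby

# Hypothetical illustration only, not Logstash's pipeline.rb.
source = (1..10).map { |i| "msg-#{i}" } # stands in for the Redis list
buffer = SizedQueue.new(5)              # bounded, memory-only buffer

# Input side: take messages off the source until the buffer is full.
# Once shifted off, the source no longer holds the message.
buffer.push(source.shift) while buffer.size < buffer.max && !source.empty?

# Output side: the write fails, as with the read-only directory.
written = []
begin
  raise Errno::EACCES, "/tmp/nowrite/log"
rescue Errno::EACCES
  # nothing was written, and the buffered messages stay in memory
end

# A crash or restart throws away process memory, including the buffer.
lost = buffer.size
buffer.clear
remaining = source.size

puts "written=#{written.size} lost=#{lost} remaining_in_source=#{remaining}"
```

In this sketch the five buffered messages are neither written nor still in the source, which matches the symptom of the list shrinking on every restart.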
1) redis-server # in one terminal
2) mkdir -p /tmp/nowrite
3) chmod a-w /tmp/nowrite
4) redis-cli -r 100 lpush test "foo"
5) redis-cli lrange test 0 -1 # should print 100 entries
6) bin/logstash agent -f lost.conf # Logstash crashes with: Permission denied - /tmp/nowrite/log (Errno::EACCES)
7) redis-cli lrange test 0 -1 # fewer than 100 entries are printed; some are gone
Repeatedly restarting Logstash continues to drain the list and lose messages. (A process monitoring framework like supervisor could make the amount of data lost much larger if it continually restarted Logstash.)
Looking at the code of some of the other inputs, I believe this affects all of them, even rabbitmq with ACKs enabled, since it ACKs after adding the message to the output queue and not after the output plugin has successfully sent it.
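The difference in ACK timing can be sketched in Ruby as follows. This is a toy model under stated assumptions: Broker, deliver, ack, recoverable and run are hypothetical names, not the RabbitMQ client API or the Logstash rabbitmq input. It compares acknowledging on enqueue with acknowledging only after a successful write, for an output that always fails.

```ruby
# Hypothetical toy broker; not a real RabbitMQ client.
class Broker
  def initialize(messages)
    @pending = messages.dup # not yet delivered
    @unacked = []           # delivered but not acknowledged
  end

  def deliver
    msg = @pending.shift
    @unacked << msg if msg
    msg
  end

  def ack(msg)
    @unacked.delete(msg)
  end

  # Messages the broker could still redeliver after a consumer crash.
  def recoverable
    @unacked + @pending
  end
end

# Consume everything, then attempt an output that always fails.
# Returns how many messages the broker can still redeliver.
def run(broker, ack_early:)
  queue = []
  while (msg = broker.deliver)
    queue << msg
    broker.ack(msg) if ack_early # ACK as soon as the message is queued
  end
  queue.each do |msg|
    written = false              # simulated failed write
    broker.ack(msg) if written && !ack_early
  end
  broker.recoverable.size
end

early = run(Broker.new(%w[a b c]), ack_early: true)
late  = run(Broker.new(%w[a b c]), ack_early: false)
puts "recoverable with ACK on enqueue: #{early}"
puts "recoverable with ACK after write: #{late}"
```

With ACK on enqueue the toy broker has nothing left to redeliver, whereas deferring the ACK until the write succeeds leaves all three messages recoverable.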
I've attached my sample logstash config as well.
This also appears to happen when Logstash does not crash. If I run the same script but use:
url => "http://127.0.0.1:9999/nothere"
http_method => "post"
and then shut down Logstash gracefully, all the messages are lost.
Updated the ticket to reflect the actual bug reported.
1) Logstash shouldn't be crashing here, that's one bug.
2) We'll be replacing the internal messaging system eventually with something that is auditable (for metrics/monitoring) and more durable against crashes.