One output blocking will block all outputs.

Description

See config file below. Running on 1.4.1

Reference thread: https://groups.google.com/d/msg/logstash-users/bLX9IE0ak-o/h_kiHuFCgNUJ

a) I have 2 local redis instances listening on different ports, redisA and redisB

b) InputA "file" reads from a dumb log file (i.e. each line just has a new number on it)

c) Events from InputA end up in output redisA (in a list)

d) InputB "redis" reads events from that list on redisA

e) Events from InputB go to another list on redisB

This all works fine under normal conditions. The issue arises when inputB is pulling from redisA while redisB (the final destination) is down.

When I kill redisB (the final output), logstash keeps consuming from the file (as expected) for a little while and sends the events to redisA. However, while redisB is down, the list length in redisA (being read by inputB) eventually drops to zero, even as hundreds of lines are added to the file. According to the docs, the internal sized queues hold 20 events, so at some point I would expect the list length in redisA to start reporting a size > 0 as logstash's queues fill up and it stops consuming from redisA. However, this is not the case.
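For reference, this is how I'm watching the list lengths (just a quick sketch using redis-cli; the ports and keys are the ones from the config below):

redis-cli -p 6379 llen local_logs_queue   # redisA: list written by the first output, read by inputB
redis-cli -p 6378 llen final_logs         # redisB: final list (the instance I kill)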

While redisB is down, as I add thousands of additional lines to the log file and save, the .sincedb does NOT increment (which is why the list in redisA stays at zero), meaning that inputA is blocked... which is odd to me, because inputA feeds the output to redisA, which is up.

Again, inputB reads from redisA, which is a decoupling point. So I would understand inputB being blocked reading from redisA (because its ultimate target, redisB, is down); however, I thought inputs were on entirely separate threads. So why would the output fed by inputB affect the thread of inputA (reading from the file)?

In SUMMARY: I don't follow why the thread that runs inputA (reading from the file, writing to redisA, which is up) would block when the redisB output is unavailable. If each input -> output path and its queues are indeed entirely separate threads, they should not affect one another.

The docs state: "The output worker model is currently a single thread. Outputs will receive events in the order they are defined in the config file." Jordan on the mailing list said that outputs are no longer limited to a single worker (note: I tried multiple workers on the outputs, same results).
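For clarity, by "multiple workers on the outputs" I mean the per-output workers setting, roughly like this (a sketch only; the worker count is just an example):

redis {
  workers => 4
  host => ["127.0.0.1"]
  port => 6378
  key => "final_logs"
  data_type => "list"
}

Even with workers > 1 on both redis outputs, the behaviour described above was the same.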

If they are not entirely separate threads, then this becomes a feature request to support that; otherwise I need to run 2 separate logstash agents to achieve the routing I am trying to do.

Also note I can never cleanly shut down logstash; I always have to kill -9 it. All I see is:

Sending shutdown signal to input thread {:thread=>#<Thread:0x49bd1798 run>, :level=>:info, :file=>"logstash/pipeline.rb", :line=>"236"}
caller requested sincedb write () {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"185"}
^CInterrupt received. Shutting down the pipeline. {:level=>:warn, :file=>"logstash/agent.rb", :line=>"119"}

CONFIG
----------------

input {

  # InputA: tails the test log file; events tagged "queue_locally"
  file {
    charset => "US-ASCII"
    path => "/path/to/test.log"
    sincedb_path => "./.sincedb"
    start_position => "end"
    tags => [ "queue_locally" ]
    add_field => [ "log_type", "test" ]
  }

  # InputB: reads the list on redisA; events tagged "queue_remotely"
  redis {
    host => ["127.0.0.1"]
    port => 6379
    # auth
    key => "local_logs_queue"
    data_type => "list"
    tags => [ "queue_remotely" ]
  }

}

filter {

  if "queue_remotely" in [tags] {
    mutate {
      remove_tag => ["queue_locally"]
    }
  }

  if [log_type] == "test" and "filtered" not in [tags] {
    mutate {
      add_tag => ["filtered"]
      add_field => {
        "some_field" => "yo!"
      }
    }
  } else {
    mutate {
      remove_tag => ["filtered"]
    }
  }

}

output {

  # redisA (stays up): local queue fed by InputA
  if "queue_locally" in [tags] {
    redis {
      host => ["127.0.0.1"]
      # auth
      port => 6379
      key => "local_logs_queue"
      data_type => "list"
    }
  }

  # redisB (the instance I kill): final destination fed by InputB
  if "queue_remotely" in [tags] {
    redis {
      host => ["127.0.0.1"]
      # auth
      port => 6378
      key => "final_logs"
      data_type => "list"
    }
  }

}

Activity

Philippe Weber March 12, 2015 at 5:18 AM

bitsofinfo July 8, 2014 at 10:45 PM

Thanks for the explanation and confirmation of what I was experiencing.

Hopefully this will serve as a workaround for the time being.

Jordan Sissel July 8, 2014 at 9:06 PM

I updated the subject of the ticket to more accurately reflect the situation.

Jordan Sissel July 8, 2014 at 9:06 PM

Another way to summarize this problem is you are attempting to create a cycle where logstash feeds itself (an input reads from what an output produces) and with the way the pipeline works today, this will upset you because if any output is down or slow, all output will stall.

Jordan Sissel July 8, 2014 at 9:04 PM
Edited

This is a feature of the current design, not a bug, but there's certainly room for improving things.

There is currently 1 and only 1 pipeline in Logstash. Inputs flow to filters flow to outputs. If one output is blocked, all outputs are blocked.

Your logical flow goes (file -> redisA -> redisB), but your config is inputs (file + redisA) and outputs (redisA and redisB) which doesn't necessarily map to this with the one-pipeline model we have today.

We are planning on allowing multiple pipelines in Logstash, but for now to do this you have to either:

1) Run multiple agents in the same process (bin/logstash agent -f first.conf -- agent -f second.conf)
2) Run multiple single-agent processes.
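A rough sketch of what first.conf and second.conf could look like, splitting the config from this ticket into two pipelines (the filters from the original config would go into whichever pipeline needs them):

# first.conf: file -> redisA
input {
  file {
    path => "/path/to/test.log"
    sincedb_path => "./.sincedb"
    start_position => "end"
  }
}
output {
  redis {
    host => ["127.0.0.1"]
    port => 6379
    key => "local_logs_queue"
    data_type => "list"
  }
}

# second.conf: redisA -> redisB
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    key => "local_logs_queue"
    data_type => "list"
  }
}
output {
  redis {
    host => ["127.0.0.1"]
    port => 6378
    key => "final_logs"
    data_type => "list"
  }
}

With option 2 (one process per config), redisB being down only stalls the second pipeline; the first pipeline keeps draining the file into redisA.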

Duplicate

Details

Assignee

Reporter

Affects versions

Created July 8, 2014 at 2:53 PM
Updated March 12, 2015 at 5:18 AM
Resolved March 12, 2015 at 5:18 AM