Elasticsearch output disconnects and will not reconnect
Description
When using the Elasticsearch output, the Logstash process is disconnected from the cluster due to ping timeouts and never reconnects. Logstash does not log anything when this happens, but the Elasticsearch master node does. I have attached the Elasticsearch master node logs and pasted them below.
I am not sure if this is a bug in the Logstash Elasticsearch output or in the Elasticsearch client lib.
To work around it I have had to start running an external watchdog. Ideally we could avoid ping timeouts completely, but Logstash should be resilient to these sorts of failures.
[2014-02-28 11:50:33,901][WARN ][transport ] [$ES_MASTER] Received response for a request that has timed out, sent [69233ms] ago, timed out [39233ms] ago, action [discovery/zen/fd/ping], node [[$LOGSTASH_NODE][$CLIENT_ID][inet/$IP_ADDRESS:9300]{client=true, data=false}], id [9982233]
[2014-02-28 11:50:54,670][INFO ][cluster.service ] [$ES_MASTER] removed {[$LOGSTASH_NODE][$CLIENT_ID][inet/$IP_ADDRESS:9300]{client=true, data=false},}, reason: zen-disco-node_failed([$LOGSTASH_NODE][$CLIENT_ID][inet/$IP_ADDRESS:9300]{client=true, data=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout
[2014-02-28 11:50:54,699][DEBUG][action.admin.cluster.node.stats] [$ES_MASTER] failed to execute on node [$CLIENT_ID] org.elasticsearch.transport.NodeDisconnectedException: [$LOGSTASH_NODE][inet/$IP_ADDRESS:9300][cluster/nodes/stats/n] disconnected
[2014-02-28 11:50:54,699][DEBUG][action.admin.cluster.node.stats] [$ES_MASTER] failed to execute on node [$CLIENT_ID] org.elasticsearch.transport.NodeDisconnectedException: [$LOGSTASH_NODE][inet/$IP_ADDRESS:9300][cluster/nodes/stats/n] disconnected
[2014-02-28 11:50:54,699][DEBUG][action.admin.cluster.node.stats] [$ES_MASTER] failed to execute on node [$CLIENT_ID] org.elasticsearch.transport.NodeDisconnectedException: [$LOGSTASH_NODE][inet/$IP_ADDRESS:9300][cluster/nodes/stats/n] disconnected
[2014-02-28 11:50:54,699][DEBUG][action.admin.cluster.node.stats] [$ES_MASTER] failed to execute on node [$CLIENT_ID] org.elasticsearch.transport.NodeDisconnectedException: [$LOGSTASH_NODE][inet/$IP_ADDRESS:9300][cluster/nodes/stats/n] disconnected
I am running Logstash 1.3.3 against Elasticsearch 0.90.9. The Logstash config, including the output section, is at https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb#n111.
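For anyone needing the same workaround, here is a rough sketch of the kind of external watchdog mentioned above. It is not the script I actually run. It assumes the cluster's nodes-info HTTP endpoint returns a JSON object with a "nodes" map whose entries carry a "name" field, and the URL, node name, restart command, and interval are all hypothetical placeholders to adjust for your deployment.

```python
import json
import subprocess
import time
import urllib.request

# All values below are hypothetical placeholders, not taken from this report.
ES_NODES_URL = "http://localhost:9200/_nodes"  # assumed nodes-info endpoint
LOGSTASH_NODE_NAME = "logstash-indexer"        # assumed client node name
RESTART_CMD = ["service", "logstash-indexer", "restart"]  # assumed command
CHECK_INTERVAL_SECONDS = 60


def node_is_member(nodes_info, node_name):
    """Return True if a node with this name is listed in the nodes-info data."""
    nodes = nodes_info.get("nodes", {})
    return any(node.get("name") == node_name for node in nodes.values())


def fetch_nodes_info(url=ES_NODES_URL, timeout=10):
    """Fetch and decode the nodes-info response from the cluster."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_once(fetch=fetch_nodes_info, restart=None):
    """Run one check. Returns True if a restart was triggered."""
    if restart is None:
        restart = lambda: subprocess.call(RESTART_CMD)
    try:
        info = fetch()
    except (OSError, ValueError):
        # Cluster unreachable or response unreadable: restarting Logstash
        # would not help here, so do nothing on this pass.
        return False
    if node_is_member(info, LOGSTASH_NODE_NAME):
        return False
    # The Logstash client node has dropped out of the cluster.
    restart()
    return True


def watch():
    """Poll forever, restarting Logstash whenever it is missing."""
    while True:
        check_once()
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Calling watch() from a small init script or supervisor job runs the loop. The check deliberately skips the restart when the cluster itself cannot be reached, since restarting Logstash would not fix that case.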