

I have been using Elasticsearch for several years now. Last year I tried to update to 5.x but had heaps of issues getting it to even run in a Docker environment. I stayed with 2.4 up until now and everything has been running smoothly.

A few weeks ago I updated the host OS of the Docker Swarm from Ubuntu 16.04 to Ubuntu 18.04. I didn't change anything with Elasticsearch; I'm still using the same Docker images I've been using for over a year. Since the OS update there have been frequent occasions where a seemingly random node in the 3-node cluster drops all its shards, sending the cluster to yellow. A few times per day two nodes drop at once and the cluster goes to red. Many other services run in the same swarm and never have any difficulty with the network. The only issues I've had since the host OS upgrade have been with Elasticsearch.

Today I completed moving everything over to an Elasticsearch 6.3 cluster, hoping that the updates to Elasticsearch would fix whatever is causing the nodes to drop. I'm hoping someone who understands more about the internals of Elasticsearch will be able to work out what the issue is from the logs below.

You'll see that at the time of these logs I initiated a snapshot. I don't know if this caused the drop, but ALL previous drops over the last few weeks have been unprovoked. After I got the cluster back to a decent state I ran a snapshot again and it completed just fine. You'll notice the log entry previous to the error was an hour earlier, so there was no warning before the drop.

Jul 25 14:59:56 elasticsearch6-2 prod_elasticsearch6-2.1.z6cdi883dcrrquzyeczys3nu3 INFO Cluster health status changed from to (reason: ].
Jul 25 15:33:17 elasticsearch6-3 prod_elasticsearch6-3.1.g4is7belylt9ji46fwpv3cihm WARN overhead, spent collecting in the last
Jul 25 16:32:45 elasticsearch6-3 prod_elasticsearch6-3.1.g4is7belylt9ji46fwpv3cihm INFO master_left [
Jul 25 16:32:47 elasticsearch6-1 prod_elasticsearch6-1.1.bzj8mpxn3oipmw4hasa4qzd8u info :
Jul 25 16:32:47 elasticsearch6-1 prod_elasticsearch6-1.1.bzj8mpxn3oipmw4hasa4qzd8u info Caused by: : FailedToCommitClusterStateException left]
Jul 25 16:32:47 elasticsearch6-1 prod_elasticsearch6-1.1.bzj8mpxn3oipmw4hasa4qzd8u info at .master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:223) ~
Jul 25 16:32:47 elasticsearch6-1 prod_elasticsearch6-1.1.bzj8mpxn3oipmw4hasa4qzd8u info at $ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) ~
Jul 25 16:32:47 elasticsearch6-1 prod_elasticsearch6-1.1.bzj8mpxn3oipmw4hasa4qzd8u info at .waitForNextChange(ClusterStateObserver.java:145) ~
