Cannot discover nodes, returning empty list: AWS Hazelcast discovery
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
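Confluence Data Center fails to discover cluster nodes through Hazelcast AWS discovery. The log shows repeated "Couldn't connect to the AWS service" retry warnings, then a "Cannot discover nodes, returning empty list" warning followed by a "connect timed out" stack trace: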
2020-03-19 16:11:23,627 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [1] retrying in 1 seconds...
2020-03-19 16:11:35,134 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [2] retrying in 2 seconds...
2020-03-19 16:11:47,396 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [3] retrying in 3 seconds...
2020-03-19 16:12:00,784 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [4] retrying in 5 seconds...
2020-03-19 16:12:15,857 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [5] retrying in 7 seconds...
2020-03-19 16:12:33,458 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [6] retrying in 11 seconds...
2020-03-19 16:12:54,857 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [7] retrying in 17 seconds...
2020-03-19 16:13:21,953 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [8] retrying in 25 seconds...
2020-03-19 16:13:57,589 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [9] retrying in 38 seconds...
2020-03-19 16:14:46,030 WARN [Catalina-utility-1] [hazelcast.aws.utility.RetryUtils] log Couldn't connect to the AWS service, [10] retrying in 57 seconds...
2020-03-19 16:15:53,699 WARN [Catalina-utility-1] [com.hazelcast.aws.AwsDiscoveryStrategy] log Cannot discover nodes, returning empty list
com.hazelcast.core.HazelcastException: java.net.SocketTimeoutException: connect timed out
at com.hazelcast.util.ExceptionUtil$1.create(ExceptionUtil.java:40)
at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:124)
at com.hazelcast.util.ExceptionUtil.peel(ExceptionUtil.java:69)
at com.hazelcast.util.ExceptionUtil.rethrow(ExceptionUtil.java:129)
at com.hazelcast.aws.utility.RetryUtils.retry(RetryUtils.java:56)
at com.hazelcast.aws.impl.DescribeInstances.callServiceWithRetries(DescribeInstances.java:272)
at com.hazelcast.aws.impl.DescribeInstances.execute(DescribeInstances.java:262)
at com.hazelcast.aws.AWSClient.getAddresses(AWSClient.java:57)
at com.hazelcast.aws.AwsDiscoveryStrategy.discoverNodes(AwsDiscoveryStrategy.java:146)
at com.hazelcast.spi.discovery.impl.DefaultDiscoveryService.discoverNodes(DefaultDiscoveryService.java:71)
at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddresses(DiscoveryJoiner.java:70)
at com.hazelcast.internal.cluster.impl.DiscoveryJoiner.getPossibleAddressesForInitialJoin(DiscoveryJoiner.java:59)
at com.hazelcast.cluster.impl.TcpIpJoiner.joinViaPossibleMembers(TcpIpJoiner.java:131)
at com.hazelcast.cluster.impl.TcpIpJoiner.doJoin(TcpIpJoiner.java:90)
at com.hazelcast.internal.cluster.impl.AbstractJoiner.join(AbstractJoiner.java:135)
at com.hazelcast.instance.Node.join(Node.java:767)
at com.hazelcast.instance.Node.start(Node.java:411)
at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:131)
at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:202)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:181)
at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:131)
at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:57)
at com.atlassian.confluence.cluster.hazelcast.HazelcastClusterManager.startCluster(HazelcastClusterManager.java:344)
at com.atlassian.confluence.cluster.hazelcast.HazelcastClusterManager.reconfigure(HazelcastClusterManager.java:316)
at com.atlassian.confluence.cluster.DefaultClusterConfigurationHelper.bootstrapCluster(DefaultClusterConfigurationHelper.java:407)
at com.atlassian.confluence.setup.DefaultBootstrapManager.afterConfigurationLoaded(DefaultBootstrapManager.java:831)
at com.atlassian.config.bootstrap.DefaultAtlassianBootstrapManager.init(DefaultAtlassianBootstrapManager.java:75)
at com.atlassian.confluence.setup.DefaultBootstrapManager.init(DefaultBootstrapManager.java:188)
at com.atlassian.config.util.BootstrapUtils.init(BootstrapUtils.java:36)
at com.atlassian.confluence.setup.ConfluenceConfigurationListener.initialiseBootstrapContext(ConfluenceConfigurationListener.java:133)
at com.atlassian.confluence.setup.ConfluenceConfigurationListener.contextInitialized(ConfluenceConfigurationListener.java:64)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4682)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5143)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1384)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1374)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(Unknown Source)
at com.hazelcast.aws.impl.DescribeInstances.callService(DescribeInstances.java:291)
at com.hazelcast.aws.impl.DescribeInstances$1.call(DescribeInstances.java:276)
at com.hazelcast.aws.impl.DescribeInstances$1.call(DescribeInstances.java:272)
at com.hazelcast.aws.utility.RetryUtils.retry(RetryUtils.java:52)
... 38 more
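The retry warnings above wait longer after each failed attempt. As a rough sketch, assuming an initial backoff of 1.5 seconds multiplied by 1.5 on each retry (constants inferred from the delays printed in this log, not taken from Hazelcast's source), the following Java class reproduces the printed 1, 2, 3, 5, 7, 11, 17, 25, 38 and 57 second delays. It shows why discovery only gives up after several minutes: the ten waits add up to 166 seconds, on top of the time each attempt spends reaching its own connect timeout. The class name is hypothetical.

```java
/**
 * Sketch of the delays in the "Couldn't connect to the AWS service" warnings.
 * ASSUMPTION: 1.5 s initial backoff, x1.5 per retry; inferred from the log, not Hazelcast's code.
 */
final class RetryBackoffSketch {
    static final long INITIAL_BACKOFF_MS = 1500;   // assumed
    static final double BACKOFF_MULTIPLIER = 1.5;  // assumed

    /** Delay before retry {@code retry} (1-based), truncated to whole seconds as the log prints it. */
    static long delaySeconds(int retry) {
        double ms = INITIAL_BACKOFF_MS;
        for (int i = 1; i < retry; i++) {
            ms *= BACKOFF_MULTIPLIER;
        }
        return (long) ms / 1000;
    }

    /** Total sleep across the first {@code retries} retries, in the same whole-second units. */
    static long totalSeconds(int retries) {
        long total = 0;
        for (int r = 1; r <= retries; r++) {
            total += delaySeconds(r);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int r = 1; r <= 10; r++) {
            System.out.println("[" + r + "] retrying in " + delaySeconds(r) + " seconds...");
        }
        System.out.println("total wait: " + totalSeconds(10) + " seconds");
    }
}
```

The per-attempt connect timeout is not modelled here, which is why the real timestamps in the log are spaced further apart than the delays alone.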
Environment
- Confluence Data Center
- AWS node discovery
- A new node is being added to the cluster
Diagnosis
To see which request is timing out, add the following to <Confluence-Install>\conf\logging.properties before restarting Confluence:
sun.net.www.protocol.http.HttpURLConnection.level = FINEST
sun.net.www.protocol.http.HttpURLConnection.handlers = java.util.logging.ConsoleHandler
In the same file, change the level of the following entry from FINE to FINEST:
java.util.logging.ConsoleHandler.level = FINEST
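To reproduce the request without restarting Confluence, a standalone probe can be run on the affected node. The Java sketch below turns on the same HttpURLConnection logger as the logging.properties entries, then requests one path under the EC2 instance metadata endpoint (http://169.254.169.254/latest/meta-data/) with a short timeout. The class and method names are hypothetical, the 5-second timeout is an arbitrary choice, and the exact log output depends on the JDK version. Run it with the same -D proxy options that Confluence uses so that it takes the same network path.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical standalone probe for the EC2 instance metadata endpoint. */
final class MetadataProbe {
    static final String METADATA_BASE = "http://169.254.169.254/latest/meta-data/";

    // Held in a field because java.util.logging keeps only weak references to loggers.
    private static final Logger HTTP_LOG =
            Logger.getLogger("sun.net.www.protocol.http.HttpURLConnection");

    /** Same intent as the two logging.properties entries, for this JVM only. */
    static void enableHttpTrace() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);
        HTTP_LOG.setLevel(Level.FINEST);
        HTTP_LOG.addHandler(handler);
    }

    /** Builds a metadata URL, e.g. "iam/security-credentials/" under METADATA_BASE. */
    static String metadataUrl(String path) {
        return METADATA_BASE + (path.startsWith("/") ? path.substring(1) : path);
    }

    /** Requests {@code url} once and describes the outcome instead of throwing. */
    static String probe(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            // With no explicit Proxy, the connection consults the default ProxySelector
            // when connecting, so -Dhttp.proxyHost and related options apply.
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            return "HTTP " + conn.getResponseCode() + " from " + url;
        } catch (SocketTimeoutException e) {
            return "timed out (" + e.getMessage() + ") for " + url;
        } catch (IOException e) {
            return "failed (" + e + ") for " + url;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        enableHttpTrace();
        String path = args.length > 0 ? args[0] : "iam/security-credentials/";
        System.out.println(probe(metadataUrl(path), 5000));
    }
}
```

A "timed out" result matches the connect timeout in the Summary. Passing iam/security-credentials/<IAM-Role-Name> as the argument returns HTTP 404 when that role is not available to the instance, which is the situation behind the FileNotFoundException in Cause 2. An HTTP 401 may mean the instance requires IMDSv2 session tokens, which this plain GET does not send.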
Cause 1
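Causes vary. In one case, no HTTP response was returned for GET /latest/meta-data/iam/security-credentials/<role>. The debug log shows the outgoing request (the ": null" after the request line is how HttpURLConnection prints the request line itself, not a response):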
19-Mar-2020 17:13:10.135 FINE [Catalina-utility-1] sun.net.www.protocol.http.HttpURLConnection.writeRequests sun.net.www.MessageHeader@123456 pairs: {GET /latest/meta-data/iam/security-credentials/test-iam HTTP/1.1: null}{User-Agent: Java/1.8.0_171}{Host: 123.456.789.123}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Connection: keep-alive}
Cause 2
If you see messages like the following, check that the IAM role in use and its permissions are correct, and update <Confluence-local-home>/confluence.cfg.xml if needed:
com.hazelcast.config.InvalidConfigurationException: Unable to retrieve credentials from IAM Role: <IAM-Role-Name>
at com.hazelcast.aws.impl.DescribeInstances.fillKeysFromIamRole(DescribeInstances.java:134)
...
Caused by: com.hazelcast.config.InvalidConfigurationException: Unable to lookup role in URI: http://169.254.169.254/latest/meta-data/iam/security-credentials/<IAM-Role-Name>
at com.hazelcast.aws.utility.MetadataUtil.retrieveMetadataFromURI(MetadataUtil.java:78)
...
Caused by: java.io.FileNotFoundException: http://169.254.169.254/latest/meta-data/iam/security-credentials/<IAM-Role-Name>
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1893)
Resolution
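For Cause 1, add the proxy connection details to the setenv file on all nodes, and exclude localhost and the link-local AWS metadata and credential addresses (169.254.169.254 and 169.254.170.2) from proxying so that the connection can complete: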
CATALINA_OPTS="-Dhttp.nonProxyHosts=localhost\|169.254.170.2\|169.254.169.254\|127.0.0.1 ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttps.nonProxyHosts=localhost\|169.254.170.2\|169.254.169.254\|127.0.0.1 ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttp.proxyHost=<the proxy url> -Dhttp.proxyPort=<the proxy port> ${CATALINA_OPTS}"
CATALINA_OPTS="-Dhttps.proxyHost=<the proxy url> -Dhttps.proxyPort=<the proxy port> ${CATALINA_OPTS}"
Restart Confluence on all nodes after applying these changes.
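One way to confirm that the proxy options behave as intended is to ask the JVM's default java.net.ProxySelector how it would route each request. The sketch below (hypothetical class name) should report DIRECT for the metadata address when it is listed in http.nonProxyHosts, and an HTTP proxy for an example EC2 API host such as ec2.amazonaws.com. In the JDK's default selector, HTTPS requests are also matched against http.nonProxyHosts, so that property governs the bypass for both schemes.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

/** Hypothetical check of how the JVM would route requests, given the -D proxy options. */
final class ProxyRouteCheck {
    /** First proxy the default ProxySelector would use for {@code url}; DIRECT means no proxy. */
    static Proxy route(String url) {
        ProxySelector selector = ProxySelector.getDefault();
        if (selector == null) {
            return Proxy.NO_PROXY;
        }
        List<Proxy> proxies = selector.select(URI.create(url));
        return proxies.isEmpty() ? Proxy.NO_PROXY : proxies.get(0);
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://169.254.169.254/latest/meta-data/iam/security-credentials/", // instance metadata
            "https://ec2.amazonaws.com/",                                        // example EC2 API host
        };
        for (String url : urls) {
            Proxy p = route(url);
            String how = p.type() == Proxy.Type.DIRECT ? "DIRECT " : "PROXIED";
            System.out.println(how + "  " + url + "  (" + p + ")");
        }
    }
}
```

When passing the options straight to java rather than through CATALINA_OPTS, quote the nonProxyHosts value and use plain | separators. If the metadata URL prints as PROXIED, the nonProxyHosts value is not being picked up.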
For Cause 2, check that the IAM role name in <Confluence-local-home>/confluence.cfg.xml is correct and matches the role attached to the instance, then restart Confluence.
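To compare the configured role with the one the instance reports, a short Java sketch can list the AWS-related entries in confluence.cfg.xml. It assumes the file stores settings as <property name="..."> elements and that the AWS discovery settings have names containing aws (for example confluence.cluster.aws.iam.role; treat that exact name as an assumption and check it against your own file). The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Lists AWS-related <property> entries from confluence.cfg.xml (the "aws" filter is an assumption). */
final class ClusterConfigDump {
    static Map<String, String> awsProperties(InputStream cfgXml) {
        try {
            NodeList props = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(cfgXml)
                    .getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>(); // keeps file order
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getAttribute("name"); // "" if the attribute is missing
                if (name.contains("aws")) {
                    result.put(name, prop.getTextContent().trim());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse confluence.cfg.xml", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to <Confluence-local-home>/confluence.cfg.xml
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            awsProperties(in).forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
}
```

On JDK 11 or later it can be run directly from source, for example java ClusterConfigDump.java <Confluence-local-home>/confluence.cfg.xml, and the printed role name compared with the role attached to the EC2 instance.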