Fisheye is losing connection back to the database intermittently
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
There is an issue with Fisheye where it becomes unresponsive due to database losing connection. In the logs:
2022-06-01 00:32:01,436 WARN [qtp28094269-26249 ] org.hibernate.engine.jdbc.spi.SqlExceptionHelper SqlExceptionHelper-logExceptions - SQL Error: 0, SQLState: 08006
2022-06-01 00:32:01,437 ERROR [qtp28094269-26249 ] org.hibernate.engine.jdbc.spi.SqlExceptionHelper SqlExceptionHelper-logExceptions - An I/O error occurred while sending to the backend.
2022-06-01 00:32:01,437 ERROR [qtp28094269-26249 ] fisheye TotalityFilter-logExceptionDetails - Exception "Could not open connection" (org.hibernate.exception.JDBCConnectionException) while processing "/fisheye/rest/remote-link-aggregation/latest/aggregation" (Referer:"null")
org.hibernate.exception.JDBCConnectionException: Could not open connection
...
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:339) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:310) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:296) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:273) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgConnection.execSQLQuery(PgConnection.java:464) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgConnection.execSQLQuery(PgConnection.java:457) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.jdbc.PgConnection.getTransactionIsolation(PgConnection.java:895) [postgresql-42.2.14.jar:42.2.14]
at com.jolbox.bonecp.ConnectionHandle.getTransactionIsolation(ConnectionHandle.java:633) [bonecp-0.7.1.RELEASE.jar:0.7.1.RELEASE]
at com.cenqua.crucible.hibernate.trackingstatements.HibernateBonceCPConnectionProvider.getConnection(HibernateBonceCPConnectionProvider.java:176) [fisheye.jar:?]
at com.cenqua.crucible.hibernate.trackingstatements.TrackingStatementsConnectionProvider.getConnection(TrackingStatementsConnectionProvider.java:26) [fisheye.jar:?]
at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:301) [hibernate-core-4.2.13.Final.jar:4.2.13.Final]
at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:214) [hibernate-core-4.2.13.Final.jar:4.2.13.Final]
... 170 more
Caused by: javax.net.ssl.SSLException: Connection timed out (Read failed)
at sun.security.ssl.Alert.createSSLException(Alert.java:127) [?:1.8.0_312]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:324) [?:1.8.0_312]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:267) [?:1.8.0_312]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:262) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1563) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl.access$400(SSLSocketImpl.java:73) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:973) [?:1.8.0_312]
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.PGStream.receiveChar(PGStream.java:370) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2043) [postgresql-42.2.14.jar:42.2.14]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) [postgresql-42.2.14.jar:42.2.14]
... 183 more
Suppressed: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method) [?:1.8.0_312]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) [?:1.8.0_312]
at java.net.SocketOutputStream.write(SocketOutputStream.java:155) [?:1.8.0_312]
at sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81) [?:1.8.0_312]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:355) [?:1.8.0_312]
... 195 more
Caused by: java.net.SocketException: Connection timed out (Read failed)
at java.net.SocketInputStream.socketRead0(Native Method) [?:1.8.0_312]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) [?:1.8.0_312]
at java.net.SocketInputStream.read(SocketInputStream.java:171) [?:1.8.0_312]
at java.net.SocketInputStream.read(SocketInputStream.java:141) [?:1.8.0_312]
at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) [?:1.8.0_312]
at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73) [?:1.8.0_312]
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957) [?:1.8.0_312]
... 190 more
Environment
Observed in 4.8.9
Diagnosis
- Thread dumps show threads are blocked on the same address in memory:
* "qtp28094269-26280" #26280 prio=5 os_prio=0 tid=0x00007eff5658c800 nid=0x21fa waiting for monitor entry [0x00007efe759a6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.atlassian.cache.memory.DelegatingCachedReference.getUnderLock(DelegatingCachedReference.java:94)
- waiting to lock <0x00000000887b4ea0> (a com.atlassian.cache.memory.DelegatingCachedReference)
- Thread in a RUNNABLE state will have mentions of a DB connection:
"Navlink Plugin Executor:thread-12" #437 daemon prio=5 os_prio=0 tid=0x00007eff752db000 nid=0x3d21 runnable [0x00007efed7dee000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
at org.postgresql.core.PGStream.receiveChar(PGStream.java:453)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2120)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
- locked <0x0000000098998e78> (a org.postgresql.core.v3.QueryExecutorImpl)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
at org.postgresql.jdbc.PgConnection.execSQLQuery(PgConnection.java:476)
at org.postgresql.jdbc.PgConnection.execSQLQuery(PgConnection.java:469)
at org.postgresql.jdbc.PgConnection.getTransactionIsolation(PgConnection.java:913)
at com.jolbox.bonecp.ConnectionHandle.getTransactionIsolation(ConnectionHandle.java:633)
- A util like JProfiler will show DB connection leaks which will corroborate what is seen in the thread dumps, ie. these blocked threads are all waiting for a DB connection.
- PCAP will likely show an issue with respect to Fisheye being able to communicate with the DB eg. the sequence in the back-and-forth communication will not match up as expected (the SYN/ACKS will not match up as they do normally) ← this might be confounded by the fact that the DB remains up an running so there is no clear sign that there is an issue unless you look very closely at multiple data sets
Cause
According to Postgres:
most firewalls, NAT routers, etc will timeout idle sockets after a certain number of minutes... They have to do this otherwise, the firewall's connection tracking tables will fill up with old flug.
Solution
Again according to Postgres you basically have to time the connection out before a network application does.
This can be done in at least 2 ways:
Option 1
- Add the following db connection parameter so that idle connections are timed out after a minute (where
<value>
is a sensible number of minutes shorter that the network imposed timeout but not so short as to incur additional overhead since it is less expensive to leave an idle connection open than it is to re-open new connections )
bonecp.idleMaxAgeInMinutes=<value>
- If present remove bonecp.idleConnectionTestPeriodInMinutes so we do not have any interference from that setting
- If the issue persists grab logs from PG DB directly and correlate with Application logs for further investigation
Option 2
- Set only the following in Fisheye
bonecp.connectionTestStatement="SELECT 1"
bonecp.idleConnectionTestPeriodInMinutes=50
- In postgresql.conf on the DB server set the following
tcp_keepalives_count = 3
tcp_keepalives_idle = 55
tcp_keepalives_interval = 5
These settings are dependent upon what is configured at the network level (so they might be subject to change from case to case).
They should result in Postgres sending TCP keepalives more aggressively and cause a disconnect early if needed, thus preventing the network application from doing the severing.