Jira server process is terminated in Linux by Out Of Memory Killer
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
The Jira server process is terminated unexpectedly in a Linux environment by the Out Of Memory (OOM) Killer, and Jira's logs show no clean shutdown.
Affected environments
- Any version of Jira Software Data Center or Server
- Any version of Jira Service Management Data Center or Server
Diagnose OOM killer with Jira logs
In atlassian-jira.log, we can see that there is no clean shutdown process:
2021-06-22 13:53:14,025+0000 plugin-transaction-0 INFO [c.a.jira.plugin.PluginTransactionListener] [plugin-transaction] numberStartEvents:1011, numberEndEvents:1011, numberSendEvents:545, numberEventsInTransactions:18710, numberOfPluginEnableEvents:313
#### No shutdown process was started as noted by the lack of localhost-startStop-2 logging.
2021-06-22 14:08:38,286+0000 localhost-startStop-1 INFO [c.a.jira.startup.JiraHomeStartupCheck] The jira.home directory '/var/atlassian/application-data/jira' is validated and locked for exclusive use by this instance.
2021-06-22 14:08:38,337+0000 JIRA-Bootstrap INFO [c.a.jira.startup.JiraStartupLogger]
****************
JIRA starting...
****************
2021-06-22 14:08:38,521+0000 JIRA-Bootstrap INFO [c.a.jira.startup.JiraStartupLogger]
To confirm that the OOM Killer is the culprit, search the kernel log for the kill event with the following command:
# dmesg -T | egrep -i -B 1 'killed process'
Example output:
[Tue Jun 22 13:42:49 2021] Out of memory: Kill process 90619 (java) score 440 or sacrifice child
[Tue Jun 22 13:42:49 2021] Killed process 95510 (java), UID 752, total-vm:12301500kB, anon-rss:1873032kB, file-rss:0kB, shmem-rss:0kB
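On systems that write kernel messages to the systemd journal rather than (or in addition to) the flat log files mentioned below, the same search can be run with journalctl. This is a minimal sketch; the time window is only an example and can be widened as needed.
# journalctl -k --since "2 hours ago" | egrep -i -B 1 'killed process'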
The following appears in /var/log/messages, /var/log/syslog, or the systemd kernel journal:
Aug 12 19:12:19 ussclpdapjra002 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Aug 12 19:12:19 ussclpdapjra002 kernel:
Aug 12 19:12:19 ussclpdapjra002 kernel: Call Trace:
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800c82e8>] out_of_memory+0x8e/0x2f3
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8000f506>] __alloc_pages+0x27f/0x308
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff80017949>] cache_grow+0x133/0x3c1
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8005c6f9>] cache_alloc_refill+0x136/0x186
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800dc9e3>] kmem_cache_zalloc+0x6f/0x94
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800bf56f>] taskstats_exit_alloc+0x32/0x89
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff80015693>] do_exit+0x186/0x911
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800496a1>] cpuset_exit+0x0/0x88
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8002b29e>] get_signal_to_deliver+0x465/0x494
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8005b295>] do_notify_resume+0x9c/0x7af
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8008e16d>] default_wake_function+0x0/0xe
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff800a52a6>] sys_futex+0x10b/0x12b
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8005e19f>] sysret_signal+0x1c/0x27
Aug 12 19:12:19 ussclpdapjra002 kernel: [<ffffffff8005e427>] ptregscall_common+0x67/0xac
Additionally, /var/log/messages, /var/log/syslog, or the systemd kernel journal may include the following log:
Aug 12 19:11:52 ussclpdapjra002 kernel: INFO: task java:5491 blocked for more than 120 seconds.
Aug 12 19:11:52 ussclpdapjra002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 12 19:11:52 ussclpdapjra002 kernel: java D 0000000000000014 0 5491 1 5492 5490 (NOTLB)
Aug 12 19:11:52 ussclpdapjra002 kernel: ffff810722859e18 0000000000000082 0000000000000000 0000000000000001
Aug 12 19:11:52 ussclpdapjra002 kernel: ffff810722859e88 000000000000000a ffff81083673a100 ffff8107024d7080
Aug 12 19:11:52 ussclpdapjra002 kernel: 0000d1276b8c3dc6 0000000000003296 ffff81083673a2e8 0000000400000000
Aug 12 19:11:52 ussclpdapjra002 kernel: Call Trace:
Aug 12 19:11:52 ussclpdapjra002 kernel: [<ffffffff80016dd4>] generic_file_aio_read+0x34/0x39
Aug 12 19:11:52 ussclpdapjra002 kernel: [<ffffffff800656ac>] __down_read+0x7a/0x92
Aug 12 19:11:52 ussclpdapjra002 kernel: [<ffffffff80067ad0>] do_page_fault+0x446/0x874
Aug 12 19:11:52 ussclpdapjra002 kernel: [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Aug 12 19:11:52 ussclpdapjra002 kernel: [<ffffffff8000c62d>] _atomic_dec_and_lock+0x39/0x57
Aug 12 19:12:08 ussclpdapjra002 kernel: [<ffffffff8000d3fa>] dput+0x3d/0x114
Aug 12 19:12:10 ussclpdapjra002 kernel: [<ffffffff8005ede9>] error_exit+0x0/0x84
Aug 12 19:12:11 ussclpdapjra002 kernel:
Most common causes of OOM events for Jira environments
When the system runs out of memory, the Linux kernel automatically starts killing processes, choosing the victim whose termination frees the most memory while having the least possible impact on the system. It uses several criteria to determine this, such as the amount of memory the process is using, the process's priority, and a calculated OOM score. In our example, the Jira JVM best fulfilled those criteria and was killed.
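To see how the kernel currently ranks a process, its OOM score can be read directly from /proc. This is a minimal sketch; <jira-pid> is a placeholder for the Jira JVM's process ID.
# <jira-pid> is a placeholder; higher oom_score values are more likely to be killed
$ cat /proc/<jira-pid>/oom_score
$ cat /proc/<jira-pid>/oom_score_adj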
This error is usually due to one or more of the following issues (a quick check is shown after this list):
- The memory configured for Jira's JVM with the -Xmx parameter is not available on the machine
- There is not enough physical memory allocated to the Jira node to run Jira and the other processes
- Jira's JVM is configured with a higher -Xmx value than the size of the instance requires
- One or more processes other than Jira are consuming an unexpectedly high amount of memory
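A quick way to check the first three points is to compare the configured maximum heap with the memory actually available on the node. This is a minimal sketch; the setenv.sh path assumes a default Linux installation directory and may differ in your environment.
# Path assumes a default installer location; JVM_MAXIMUM_MEMORY maps to -Xmx
$ grep JVM_MAXIMUM_MEMORY /opt/atlassian/jira/bin/setenv.sh
# Total, used, and available physical memory and swap on the node
$ free -h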
Resolve Jira process shutdowns caused by OOM killer
The content on this page relates to platforms which are not supported for Jira applications. Consequently, Atlassian cannot guarantee providing any support for it. This material is provided for your information only, and you use it at your own risk.
Out of memory shutdowns require careful analysis to resolve, and there isn't a one-size-fits-all solution. We suggest reviewing the instance's memory usage patterns, deciding how much memory the Jira instance needs, and adjusting the machine's capacity accordingly. The documents below will help you make the right decision about the requirements.
- Analyze OutofMemory errors in Jira Server and Data Center with heap dumps
- Supported Platforms
- Increasing JIRA Memory
- Scaling Jira
- Garbage Collection (GC) Tuning Guide
- GC overhead limit exceeded error crashes Jira server
- Common causes for Jira Server crashes and performance issues
Please note that allocating more memory to Jira does not necessarily mean that Jira's performance will increase. If the JVM has to manage more memory than Jira actually requires, performance can actually degrade.
Analyze other processes running at the time of OOM termination
The OOM Killer might be triggered by processes other than Jira. You can analyze the dmesg logs for more detail about all the processes that were running when Jira was killed. In these logs, locate the "Out of memory: Kill process" message.
Just above that message, the kernel dumps the stats of the processes that were running. For example:
[...]
[XXXXX] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[XXXXX] [ 480] 0 480 13863 113 26 0 -1000 auditd
[XXXXX] [12345] 123 12345 4704977 3306330 6732 0 0 java
[XXXXX] [11939] 0 11939 46699 328 48 0 0 crond
[XXXXX] [11942] 0 11942 28282 45 12 0 0 sh
[XXXXX] [16789] 456 16789 1695936 38643 165 0 0 java
[...]
[XXXXX] Out of memory: Kill process 12345 (java) score 869 or sacrifice child
[XXXXX] Killed process 12345 (java) total-vm:18819908kB, anon-rss:13225320kB, file-rss:0kB, shmem-rss:0kB
[...]
In this example, the Jira PID was 12345 and it was the process that got killed. The summary (the Killed process line) shows that Jira was using ~13 GiB of memory (see anon-rss; the total-vm value can be disregarded). However, the table also lists another process, PID 16789, that is reserving ~6.4 GiB of memory (note that the table's memory values are in 4 KiB pages, so you must multiply the rss value by 4 to determine actual RAM usage in KiB). You can investigate what this other process does by running the following command:
$ ps -fp <pid>
It is possible that this process is leaking memory, or perhaps it simply should not be running on the same system as Jira.
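To spot other heavy memory consumers before the next OOM event, the processes currently using the most physical memory can be listed directly with ps. This is a minimal sketch; the rss column is reported in KiB.
# Top 10 processes by resident memory (RSS, in KiB)
$ ps -eo pid,user,rss,comm --sort=-rss | head -n 11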
For Kubernetes
Exit code 137 is important because it means that the system terminated the container when it tried to use more memory than its limit allows.
Run this command and look for exit code 137:
kubectl get pod <pod-name> -o yaml
We should see exit code 137 in the output, for example:
state:
  terminated:
    containerID: docker://054cd898b7fff24f75f467895d4b0680c83fc54f49679faeaae975a579af87b8
    exitCode: 137
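Instead of scanning the full YAML, the terminated state can also be extracted with a jsonpath query. This is a minimal sketch; it assumes the container has already restarted so the OOM kill is recorded under lastState, where the reason is typically OOMKilled.
kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'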
External resource: https://sysdig.com/blog/troubleshoot-kubernetes-oom/