Bamboo Data Center job on an Ephemeral agent is stuck with a pod in an error state in an OpenShift Kubernetes cluster
The steps outlined in this article are provided AS-IS. This means we've had reports of them working for some customers, under certain circumstances, yet they are not officially supported, nor can we guarantee they'll work for your specific scenario.
You may follow them and validate them in your own non-production environments before applying them in production, or fall back to supported alternatives if they don't work out.
We also invite you to reach out to our Community for matters that fall beyond Atlassian's scope of support!
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
A Bamboo job fails to run on an ephemeral agent because the agent pod is stuck in the Error state in an OpenShift Kubernetes cluster. The following error message appears in the ephemeral agent pod logs:
PermissionError: [Errno 13] Permission denied: '/var/atlassian/application-data/bamboo-agent/docker-app.pid'
Environment
- The solution has been tested on Bamboo Data Center versions 9.3.2 and 9.4.1
- OpenShift Kubernetes cluster 4.12
Diagnosis
The Pods section of the Ephemeral Agents configuration page in the Bamboo UI shows an error message for the agent pod.
Checking the Kubernetes cluster shows the pod in the Error state:
oc get pods -n bamboo-ephemeral
NAME                        READY   STATUS   RESTARTS   AGE
eph-ephe-job1-18-kbpkdrap   0/1     Error    0          21m
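If several builds have failed, picking the errored agent pods out of the oc get pods output can be scripted. The following is a minimal Python sketch, not an Atlassian tool: it assumes the oc CLI is on PATH with an active login, assumes the default table layout shown above, and uses the bamboo-ephemeral namespace from this article as a default. The function names are hypothetical.

```python
import subprocess
from typing import List

NAMESPACE = "bamboo-ephemeral"  # namespace used by the ephemeral agents in this article


def error_pods(get_pods_output: str) -> List[str]:
    """Return the names of pods whose STATUS column is 'Error'.

    Expects the default table printed by 'oc get pods':
    NAME  READY  STATUS  RESTARTS  AGE
    """
    names = []
    for line in get_pods_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 3 and cols[2] == "Error":
            names.append(cols[0])
    return names


def fetch_pods(namespace: str = NAMESPACE) -> str:
    """Run 'oc get pods' (requires the oc CLI and a logged-in session)."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling error_pods(fetch_pods()) returns the pod names to inspect further. The parser only relies on the STATUS column being the third field, so a RESTARTS value such as "2 (5m ago)" does not break it.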
The pod logs (for example, from oc logs <podname> -n bamboo-ephemeral) show the following:
+ KUBE_NUM_EXTRA_CONTAINERS_OR_ZERO=0
+ '[' -d '' ']'
+ [[ -d /pbc/kube ]]
+ exec /usr/bin/tini -- /entrypoint.py
INFO:root:Generating /var/atlassian/application-data/bamboo-agent/conf/wrapper.conf from template wrapper.conf.j2
WARNING:root:Permission problem writing '/var/atlassian/application-data/bamboo-agent/conf/wrapper.conf'; skipping
Traceback (most recent call last):
File "/entrypoint.py", line 57, in <module>
exec_app(['/opt/java/openjdk/bin/java', JAVA_OPTS, '-jar', f'{BAMBOO_AGENT_INSTALL_DIR}/atlassian-bamboo-agent-installer.jar'] + AGENT_OPTS, BAMBOO_AGENT_HOME, name='Bamboo Agent', env_cleanup=True)
File "/entrypoint_helpers.py", line 138, in exec_app
write_pidfile()
File "/entrypoint_helpers.py", line 120, in write_pidfile
with open(pidfile, 'wt', encoding='utf-8') as fd:
PermissionError: [Errno 13] Permission denied: '/var/atlassian/application-data/bamboo-agent/docker-app.pid'
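To confirm that a failed pod hit this specific problem rather than some other startup error, you can search its logs for the PID file PermissionError. The sketch below is a hypothetical helper, assuming the oc CLI is available and the pods run in the bamboo-ephemeral namespace; the regular expression is written against the traceback shown above and may need adjusting for other image versions.

```python
import re
import subprocess
from typing import Optional

# Signature of the failure in this article: the entrypoint cannot write its PID file.
PIDFILE_ERROR = re.compile(
    r"PermissionError: \[Errno 13\] Permission denied: '(?P<path>[^']*docker-app\.pid)'"
)


def pidfile_permission_error(log_text: str) -> Optional[str]:
    """Return the PID file path if the log contains the error, otherwise None."""
    match = PIDFILE_ERROR.search(log_text)
    return match.group("path") if match else None


def fetch_logs(pod: str, namespace: str = "bamboo-ephemeral") -> str:
    """Run 'oc logs' for one pod (requires the oc CLI and access to the namespace)."""
    result = subprocess.run(
        ["oc", "logs", pod, "-n", namespace],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, pidfile_permission_error(fetch_logs("eph-ephe-job1-18-kbpkdrap")) returns the PID file path when the pod failed for the reason described here, and None otherwise.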
Cause
The OpenShift platform ships a set of predefined security context constraints (SCCs) that limit what a running pod can access on the host. SCCs give OpenShift administrators a mechanism to control pod permissions, including access to the host filesystem, use of privileged ports, and the user ID a container runs as.
To see which SCC the failing pod was admitted under, pass its definition to scc-subject-review:
oc get pod <podname> -n bamboo-ephemeral -o yaml | oc adm policy scc-subject-review -f -
RESOURCE                        ALLOWED BY
Pod/eph-ephe-job1-23-ygjwppdo   restricted-v2
Here the pod is admitted under the restricted-v2 SCC, which runs the container with an arbitrary, non-root user ID allocated from the project's UID range.
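The review step can also be run from a script, which is convenient when checking several pods. This is a minimal sketch, assuming the oc CLI with an active login and the bamboo-ephemeral namespace; it reproduces the pipe above by feeding the pod YAML to scc-subject-review on standard input, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict


def admitting_scc(review_output: str) -> Dict[str, str]:
    """Map each RESOURCE in scc-subject-review output to its ALLOWED BY value."""
    result = {}
    for line in review_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 2:
            result[cols[0]] = cols[1]
    return result


def review_pod(pod: str, namespace: str = "bamboo-ephemeral") -> str:
    """Equivalent of: oc get pod <pod> -o yaml | oc adm policy scc-subject-review -f -"""
    pod_yaml = subprocess.run(
        ["oc", "get", "pod", pod, "-n", namespace, "-o", "yaml"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    review = subprocess.run(
        ["oc", "adm", "policy", "scc-subject-review", "-n", namespace, "-f", "-"],
        input=pod_yaml,
        capture_output=True,
        text=True,
        check=True,
    )
    return review.stdout
```

For the pod in this article, admitting_scc(review_pod("eph-ephe-job1-23-ygjwppdo")) would map Pod/eph-ephe-job1-23-ygjwppdo to restricted-v2.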
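In this particular issue, the agent entrypoint tries to write its PID file (docker-app.pid) to the /var/atlassian/application-data/bamboo-agent directory, which requires root permissions, so the write fails with Permission denied under the arbitrary user ID. Running the pod with the anyuid SCC, which allows containers to run as any user ID, including root, overcomes this restriction. However, the service account used for the ephemeral agent pods does not have permission to use the anyuid SCC.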
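To see why an arbitrary UID is refused where root is not, the standard POSIX write-permission check can be modelled in a few lines. This is a simplified sketch: it ignores ACLs, SELinux, read-only mounts and capabilities other than the root override. It assumes the container process is a member of group 0, which OpenShift normally adds for containers running with an arbitrary UID. The directory owner, group and mode values are hypothetical and are not read from the Bamboo agent image.

```python
import stat
from typing import Set


def can_write(uid: int, gids: Set[int], owner_uid: int, owner_gid: int, mode: int) -> bool:
    """Simplified POSIX write check for a directory (illustration only)."""
    if uid == 0:
        return True  # root (with CAP_DAC_OVERRIDE) ignores the mode bits
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)  # the owner class decides for the owner
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)  # the group class decides for group members
    return bool(mode & stat.S_IWOTH)  # everyone else falls into "other"


# Hypothetical values: an arbitrary UID from a restricted-v2 range that belongs
# to GID 0, and a directory owned by root:root.
ARBITRARY_UID = 1000680000
print(can_write(ARBITRARY_UID, {0}, 0, 0, 0o755))  # False: group and other lack write
print(can_write(ARBITRARY_UID, {0}, 0, 0, 0o775))  # True: group root may write
print(can_write(0, {0}, 0, 0, 0o755))              # True: root bypasses the mode bits
```

The first case corresponds to the failure in the logs above. The anyuid workaround in this article changes the first argument (the UID), whereas a group-writable directory, as in the second case, would change the mode instead.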
Solution
A fix for this issue has been released. Pull an atlassian/bamboo-agent-base image published after February 1st, 2024 from https://hub.docker.com/r/atlassian/bamboo-agent-base.
Alternatively, if you keep using an older image, assign the anyuid SCC to the service account used by the ephemeral agent pods:
oc adm policy add-scc-to-user anyuid -z bamboo-ephemeral -n bamboo-ephemeral
Then restart the build so that a new ephemeral agent pod is created with the updated permissions.
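After restarting the build, you can check which SCC admitted the new agent pod: OpenShift records it in the pod's openshift.io/scc annotation at admission time. The minimal sketch below assumes the oc CLI with an active login and the bamboo-ephemeral namespace used in this article; the pod name is a placeholder you would take from oc get pods, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def admitted_scc(pod_json: str) -> Optional[str]:
    """Return the SCC recorded in the pod's openshift.io/scc annotation, if any."""
    pod = json.loads(pod_json)
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    return annotations.get("openshift.io/scc")


def fetch_pod_json(pod: str, namespace: str = "bamboo-ephemeral") -> str:
    """Run 'oc get pod <pod> -o json' (requires the oc CLI and a logged-in session)."""
    result = subprocess.run(
        ["oc", "get", "pod", pod, "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If the service account grant took effect, admitted_scc(fetch_pod_json("<podname>")) would typically report anyuid rather than restricted-v2. If the result is still restricted-v2, check that the service account named in the command above is the one the ephemeral agent pods actually use.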