Bamboo Data Center job on an ephemeral agent is stuck with a pod in an error state in an OpenShift Kubernetes cluster

The steps outlined in this article are provided AS-IS. This means we've had reports of them working for some customers, under certain circumstances, yet they are not officially supported, nor can we guarantee they'll work for your specific scenario.

You may follow and validate them in your own non-production environments before applying them to production, or fall back to supported alternatives if they don't work out.

We also invite you to reach out to our Community for matters that fall beyond Atlassian's scope of support!

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

A Bamboo job fails to run on an ephemeral agent, and the agent pod is stuck in an error state in an OpenShift Kubernetes cluster. The following error message appears in the ephemeral pod logs:

PermissionError: [Errno 13] Permission denied: '/var/atlassian/application-data/bamboo-agent/docker-app.pid'

Environment

  • The solution has been tested on Bamboo Data Center versions 9.3.2 and 9.4.1
  • OpenShift Kubernetes cluster 4.12

Diagnosis

The Pods section of the Ephemeral Agents configuration page in the Bamboo UI shows an error message for the agent pod.

Checking the namespace in the Kubernetes cluster shows the pod in the Error state:

oc get pods -n bamboo-ephemeral
NAME                        READY   STATUS   RESTARTS   AGE
eph-ephe-job1-18-kbpkdrap   0/1     Error    0          21m
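To collect the logs of every failed agent pod in one pass, a small shell helper such as the one below may be useful. It is a sketch, not an Atlassian-provided tool: the function name show_failed_agent_logs is hypothetical, the namespace defaults to bamboo-ephemeral as in the output above, and it assumes that a pod shown as Error with zero restarts is in the Failed phase. It uses only standard oc get and oc logs options.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the tail of the log of every failed
# ephemeral agent pod. Assumption: namespace defaults to bamboo-ephemeral.
show_failed_agent_logs() {
  local ns="${1:-bamboo-ephemeral}" pod
  # "-o name" prints one "pod/<name>" entry per line; only Failed pods are listed
  for pod in $(oc get pods -n "$ns" --field-selector=status.phase=Failed -o name); do
    echo "=== $pod ==="
    # Show the last 50 lines; the entrypoint traceback fits well within that
    oc logs "$pod" -n "$ns" --tail=50
  done
}
```

After sourcing the snippet, run show_failed_agent_logs, or pass another namespace as the first argument.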

The pod logs contain the following error:

+ KUBE_NUM_EXTRA_CONTAINERS_OR_ZERO=0
+ '[' -d '' ']'
+ [[ -d /pbc/kube ]]
+ exec /usr/bin/tini -- /entrypoint.py
INFO:root:Generating /var/atlassian/application-data/bamboo-agent/conf/wrapper.conf from template wrapper.conf.j2
WARNING:root:Permission problem writing '/var/atlassian/application-data/bamboo-agent/conf/wrapper.conf'; skipping
Traceback (most recent call last):
  File "/entrypoint.py", line 57, in <module>
    exec_app(['/opt/java/openjdk/bin/java', JAVA_OPTS, '-jar', f'{BAMBOO_AGENT_INSTALL_DIR}/atlassian-bamboo-agent-installer.jar'] + AGENT_OPTS, BAMBOO_AGENT_HOME, name='Bamboo Agent', env_cleanup=True)
  File "/entrypoint_helpers.py", line 138, in exec_app
    write_pidfile()
  File "/entrypoint_helpers.py", line 120, in write_pidfile
    with open(pidfile, 'wt', encoding='utf-8') as fd:
PermissionError: [Errno 13] Permission denied: '/var/atlassian/application-data/bamboo-agent/docker-app.pid'

Cause

The OpenShift Kubernetes platform ships a set of predefined security context constraints (SCCs) that limit a running pod's access to the host environment. SCCs give OpenShift administrators a way to control pod permissions, such as using the host filesystem, binding to privileged port numbers, and changing the user ID a container runs as.

oc get pod <podname> -o yaml | oc adm policy scc-subject-review -f -
RESOURCE                        ALLOWED BY
Pod/eph-ephe-job1-23-ygjwppdo   restricted-v2

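Running the command above against the pod in the error state lists the SCC that admitted it, restricted-v2 in this case, whose constraints prevent the application from running. The restricted-v2 SCC runs the container with an arbitrary user ID taken from the range allocated to the namespace, rather than the user defined in the image.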
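The admitting SCC is also recorded on the pod itself, and the user ID range comes from an annotation on the namespace. The sketch below reads both; the function name inspect_agent_pod_scc is hypothetical and the namespace is assumed to be bamboo-ephemeral. It relies on the openshift.io/scc pod annotation and the openshift.io/sa.scc.uid-range namespace annotation that OpenShift sets.

```shell
#!/bin/sh
# Sketch (hypothetical helper): show which SCC admitted an agent pod and the
# UID range the namespace hands out under restricted-v2.
# Assumption: namespace defaults to bamboo-ephemeral.
inspect_agent_pod_scc() {
  local pod="$1" ns="${2:-bamboo-ephemeral}"
  echo "SCC that admitted the pod:"
  # Dots inside an annotation key must be escaped in JSONPath
  oc get pod "$pod" -n "$ns" -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
  echo "UID range allocated to the namespace:"
  oc get namespace "$ns" -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}{"\n"}'
}
```

For a pod admitted by restricted-v2, the first value reads restricted-v2 and the second has the form start/size; the container's user ID is drawn from that range.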

In this case, the agent entrypoint tries to write its PID file (docker-app.pid) to the /var/atlassian/application-data/bamboo-agent directory, which requires root user permissions, so the write fails under the arbitrary user ID. Running the pod under the anyuid SCC, which lets the container run with the user ID defined in the image, overcomes this hindrance. However, the service account used here does not have the privileges required to use the anyuid SCC.
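To see which SCC the agent's service account could use to admit the pod, oc adm policy scc-review evaluates a pod definition for a given service account. The sketch below assumes the service account and namespace are both bamboo-ephemeral, the names used in this article; review_agent_sa_scc is a hypothetical name, and the exact output columns may differ between oc versions.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report which SCC the given service account
# could use to admit the failing pod.
# Assumption: service account and namespace default to bamboo-ephemeral.
review_agent_sa_scc() {
  local pod="$1" sa="${2:-bamboo-ephemeral}" ns="${3:-bamboo-ephemeral}"
  # Feed the pod definition to scc-review, evaluated for the service account
  oc get pod "$pod" -n "$ns" -o yaml | oc adm policy scc-review -z "$sa" -n "$ns" -f -
}
```

Before the fix, the service account is expected to be limited to restricted-v2; running the same check after granting anyuid should show the broader SCC.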

Solution

The fix for this issue has been released. Use an atlassian/bamboo-agent-base image published after February 1st 2024, available from https://hub.docker.com/r/atlassian/bamboo-agent-base
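To confirm which image build the agent pods are actually running, you can list each pod's image reference together with the resolved digest reported in its status, then compare the digest with the tags on Docker Hub. This is a sketch: agent_pod_images is a hypothetical name and the namespace is assumed to be bamboo-ephemeral.

```shell
#!/bin/sh
# Sketch (hypothetical helper): one line per pod with its name, the image
# reference from the spec and the image digest from the status, tab-separated.
# Assumption: namespace defaults to bamboo-ephemeral.
agent_pod_images() {
  local ns="${1:-bamboo-ephemeral}"
  oc get pods -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\t"}{.status.containerStatuses[*].imageID}{"\n"}{end}'
}
```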

Assign the anyuid SCC to the service account used by the ephemeral agent pods:

oc adm policy add-scc-to-user anyuid -z bamboo-ephemeral  
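To check that the grant took effect, oc adm policy who-can can list who may use the anyuid SCC. The sketch below assumes the service account and namespace are both bamboo-ephemeral, as in this article; sa_can_use_anyuid is a hypothetical name, and because the who-can output layout may differ between oc versions, treat it as a starting point.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed only if the service account may use
# the anyuid SCC. Assumption: service account and namespace default to
# bamboo-ephemeral.
sa_can_use_anyuid() {
  local sa="${1:-bamboo-ephemeral}" ns="${2:-bamboo-ephemeral}"
  # Service accounts are listed as system:serviceaccount:<namespace>:<name>;
  # awk compares whole fields so a similarly named account does not match.
  oc adm policy who-can use scc anyuid -n "$ns" |
    awk -v want="system:serviceaccount:$ns:$sa" '
      { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
      END { exit !found }'
}
```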

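Then restart the build. SCCs are evaluated only when a pod is admitted, so the change applies to the new agent pod created for the build, not to the pod already in the Error state.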

Last modified on Feb 22, 2024

Was this helpful?

Yes
No
Provide feedback about this article
Powered by Confluence and Scroll Viewport.