Testing NFS disk access speed for Bitbucket Data Center and git operations

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Problem

Bitbucket Data Center appears to be experiencing performance issues and is running slowly. Git and UI operations are taking longer than expected.


Important note on testing

  • The guidelines provided here are a starting point for you to build your staging environment ahead of testing

  • These guidelines are not a substitute for extensive testing prior to deployment using realistic loads that reflect your individual requirements

  • It is not practical or possible to provide simple guidelines for capacity planning and this will always require extensive performance testing

  • The complexity of environments will vary greatly and there are many other variables outside of NFS that can affect performance (e.g. networking, instance types, size of instances, workload) and for this reason, it is essential that you perform extensive testing for your specific environment

  • Atlassian support does not provide consulting services on capacity planning. You can contact an Atlassian partner if you require consulting services for capacity planning

If you are running Bitbucket in AWS, refer to the Infrastructure recommendations for enterprise Bitbucket instances on AWS article instead. Because AWS architectures are complex, availability zones and regions also affect the benchmark results and can cause false positives.

Diagnosis

Disk access speed is critical for the performance of Bitbucket and Git operations, especially when running a multi-node cluster where an NFS share is required to store data.

If the application is running slowly, disk speed on the NFS share can be a potential root cause. It's possible to isolate that cause using Bonnie++ to benchmark disk access speed.

  1. Install Bonnie++ using the OS package manager or download it from https://doc.coker.com.au/projects/bonnie/

    About Bonnie++

    Bonnie++ is a file system benchmarking tool that is simple to run. Its workload resembles what Git does on disk, which makes the results more representative. Bonnie++ tests two things: I/O throughput, and creating, reading, and deleting large numbers of small files (similar to what Git does).

  2. Execute Bonnie++

    bonnie++ -d /path/to/remote/nfs/filesystem -r 65536 -u someuser -z 1234 -n 1024

    This test must be executed on one of the Bitbucket Data Center application
    nodes, against the shared home directory that is mounted from the NFS server.

    Switch                               Description
    -d /path/to/remote/nfs/filesystem    Directory on the remote NFS file system to use for the test
    -r 65536                             RAM size in megabytes
    -u someuser                          User to run the test as
    -z 1234                              Random number seed, to get repeatable tests
    -n 1024                              Number of files for the file creation test

    The file system should have free disk space of at least 3x the RAM size for the test.
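The 3x free space requirement can be checked before starting a run. The following is a minimal sketch, not part of Bonnie++ itself; the function names are illustrative, and it assumes a POSIX `df` that reports available space in 1024-byte blocks in its fourth column.

```shell
#!/bin/sh
# Sketch: confirm the target file system has at least 3x RAM free
# before running Bonnie++. Function names are illustrative.

# Print the minimum free space (in MB) for a given RAM size (in MB).
required_free_mb() {
    echo $(( $1 * 3 ))
}

# Print the free space (in MB) on the file system that holds directory $1.
# df -Pk prints one line per file system; column 4 is "Available" in KB.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Return 0 if directory $1 has enough room for a test sized for $2 MB of RAM.
has_enough_space() {
    [ "$(free_mb "$1")" -ge "$(required_free_mb "$2")" ]
}
```

For the command shown above, this could be called as `has_enough_space /path/to/remote/nfs/filesystem 65536 && echo "OK to run"`. With `-r 65536`, the required free space works out to 196608 MB.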

Interpreting the results

Your shared NFS storage layer is a critical piece of your data center infrastructure. You must make sure you have sufficient I/O performance.

The important values are in the Create and Read columns under Random Create in the output below.

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ip-10-217-3-33 128G  1302  99 426371  22 208657  13   648  98 428862  14  9522 100
Latency              6237us     126ms    4654ms    3273ms     307ms    4146us
Version  1.97       ------Sequential Create------ --------Random Create--------
ip-10-217-3-33      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
               1024   361   1 120910  58  5807   9   366   1 12437  19  3441   4
Latency               239ms    8497us     658ms     190ms     204ms     838ms

Bonnie++ output is one comma-separated string instead of the table

On some versions of Bonnie++, the output of the test is written as a single comma-separated line instead of the table.

bonnie++ -d /tmp -s 4G -n 0 -m TEST -f -b -u username

Using uid:1000, gid:1000.
Writing intelligently...done
Rewriting...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
TEST             4G           374271  39 214541  19           392015  17 +++++ +++
Latency                         167ms   89125us             52047us    4766us

1.96,1.96,TEST,1,1327339401,4G,,,,374271,39,214541,19,,,392015,15,+++++,+++,,,,,,,,,,,,,,,,,,,167ms,89125us,,52047us,4766us,,,,,,

To get readable output, use the bundled conversion tool bon_csv2html to convert the comma-separated line into an HTML page that contains the table.

echo "1.96,1.96,TEST,1,1327339401,4G,,,,374271,39,214541,19,,,392015,15,+++++,+++,,,,,,,,,,,,,,,,,,,167ms,89125us,,52047us,4766us,,,,,," | bon_csv2html > output.html

Copy only the comma-separated line of values, without the table header.
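Instead of copying the line by hand, the comma-separated line can be filtered out of the Bonnie++ output. This is a sketch under one assumption: that the result line starts with two dotted version numbers, as in the `1.96,1.96,...` sample above. The function name `extract_csv` is illustrative.

```shell
#!/bin/sh
# Sketch: keep only the CSV result line(s) from Bonnie++ output on stdin.
# Assumption: a result line begins with two version fields such as
# "1.96,1.96,"; table rows contain no commas and are dropped.
extract_csv() {
    grep -E '^[0-9.]+,[0-9.]+,'
}
```

It could then be placed between the two tools, for example `bonnie++ -d /tmp -s 4G -n 0 -m TEST -f -b -u username | extract_csv | bon_csv2html > output.html`.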


Targets (higher is better)

                             Random Create - Create    Random Create - Read
Minimum recommended          300                       10,000
Results from above output    366                       12,437

Results lower than the minimum recommended values indicate that I/O performance on the NFS share is not optimal for Bitbucket and Git operations, which leads to performance issues.
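The comparison against the minimum recommended values can be scripted. The sketch below is illustrative only: the function names are made up for this example, and the field positions (8 and 10) assume the layout of the "files" row in the table output shown earlier. Bonnie++ prints `+++++` when a test finished too quickly to be timed, so non-numeric values are reported as not measured.

```shell
#!/bin/sh
# Sketch: compare Bonnie++ Random Create results with the minimum
# recommended values from this article. Names are illustrative.
MIN_CREATE=300     # Random Create - Create, files/sec
MIN_READ=10000     # Random Create - Read, files/sec

# Report one metric. $1 = label, $2 = measured value, $3 = minimum.
check_metric() {
    case "$2" in
        ''|*[!0-9]*) echo "$1: not measured ($2)"; return 2 ;;
    esac
    if [ "$2" -ge "$3" ]; then
        echo "$1: PASS ($2 >= $3)"
    else
        echo "$1: FAIL ($2 < $3)"
        return 1
    fi
}

# Check the "files" row of the table output, passed as one string in $1.
# Assumed layout: field 8 is Random Create - Create, field 10 is - Read.
check_files_row() {
    set -- $1
    rc=0
    check_metric "Random Create - Create" "$8" "$MIN_CREATE" || rc=1
    check_metric "Random Create - Read" "${10}" "$MIN_READ" || rc=1
    return $rc
}
```

For the sample run, `check_files_row '1024   361   1 120910  58  5807   9   366   1 12437  19  3441   4'` reports PASS for both metrics.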

Possible Causes

Examples of environmental factors that can cause slow disk access are as follows:

  • Antivirus software running on the cluster nodes or on the NFS server and scanning the Bitbucket directory structure.
  • Network latency.
  • A disk defragmentation job may be running.
  • Hardware issues such as disk failures.
  • File system encryption turned on.
  • Automated compression of files controlled by the OS.
  • Specific issues with the Java version and OS. This is a rare occurrence, however, a bug or known issue within the JVM may cause it to perform poorly on a specific OS.
  • Other applications or operations that are currently using the disk.
  • The disk may be nearly full, which on some operating systems can slow disk performance (in this particular example, it was on Solaris).
  • File server running out of server processes.
  • Not having the recommended NFS mount options:

    rw,nfsvers=3,lookupcache=pos,noatime,intr,rsize=32768,wsize=32768,_netdev
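A configured option string can be compared with the recommended list above. This is a minimal sketch with an illustrative function name. It assumes the options are taken as written in /etc/fstab; the kernel may display them differently for a live mount (for example `vers=3` instead of `nfsvers=3`), so output from /proc/mounts would need to be normalized first.

```shell
#!/bin/sh
# Sketch: list recommended NFS mount options that are missing from a
# comma-separated option string. Names are illustrative.
RECOMMENDED="rw,nfsvers=3,lookupcache=pos,noatime,intr,rsize=32768,wsize=32768,_netdev"

# Print each recommended option that is absent from the option string $1.
missing_opts() {
    have=",$1,"
    old_ifs=$IFS
    IFS=,
    for opt in $RECOMMENDED; do
        case "$have" in
            *",$opt,"*) ;;          # option present, nothing to report
            *) echo "$opt" ;;       # option missing
        esac
    done
    IFS=$old_ifs
}
```

For example, `missing_opts 'rw,nfsvers=3,noatime,intr,rsize=32768,wsize=32768,_netdev'` prints `lookupcache=pos`.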

Last modified on Apr 23, 2024
