Setting up AWS S3 storage for Git LFS

By default, Bitbucket stores Git LFS objects in the shared-home filesystem. However, it also supports storing and serving Git LFS objects from AWS S3. This offloads storage and request load to an AWS S3 bucket, decreasing the load directed at Bitbucket itself and enabling greater scalability.

This document describes:

  • Configuring Bitbucket to store Git LFS objects in AWS S3

  • Migrating existing Git LFS objects from the shared-home filesystem to AWS S3

Before you start

The following is important to know before you start using S3 for storage of LFS objects:

  • Clients must have network connectivity to AWS - Bitbucket provides pre-signed URLs to the Git LFS client, which then makes an HTTP GET (download) or PUT (upload) request directly to AWS.

  • AWS S3 has maximum file size restrictions - Bitbucket doesn’t have maximum file size restrictions, other than those implied by available disk space. However, AWS S3 does have maximum file size restrictions. Make sure you review these and can accept them before proceeding.

  • Understand how you’ll back up the S3 bucket - Your existing backup strategy must be updated to back up Git LFS objects stored in S3. Be sure to understand how to back up an S3 bucket and incorporate this into your backup strategy. Zero-downtime backup is supported, and is robust because Git LFS objects are immutable and uploads are atomic.

  • Bitbucket mirrors do not support storage of LFS objects in S3 - You may use Bitbucket mirrors while the upstream is configured to use AWS S3 for storage of Git LFS objects; however, mirrors still effectively act as a caching proxy server for LFS objects. That is, when a Git LFS client attempts to download an LFS object from the mirror, it is served from the mirror if the object is already present there. If it is not, the mirror contacts the upstream, which provides a URL for downloading the LFS object from the AWS S3 bucket. The mirror then uses this URL to download the LFS object, simultaneously streaming the content to the client that requested it and to the local filesystem, so that it can directly serve the next client request for the same object.

  • The S3 bucket should be dedicated to Bitbucket - To avoid key name conflicts the S3 bucket created as part of the below process should be dedicated to Bitbucket and not contain objects from other applications.
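The caching-proxy behaviour described for mirrors above can be illustrated with a minimal read-through cache sketch. This is not Bitbucket's implementation: the function name serve_object, the flat cache layout keyed by object ID, and the fetch_upstream callable (standing in for the download from the URL supplied by the upstream) are all assumptions made for illustration.

```python
import os
import tempfile


def serve_object(oid, cache_dir, fetch_upstream, chunk_size=64 * 1024):
    """Yield an object's bytes, filling the local cache on a miss.

    fetch_upstream is a hypothetical callable that takes an object ID and
    returns an iterable of byte chunks (for example, an HTTP response body).
    """
    cached = os.path.join(cache_dir, oid)
    if os.path.exists(cached):
        # Cache hit: serve straight from the local filesystem.
        with open(cached, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk
        return

    # Cache miss: stream from upstream, writing each chunk to a temporary
    # file at the same time as it is handed to the client.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in fetch_upstream(oid):
                tmp.write(chunk)
                yield chunk
        # Publish the file only after a complete download, so a partial
        # object is never served to the next client.
        os.replace(tmp_path, cached)
    except BaseException:
        # Covers upstream errors and a client that disconnects early.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file and renaming it at the end is one way to keep the cache consistent when a download is interrupted; because LFS objects are immutable, a cached copy never needs to be invalidated.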

Migration overview

This section documents not just configuring Bitbucket to store Git LFS objects in AWS S3, but also migrating existing objects from the shared-home filesystem to S3. If you are configuring a new Bitbucket instance to store LFS objects in AWS S3, it is not necessary to use the migration tool, since its only function is copying objects. The same is true if you have an existing Bitbucket instance but have no LFS objects stored. In these cases, follow the simpler process described further below in the FAQ section. However, where migrating existing objects is required, read on.

Migrating from the embedded LFS object store that stores LFS objects in the shared-home filesystem involves the following steps:

  • Pre-migration

    • Set up S3 bucket

    • Perform an initial sync using the Bitbucket Data Center Git LFS S3 Store Migration Tool

  • Migration

    • Shut down Bitbucket

    • Perform an incremental sync using the Bitbucket Data Center Git LFS S3 Store Migration Tool

    • Update Bitbucket’s configuration to indicate AWS S3 is used to host Git LFS objects

    • Start up Bitbucket

  • Post-migration

    • (Optional) Delete LFS objects from the shared-home filesystem

Pre-migration

In this pre-migration phase an S3 bucket is created and an initial copy of the Git LFS objects stored in the Bitbucket shared home directory is carried out. Depending on the total size of the objects stored, this initial copy may take anywhere from a few minutes to a few hours. It can be performed well in advance of the final migration step, because an incremental migration is performed later.
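To get a rough idea of how much data the initial copy has to move, the local LFS store can be measured first. The sketch below is only an illustration: the function name summarize_lfs_store is made up, the storage path is the one used in the post-migration section of this document, and the fallback home directory is the example value from the migration tool configuration. The largest object size it reports can also be compared against the AWS S3 file size restrictions mentioned earlier.

```python
import os


def summarize_lfs_store(storage_dir):
    """Return (file_count, total_bytes, largest_bytes) for all files under storage_dir."""
    count = 0
    total = 0
    largest = 0
    # os.walk yields nothing if the directory does not exist.
    for dirpath, _dirnames, filenames in os.walk(storage_dir):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            count += 1
            total += size
            largest = max(largest, size)
    return count, total, largest


if __name__ == "__main__":
    # Assumption: BITBUCKET_HOME is set; otherwise fall back to the example path.
    home = os.environ.get("BITBUCKET_HOME", "/var/atlassian/application-data/bitbucket")
    storage = os.path.join(home, "shared", "data", "git-lfs", "storage")
    count, total, largest = summarize_lfs_store(storage)
    gib = 1024 ** 3
    print(f"{count} files, {total / gib:.2f} GiB total, largest {largest / gib:.2f} GiB")
```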

  • Follow the AWS instructions to create an S3 bucket named bitbucket-object-store

  • Create an AWS user that has been granted the following permissions on the S3 bucket

    • PutObject

    • GetObject

    • GetObjectAttributes

    • DeleteObject

    • ListBucket

  • Create an access key for the user
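The permissions listed above could be expressed as an IAM policy along the following lines. This is a sketch rather than an official policy: the helper name build_lfs_bucket_policy is made up, and the split into two statements reflects the general IAM rule that ListBucket applies to the bucket itself while the object-level actions apply to the keys inside it. Review the result against your own security requirements before attaching it to the user.

```python
import json


def build_lfs_bucket_policy(bucket_name):
    """Build an IAM policy document granting the permissions listed above."""
    object_actions = [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectAttributes",
        "s3:DeleteObject",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions target the keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # ListBucket targets the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


# Print the policy for the example bucket name used in this document.
print(json.dumps(build_lfs_bucket_policy("bitbucket-object-store"), indent=2))
```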

With the S3 bucket created, the Bitbucket Data Center Git LFS S3 Store Migration Tool can now be used to perform an initial copy/synchronization:

  • Download the latest version of the migration tool from https://github.com/atlassian-labs/bitbucket-s3-lfs-migration-tool/releases

  • Copy the file to a location on one of the nodes that run Bitbucket Data Center. In the example below we’ll assume it has been copied to the home directory of the user that runs the Bitbucket application.

  • Create the migration tool configuration file. In the example below we’ll assume it is named config.properties and has also been created in the home directory of the user that runs the Bitbucket application. The file should have the following contents:
    bitbucket.home=/var/atlassian/application-data/bitbucket
    s3.bucket=bitbucket-object-store
    s3.region=us-east-1
    s3.access-key=<access key>
    s3.secret-key=<secret key>

  • Ensure the properties are set appropriately. For details see: https://github.com/atlassian-labs/bitbucket-lfs-s3-migration-tool/blob/master/README.md

  • Run the migration tool to perform the initial copy:
    java -jar ~/bitbucket-lfs-s3-migration-tool-1.0.0.jar ~/config.properties

  • The migration tool can be re-run with the above command at any time between the initial sync and the actual cutover to S3. Doing so can reduce the time taken by the final incremental sync, which runs while the Bitbucket application is stopped.
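Before running the tool, it can be worth checking that the configuration file contains every property and that no placeholder has been left in. The following sketch does this with a deliberately simple key=value parser; it is not part of the migration tool, the function names are made up, and the list of required keys is taken from the example file above rather than from the tool's README.

```python
REQUIRED_KEYS = (
    "bitbucket.home",
    "s3.bucket",
    "s3.region",
    "s3.access-key",
    "s3.secret-key",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_problems(props, required=REQUIRED_KEYS):
    """Return a list of human-readable problems, empty if none were found."""
    problems = []
    for key in required:
        value = props.get(key, "")
        if not value:
            problems.append(f"missing: {key}")
        elif value.startswith("<") and value.endswith(">"):
            problems.append(f"placeholder not replaced: {key}")
    return problems


# Demonstration on sample text; in practice, read ~/config.properties instead.
sample = "s3.bucket=bitbucket-object-store\ns3.secret-key=<secret key>\n"
for problem in find_problems(parse_properties(sample)):
    print(problem)
```

The same check could be pointed at bitbucket.properties by passing a different tuple of required keys.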

Migration

Shut down Bitbucket

Shut down all nodes of your Bitbucket instance so that the final incremental synchronisation can be performed.

Perform final LFS object synchronisation

Using the same Bitbucket Data Center Git LFS S3 Store Migration Tool and its configuration, perform a synchronisation again. This will migrate any LFS objects that were uploaded after the migration tool was last run in the “Pre-migration” section above. It can be run again with the same configuration and command:

java -jar ~/bitbucket-lfs-s3-migration-tool-1.0.0.jar ~/config.properties

Update configuration in bitbucket.properties

Set the following properties in $BITBUCKET_HOME/shared/bitbucket.properties

  • bitbucket.filestore

  • plugin.bitbucket-filestore-s3.bucket

  • plugin.bitbucket-filestore-s3.region

  • plugin.bitbucket-filestore-s3.access-key

  • plugin.bitbucket-filestore-s3.secret-key

The above properties are all documented in the Configuration properties document.

An example of setting the above in bitbucket.properties might look like this:

# Configure Bitbucket to store LFS objects in AWS S3
bitbucket.filestore=s3
plugin.bitbucket-filestore-s3.bucket=bitbucket-object-store
plugin.bitbucket-filestore-s3.region=us-east-1
plugin.bitbucket-filestore-s3.access-key=<access key>
plugin.bitbucket-filestore-s3.secret-key=<secret key>

Start Bitbucket

When the above synchronisation has completed and the new configuration has been added to bitbucket.properties, Bitbucket can be restarted.

Post-migration

Once it has been confirmed that all LFS objects have been migrated to S3 and Bitbucket is successfully serving these objects, it is possible to delete the store of LFS objects in the shared-home filesystem. It is recommended to do this some days or even weeks after migration, after ensuring the migration has been a success. A recommended first step is to rename the store directory, just to ensure those files aren’t being used. For example, assuming the environment variable $BITBUCKET_HOME points to your Bitbucket home directory:

cd $BITBUCKET_HOME/shared/data/git-lfs
mv storage unused-storage

And then, some time later, actually delete it:

cd $BITBUCKET_HOME/shared/data/git-lfs
rm -rf unused-storage

Frequently asked questions

What if I don’t need to migrate existing LFS objects?

If you wish to configure a new Bitbucket instance to store LFS objects in AWS S3, it is not necessary to use the migration tool, since its only function is copying objects. The same is true if you have an existing Bitbucket instance but have no LFS objects stored. In this case $BITBUCKET_HOME/shared/data/git-lfs/storage would be empty or non-existent.

If migration can be omitted then configuring Bitbucket to store LFS objects in AWS S3 is as simple as setting the following properties in $BITBUCKET_HOME/shared/bitbucket.properties and then restarting the Bitbucket instance for the changes to take effect:

  • bitbucket.filestore

  • plugin.bitbucket-filestore-s3.bucket

  • plugin.bitbucket-filestore-s3.region

  • plugin.bitbucket-filestore-s3.access-key

  • plugin.bitbucket-filestore-s3.secret-key

The above properties are all documented in the Configuration properties document.

An example of setting the above in bitbucket.properties might look like this:

# Configure Bitbucket to store LFS objects in AWS S3
bitbucket.filestore=s3
plugin.bitbucket-filestore-s3.bucket=bitbucket-object-store
plugin.bitbucket-filestore-s3.region=us-east-1
plugin.bitbucket-filestore-s3.access-key=<access key>
plugin.bitbucket-filestore-s3.secret-key=<secret key>

Do Git LFS clients require direct connectivity to AWS?

Yes. When Bitbucket is configured to store LFS objects in AWS S3, Bitbucket provides upload and download URLs to the Git LFS client. The client then uploads to or downloads from S3 directly via HTTP.

To take a practical example, if a client wants to fetch Git LFS objects it will make a request to Bitbucket via the Git LFS Batch API requesting download URLs for one or more Git LFS objects. Bitbucket will respond with a list of download URLs which are pre-signed and permit the client to download the specific objects that were requested (and nothing more). The Git LFS client will then make an HTTP GET request to AWS S3 to download the objects.
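The exchange described above can be sketched by looking at the shape of the JSON involved. The field names below follow the public Git LFS Batch API specification; the helper names, the object ID, and the href in the sample response are invented for illustration and do not show what Bitbucket actually returns. No network request is made.

```python
import json


def build_batch_request(operation, objects):
    """Build a Batch API request body for (oid, size) pairs."""
    return {
        "operation": operation,  # "download" or "upload"
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }


def extract_hrefs(response, operation):
    """Map each object ID to the URL the client should use for the operation."""
    hrefs = {}
    for obj in response.get("objects", []):
        action = obj.get("actions", {}).get(operation)
        if action:  # objects that failed carry an "error" entry instead
            hrefs[obj["oid"]] = action["href"]
    return hrefs


# Hypothetical object ID (a SHA-256 hex digest) and size.
oid = "0" * 64
print(json.dumps(build_batch_request("download", [(oid, 1234)]), indent=2))

# Hypothetical response; a real href would be a pre-signed S3 URL.
sample_response = {
    "transfer": "basic",
    "objects": [
        {
            "oid": oid,
            "size": 1234,
            "actions": {
                "download": {
                    "href": "https://example.invalid/presigned-download-url",
                    "expires_in": 3600,
                }
            },
        }
    ],
}
print(extract_hrefs(sample_response, "download"))
```

The client then issues a plain HTTP GET to each href, which is why it needs network connectivity to AWS and not only to Bitbucket.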

How do I secure the AWS secret in the bitbucket.properties file?

The bitbucket.properties file should always have ownership and permissions set to restrict read access to the user that runs Bitbucket. Additionally, Bitbucket supports encryption of secrets in bitbucket.properties; for further information see Secure Bitbucket configuration properties.
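As a rough illustration of the permissions advice above, the sketch below checks whether a file is accessible only by its owner and, if not, restricts it to owner read/write (mode 600). It assumes a POSIX filesystem and is run as the user that owns the file; the function names are made up, and file ownership itself is not checked here.

```python
import os
import stat


def is_owner_only(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_to_owner(path):
    """Set the file mode to owner read/write only (equivalent to chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

For example, calling restrict_to_owner on $BITBUCKET_HOME/shared/bitbucket.properties when is_owner_only returns False for it would remove group and world access.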



Last modified on Oct 31, 2024
