Recommended action plan if a repository becomes corrupted in Bitbucket Data Center
Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.
Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Symptoms
This article gives general recommendations, along with a real-life example, for situations where a repository on your Bitbucket Data Center instance becomes corrupted.
Resolution
Unfortunately, there's no "one size fits all" approach here. The best and recommended approach is to contact Atlassian Support at https://support.atlassian.com by lodging a Support request with us.
If you are still curious to know more, here are some of the steps carried out internally by our Bitbucket Data Center development team, when a repository corruption was detected while developing new Bitbucket features. Development was able to recover (without losing any commits) using a mix of two strategies:
- They shut down Bitbucket periodically and zip up their repositories. To reduce the downtime, they just copy the data to a separate location while Bitbucket is down (which is very fast), and then they compress it after Bitbucket has been brought back online. This allows them to back up their repositories (some of which are quite large, for test data) with just a few minutes of downtime (generally 5 minutes or less).
- Every developer's local clone is, in effect, a backup of the Bitbucket repository. Git repositories include the full history of every reachable ref when cloning, making them very useful for "restoring" repositories.
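As an illustration of the first strategy, the copy-then-compress backup could be sketched as two shell functions. This is a minimal sketch, not the development team's actual tooling: the function names, the staging directory, and the use of tar with gzip in place of zip are all assumptions, and stopping and starting Bitbucket is deliberately left out.

```shell
# Phase 1: run while Bitbucket is stopped. Makes a fast, uncompressed copy.
# $1 = repositories directory (for example "$BITBUCKET_HOME/shared/data/repositories")
# $2 = staging directory (hypothetical; must not exist yet)
snapshot_repositories() (
    src=$1
    staging=$2
    if [ -e "$staging" ]; then
        echo "staging directory already exists: $staging" >&2
        exit 1
    fi
    # -R copies recursively, -p preserves modes, ownership and timestamps
    cp -Rp "$src" "$staging"
)

# Phase 2: run after Bitbucket is back online. Compresses the staged copy.
# $1 = staging directory created in phase 1, $2 = archive file to create
compress_snapshot() (
    staging=$1
    archive=$2
    tar -czf "$archive" -C "$(dirname "$staging")" "$(basename "$staging")"
)
```

Splitting the work this way keeps the slow step (compression) outside the downtime window; only the plain copy has to finish while Bitbucket is down.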
The most complicated restore that development had to perform internally (which, again, was accomplished without losing any data) started by unpacking the zipped repository to serve as a base. They then used scp to copy the .pack and .idx files from some developers' machines into that unpacked repository. A simple git gc eliminated all the duplicate and corrupt objects and produced a single clean, fully functional repository. They replaced the repository directory in their BITBUCKET_HOME with the rebuilt repository and development continued without missing a beat.
Step-by-step process
This process involves removing/deleting individual files. To avoid data loss, please make sure you've created backups before you begin.
Further, if the corruption stems from hardware failure, power outages, or other similar incidents, the recommended approach is to restore the file system and database to a point before the incident. For system-wide issues like these, there is no way to identify the scope of the corruption, and you could run into problems later if you choose to repair individual corrupted files. Restoring is the safest method.
1) Determine the repo-id
- First, determine the numeric ID and file location of the corrupt repository. This information appears in the Git client output as well as on the repository's settings page.
> git push origin my-branch
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 664 bytes | 0 bytes/s, done.
Total 6 (delta 2), reused 0 (delta 0)
remote: error: object file /opt/bitbucket-home/shared/data/repositories/941/./objects/incoming-359Foi/0e/9fd1040b4080bb2986f2c4788d5cf04509fe2f is empty
remote: fatal: loose object 0e9fd1040b4080bb2986f2c4788d5cf04509fe2f (stored in /opt/bitbucket-home/shared/data/repositories/941/./objects/incoming-359Foi/0e/9fd1040b4080bb2986f2c4788d5cf04509fe2f) is corrupt
error: object file /opt/bitbucket-home/shared/data/repositories/941/./objects/incoming-359Foi/0e/9fd1040b4080bb2986f2c4788d5cf04509fe2f is empty
fatal: loose object 0e9fd1040b4080bb2986f2c4788d5cf04509fe2f (stored in /opt/bitbucket-home/shared/data/repositories/941/./objects/incoming-359Foi/0e/9fd1040b4080bb2986f2c4788d5cf04509fe2f) is corrupt
To ssh://bitbucket-server.my-company.com:7997/proj/repo.git
 ! [remote rejected] my-branch -> my-branch (missing necessary objects)
error: failed to push some refs to 'ssh://git@bitbucket-server.my-company.com:7997/proj/repo.git'
In the output above, the repository's file location is /opt/bitbucket-home/shared/data/repositories/941/, so the repo-id is 941.
- Additionally, if you only have the repository name, you can find its repo-id by navigating to the repository settings in the UI.
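When the error output is long, the repo-id can be pulled out of it mechanically. The helper below is a minimal sketch that assumes the .../shared/data/repositories/<id>/ layout seen in the sample push output; the function name extract_repo_id is hypothetical.

```shell
# Read Git client error output on stdin and print each distinct numeric
# repo-id that appears in a .../shared/data/repositories/<id>/ path.
extract_repo_id() {
    sed -n 's|.*/shared/data/repositories/\([0-9][0-9]*\)/.*|\1|p' | sort -u
}
```

For example, `git push origin my-branch 2>&1 | extract_repo_id` would print the ID if the push fails with a message that names the repository path.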
2) Verify the connectivity and validity of the objects using 'git fsck'
- Open a terminal on your Bitbucket instance (perhaps via SSH) and cd to the repository data directory:
cd /opt/bitbucket-home/shared/data/repositories/941/
- Switch to bitbucket_user, run git fsck --no-dangling, and inspect the output. If pushes are still modifying the repository as you work, this might show transient failures. Before actually modifying the repository, make sure that the failures you see are old and not part of an in-progress push. Concurrent operations will cause transient errors in git fsck that you might mistake for real corruption.
user@server:/opt/bitbucket-home/shared/data/repositories/941$ sudo su bitbucket_user
bitbucket_user@server:/opt/bitbucket-home/shared/data/repositories/941$ git fsck --no-dangling
3) Empty objects
- Below is a sample output showing empty objects:
# git fsck --no-dangling
error: object file ./objects/de/c2cbe1fa34fe8d08aa4031b70da63b6399cc3f is empty
error: unable to mmap ./objects/de/c2cbe1fa34fe8d08aa4031b70da63b6399cc3f: No such file or directory
error: dec2cbe1fa34fe8d08aa4031b70da63b6399cc3f: object corrupt or missing: ./objects/de/c2cbe1fa34fe8d08aa4031b70da63b6399cc3f
- Run ls -lh ./objects/de/c2cbe1fa34fe8d08aa4031b70da63b6399cc3f to ensure that:
  - The object really is empty with no content
  - The timestamp isn't close to "now" (i.e. run date and compare). If it's close to now or not empty, it might be coming from an in-progress push, so don't remove it!
- If the object is old and still empty, move it out of the repository's data directory or just delete it.
- Repeat step 2 (git fsck --no-dangling) to check for more problems.
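The two checks in this step (the object is empty, and its timestamp is not recent) can be combined into one helper. This is a minimal sketch, assuming a find that supports -mmin (GNU and BSD find do); the function name and the 60-minute default are illustrative choices, not Atlassian guidance.

```shell
# Succeed only if the given file exists, is zero bytes long, and was last
# modified more than N minutes ago (N defaults to 60, an arbitrary threshold).
# $1 = path to the loose object file, $2 = optional minimum age in minutes
is_stale_empty_object() (
    obj_path=$1
    min_age_minutes=${2:-60}
    [ -f "$obj_path" ] &&
        [ -n "$(find "$obj_path" -type f -size 0 -mmin +"$min_age_minutes")" ]
)
```

A call such as `is_stale_empty_object ./objects/de/c2cbe1fa34fe8d08aa4031b70da63b6399cc3f && echo "candidate for removal"` only reports a candidate; moving or deleting the file remains a manual decision.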
4) Recovering missing objects
- Typically, after removing empty objects, if they are still referenced by a ref, running git fsck --no-dangling will report the same objects as broken links and missing (possibly among other objects).
# git fsck --no-dangling
Checking object directories: 100% (256/256), done.
broken link from tree a49126b43f1d96af45c6b79bc1c4f38723d447d1 to blob dec2cbe1fa34fe8d08aa4031b70da63b6399cc3f
broken link from tree 8025517d305dd5acdc7d780b329ffda284dbfbbb to tree ab162a6bf206008e4487720835a41ff562fafc99
missing blob dec2cbe1fa34fe8d08aa4031b70da63b6399cc3f
missing tree ab162a6bf206008e4487720835a41ff562fafc99
Verifying commits in commit graph: 100% (1/1), done.
Verifying commits in commit graph: 100% (3/3), done.
- It's possible the missing object is part of an in-progress push. Try running git cat-file on it a little later to see if it is still missing.
# git cat-file -p ab162a6bf206008e4487720835a41ff562fafc99
fatal: Not a valid object name ab162a6bf206008e4487720835a41ff562fafc99
- If it is still missing, you'll have to find that object in a working copy of the repository. This can be a local clone belonging to any developer who has recently pushed to the repository, a recent backup/snapshot of the filesystem, or even a mirror that holds a copy of the repo. You can verify that the object exists in the repository copy using the same git cat-file command:
% git cat-file -p ab162a6bf206008e4487720835a41ff562fafc99
100644 blob 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f frssrgv.txt
- "Recovering" those objects is accomplished by uploading the missing objects to the server, which can be done by uploading individual loose objects or, for larger numbers of missing objects, by uploading packs. If a pack is uploaded, any duplication between objects in that pack and existing packs can be resolved by a simple git gc, which will consolidate objects into a new pack and remove old ones.
- Git stores loose objects in folders based on the first two characters of their hash, so ab162a6bf206008e4487720835a41ff562fafc99 should be copied into ./objects/ab/162a6bf206008e4487720835a41ff562fafc99. Double-check the file permissions and use git cat-file to check that Git can recognise the object.
user@server:/opt/bitbucket-home/shared/data/repositories/941$ sudo -u bitbucket_user cp /tmp/162a6bf206008e4487720835a41ff562fafc99 objects/ab
user@server:/opt/bitbucket-home/shared/data/repositories/941$ sudo -u bitbucket_user ls -lh objects/ab/162a6bf206008e4487720835a41ff562fafc99
-rw-r--r-- 1 bitbucket_user bitbucket_user 626 Feb 1 06:42 objects/ab/162a6bf206008e4487720835a41ff562fafc99
user@server:/opt/bitbucket-home/shared/data/repositories/941$ sudo -u bitbucket_user git cat-file -t ab162a6bf206008e4487720835a41ff562fafc99
tree
If the missing or corrupt Git objects are packed in the backup repository, refer to the steps in "How to Unpack a Git Pack File" to extract a Git object from a .pack file.
- Repeat step 2 (git fsck --no-dangling) to check for more problems.
  - If the output is clean, you're done! The repository is no longer corrupt.
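The mapping from an object hash to its loose-object location (first two characters as the folder, the remainder as the file name) is easy to get wrong by hand. The helper below is a minimal sketch of that mapping; the function name is hypothetical, and it assumes 40-character lowercase SHA-1 object names as used throughout this article.

```shell
# Print the loose-object path, relative to the repository directory, for a
# full 40-character lowercase hexadecimal object name.
loose_object_path() (
    hash=$1
    case $hash in
        *[!0-9a-f]*)
            echo "not a lowercase hex object name: $hash" >&2
            exit 1
            ;;
    esac
    if [ "${#hash}" -ne 40 ]; then
        echo "expected 40 characters, got ${#hash}" >&2
        exit 1
    fi
    prefix=$(printf '%s' "$hash" | cut -c1-2)   # folder: first two characters
    rest=$(printf '%s' "$hash" | cut -c3-)      # file name: remaining 38
    printf 'objects/%s/%s\n' "$prefix" "$rest"
)
```

Running `loose_object_path ab162a6bf206008e4487720835a41ff562fafc99` prints objects/ab/162a6bf206008e4487720835a41ff562fafc99, which matches the destination used in the cp example.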
5) Missing objects not found anywhere
There may be instances where recovering lost or corrupted objects is not feasible or possible. This is particularly likely in cases involving recent merges (from a fork) or file updates made through the Bitbucket UI. In such cases it is useful to first identify the branch to which the missing objects belong.
- To identify the problematic branch, use the --name-objects option with the git fsck command on the bare repo. Below is the description from the git fsck documentation:
"When displaying names of reachable objects, in addition to the SHA-1 also display a name that describes how they are reachable, compatible with git-rev-parse[1], e.g. HEAD@{1234567890}~25^2:src/."
- Below is an example output from the same command executed on the Bitbucket bare repo. The output clearly shows that all the broken links originate from the ref refs/heads/nbranch.
# git fsck --no-dangling --name-objects
Checking object directories: 100% (256/256), done.
broken link from tree a49126b43f1d96af45c6b79bc1c4f38723d447d1 (refs/heads/nbranch~2:) to blob dec2cbe1fa34fe8d08aa4031b70da63b6399cc3f (refs/heads/nbranch~2:new.txt)
broken link from tree 8025517d305dd5acdc7d780b329ffda284dbfbbb (refs/heads/nbranch:new/) to tree ab162a6bf206008e4487720835a41ff562fafc99 (refs/heads/nbranch:new/afsds/)
missing blob dec2cbe1fa34fe8d08aa4031b70da63b6399cc3f (refs/heads/nbranch~2:new.txt)
missing tree ab162a6bf206008e4487720835a41ff562fafc99 (refs/heads/nbranch:new/afsds/)
Verifying commits in commit graph: 100% (1/1), done.
Verifying commits in commit graph: 100% (3/3), done.
- After identifying the branch, review the Push logs to determine the most recent operation performed on it. In one instance, it was discovered that the last operation was a merge from a forked repository. The missing objects were then located in the source repository from which the merge originated, and copying these objects resolved the issue.
- Conversely, if the last operation on that branch was an update made through the UI, the object might be irrecoverably lost. In such cases, developers can explore the option of deleting the branch and re-creating or re-pushing it from an earlier commit with minimal data loss.
- In any case, run git fsck --no-dangling --name-objects again to verify that the fix worked and the repo is no longer corrupted.
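On a repository with many broken links, reading the affected refs out of the --name-objects output by eye gets tedious. The filter below is a minimal sketch that assumes the output format shown in the sample above, where each name appears in parentheses and starts with the ref; the function name affected_refs is hypothetical, and names that do not start with refs/ (such as HEAD) are ignored.

```shell
# Read `git fsck --no-dangling --name-objects` output on stdin and print each
# distinct ref named on a "missing" or "broken link" line.
affected_refs() {
    grep -E '^(missing|broken link) ' |
        tr '()' '\n\n' |                              # put each (name) on its own line
        sed -n 's|^\(refs/[^~^:]*\).*|\1|p' |         # keep the ref, drop ~N, ^N and :path
        sort -u
}
```

It could be used as `git fsck --no-dangling --name-objects 2>&1 | affected_refs`, which for the sample output prints only refs/heads/nbranch.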
If the above method of identifying the branch does not work for any reason, you can try creating a script that iterates through all refs in the repository, runs 'git ls-tree -r' on the commit at the tip of each ref, and runs 'git cat-file -e' on the blobs within those trees.
#!/bin/bash
# Iterate over every line of git show-ref output ("<hash> <ref name>")
git show-ref | while read -r commit_hash ref; do
    echo "Processing ref: $ref"
    # Check that git ls-tree -r "$commit_hash" executes successfully
    if ! git ls-tree -r "$commit_hash" &>/dev/null; then
        echo "Error: git ls-tree failed for ref $ref. One or more tree objects might be corrupted in the ref tip"
        continue
    fi
    # Iterate over the output of git ls-tree and store values in mode, type, hash, and rest (the path)
    git ls-tree -r "$commit_hash" | while read -r mode type hash rest; do
        # Only verify blobs for integrity
        if [ "$type" = "blob" ]; then
            # Check that each blob exists in the object database
            if ! git cat-file -e "$hash" 2>/dev/null; then
                echo "Error: Integrity check failed for the blob $hash in ref $ref"
            fi
        fi
    done
done
If a branch commit is referring to a missing/corrupted object, the given git commands would fail and that would help identify problematic branches.
# ./script.sh
Processing ref: refs/heads/branch1
Processing ref: refs/heads/master
Processing ref: refs/heads/nbranch
Error: git ls-tree failed for ref refs/heads/nbranch. One or more tree objects might be corrupted in the ref tip
The script above only identifies problems at the current commit to which a branch points. It does not examine the entire history of the branch, so issues deeper in the commit history will not be detected.
Please be aware the script provided is not supported by Atlassian. Users are encouraged to use these scripts as references and customize and test them independently. Atlassian does not offer support for updating or maintaining these scripts.