How to get summary statistics from Bitbucket Data Center access logs
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product; however, they have not been tested. Support for Server* products ended on February 15th, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
This page explains how to get summary statistics from Bitbucket Data Center access logs.
Sometimes we need summary statistics derived from Bitbucket's access logs. The script given here reads raw access logs from its standard input, so we can use cat to concatenate a number of access log files and pipe them to the script, as shown in the example. The script reports:
- Total number of requests processed.
- Total number of bytes read from the clients and written to them.
- Number of requests per request result code, including details on HTTP and SSH codes.
- Per-user statistics, including number of requests, bytes read, and bytes written. Access log entries that are not associated with any user are reported under "NO_USER".
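To make the field positions concrete, the sketch below splits a single access log line on the | separator and prints the fields the script relies on. The sample line is invented for illustration (hypothetical address, request ID, user, and values), and show_fields is a hypothetical helper name. It only assumes the field order that the script itself uses: protocol in field 2, request ID in field 3, user in field 4, result code in field 8, bytes read in field 9, and bytes written in field 10.

```shell
# Hypothetical access log line, made up for illustration only.
sample_line='10.0.0.1 | https | o@ABC123x1x1x0 | john.doe | 2024-01-15 10:00:00,000 | "GET /scm/prj/repo.git/info/refs HTTP/1.1" | "" "git/2.40.0" | 200 | 0 | 1234 | - | 15 | - | '

# Split on "|" and strip the padding spaces around each field,
# then print the fields that the summary script reads.
show_fields() {
  awk -F'|' '{
    for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i);
    printf("protocol=%s request_id=%s user=%s code=%s read=%s written=%s\n", $2, $3, $4, $8, $9, $10);
  }'
}

printf '%s\n' "$sample_line" | show_fields
```

Running a real line from your own logs through show_fields is a quick way to check that the field positions match before trusting the summary.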
Environment
Bitbucket Data Center 8.9.9, but also applicable to other versions.
Solution
The script given below can be used for summary analysis of Bitbucket access logs. You can modify it to better suit your needs; change the cat argument so that it includes only the access logs you actually want to analyze.
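If the selected files still cover more time than you need, one option is to filter lines by timestamp before they reach the script. The sketch below rests on assumptions: it assumes the timestamp is the fifth |-separated field and starts with a YYYY-MM-DD date, the helper name filter_by_day is hypothetical, and the sample lines are made up for illustration.

```shell
# Keep only log lines whose timestamp field (assumed to be field 5,
# starting with "YYYY-MM-DD") falls on the given day.
filter_by_day() {
  awk -F'|' -v day="$1" '
    {
      ts = $5;
      gsub(/^ +/, "", ts);             # drop the padding before the timestamp
      if (index(ts, day) == 1) print;  # keep lines whose timestamp starts with the date
    }'
}

# Hypothetical sample lines: only the first one is from 2024-01-15.
printf '%s\n' \
  '10.0.0.1 | https | o@A1x1x1x0 | john.doe | 2024-01-15 10:00:00,000 | "GET /a" | "" "git" | 200 | 0 | 10 | - | 5 | - | ' \
  '10.0.0.1 | https | o@A2x2x1x0 | jane.doe | 2024-01-16 11:00:00,000 | "GET /b" | "" "git" | 200 | 0 | 20 | - | 5 | - | ' \
  | filter_by_day 2024-01-15
```

The filter would sit between cat and the summary script in the pipeline, for example cat atlassian-bitbucket-access*.log | filter_by_day 2024-01-15 | awk '...'.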
Please don't run the script directly on production servers, since loading and parsing large access log files adds load to the server.
The script below is provided as-is, as an example only, and Atlassian can't guarantee that it functions correctly.
cat atlassian-bitbucket-access*.log | \
awk '
BEGIN {
    FS = "|";
}
# Process only lines whose request ID (field 3) contains "o@" or "o*".
$3 ~ /o[@*]/ {
    # total bytes read and written
    bytes_read += $9;
    bytes_written += $10;
    # result codes, keyed as protocol:code
    str = sprintf("%s:%s", $2, $8);
    gsub(/ /, "", str);
    res_codes[str]++;
    # per-user statistics
    user = $4;
    gsub(/ /, "", user);
    if (user == "-") user = "NO_USER";
    user_count[user]++;
    user_bytes_read[user] += $9;
    user_bytes_written[user] += $10;
    # rows count; progress is reported after counting so the final figure includes the last row
    record_count++;
    printf("Requests processed: %d total\r", record_count) > "/dev/stderr";
}
END {
    printf("\nTotal bytes read: %d, total bytes written: %d\n", bytes_read, bytes_written);
    printf("\nResult code, Count\n");
    printf("-----\n");
    # sort on res_codes value, descending
    # (PROCINFO["sorted_in"] is a GNU awk feature; other awk versions print the rows unsorted)
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (rc in res_codes) {
        printf("%s, %s\n", rc, res_codes[rc]);
    }
    printf("\nPer user statistics: user, requests count, bytes read, bytes written\n");
    printf("-----\n");
    # sort on user_count value, descending
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (user in user_count) {
        printf("%s, %d, %d, %d\n", user, user_count[user], user_bytes_read[user], user_bytes_written[user]);
    }
}
'
The script will write out summary statistics, for example:
Requests processed: 86776 total
Total bytes read: 8656, total bytes written: 260948064
Result code, Count
-----
https:200, 86671
https:302, 58
ssh:0, 15
https:204, 13
https:401, 6
https:201, 5
https:404, 4
ssh:1, 3
https:202, 1
https:0, 1
Per user statistics: user, requests count, bytes read, bytes written
-----
NO_USER, 84879, 0, 57416924
john.doe, 1388, 7499, 200871299
jane.doe, 510, 1157, 2659841
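The descending order shown above relies on PROCINFO["sorted_in"], which is a GNU awk feature. With another awk implementation the rows are printed in an unspecified order. A minimal sketch of one workaround, assuming a POSIX sort is available, is to let awk emit the rows unsorted and pipe them through sort. The helper name per_user_sorted and the sample lines are hypothetical, and only the per-user request count is shown.

```shell
# Count requests per user with any POSIX awk, using the same line filter
# as the main script, then order the rows by count (descending) with sort.
per_user_sorted() {
  awk -F'|' '
    $3 ~ /o[@*]/ {
      user = $4;
      gsub(/ /, "", user);
      if (user == "-") user = "NO_USER";
      user_count[user]++;
    }
    END {
      for (user in user_count) printf("%s, %d\n", user, user_count[user]);
    }' | sort -t, -k2,2nr
}

# Hypothetical sample lines; the first one does not match the filter and is skipped.
printf '%s\n' \
  '10.0.0.1 | https | i@A1x1x1 | - | t | a | d | - | - | - ' \
  '10.0.0.1 | https | o@A1x1x1 | john.doe | t | a | d | 200 | 0 | 10 ' \
  '10.0.0.1 | ssh | o@A2x2x1 | john.doe | t | a | d | 0 | 5 | 20 ' \
  '10.0.0.1 | https | o@A3x3x1 | - | t | a | d | 401 | 0 | 0 ' \
  | per_user_sorted
```

The same pattern applies to the result-code table: print the rows from END without relying on iteration order and let sort order them on the second comma-separated field.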