MySQL Collation Repair: Case Study - Repairing a Production Database
This document is part of the guide on How to Fix the Collation and Character Set of a MySQL Database. Please refer to that page for more information.
Table of Contents
- MySQL Collation Repair: Database Level Changes
- MySQL Collation Repair: Table Level Changes
- MySQL Collation Repair: Column Level Changes
- MySQL Collation Repair: Column Level Encoding Issues
- MySQL Collation Repair: Case Study - Repairing a Production Database
This guide was written for Confluence 5.1, so may not be accurate for current Confluence versions, but it does provide a useful conceptual overview of how to repair a production database.
In this Document
Before Proceeding
Before proceeding, ensure that you:
- Have shut down Confluence
- Have completed a full database backup
You may also wish to apply these changes in a test environment before applying them to production.
Introduction
This guide will set up a Confluence 3.5 instance against a MySQL Database with the incorrect collation and character set. We'll go through the process of Upgrading Confluence to the most recent version (at time of writing, version 5.6) while maintaining the integrity of data in the database. This should hopefully give system Administrators more context on how to run the queries provided throughout the How to Fix the Collation and Character Set of a MySQL Database guide.
Goals
We'll create a new installation of Confluence 3.5.17 against a broken MySQL database. Additionally, we'll also create some content with international character that can cause problems during upgrades (due to the character set used):
- A user with an accented character - José
- A page with the title "Let's have a ¡Celebración!"
Throughout the upgrade process, we'll remedy the errors in the upgrade process that refer to the database configuration, as well as ensuring the content is correct.
Installing Confluence 3.5.17
Creating the Database
First, we'll create a database with the incorrect character set and encoding:
CREATE DATABASE `conf-broken` CHARACTER SET = 'latin1' COLLATE = 'latin1_swedish_ci';
Installing Confluence
Confluence 3.5.17 can be found at the Confluence Download Archives. Refer to Installing Confluence Standalone for more information. Once Confluence is ready, access it at http://localhost:8080/
Things to Note:
- The MySQL driver is included in this version of Confluence.
- Ensure that you select "Production Installation", and that you use a JDBC connection.
- If you require a license, you can either generate an Evaluation License or a Developer License.
Creating Content
Once you've connected to the Database, Confluence will ask you if you'd like to import any content. Select "Example Site >>" so we have a few pages to work with.
Setting up the System Administrator
Confluence will then ask you for account details for your Admin user. Let's create the user José, with a password and email of your choosing. Ensure that the user name and display name are both set as José:
Creating a new page
Now that you've set up your admin account, it's time to create a page within Confluence:
- Click the "Add" button, and select "Page"
- Set the title and the content of the page to "Let's have a ¡Celebración!"
- Click "Save"
Verifying the contents of the database
At this point, let's take a moment to verify that the contents of the database are stored correctly - we'll want to verify this later. Run the following query against your database:
SELECT user_name FROM cwd_user;
You should see that José is displayed correctly in the database - there are no incorrect or invalid characters. Despite using the incorrect collation and character set, the accented character displays correctly.
This is due to two reasons:
- While the accented character is available, not all characters are available in the
latin1
character set. Most characters are available inutf8
. - Different collations sort and behave differently. To ensure a consistent experience, we recommend the use of a single collation -
utf8_bin
Upgrading Confluence to version 5.0.3
The next step in upgrading Confluence is to move to 5.0.3. This version ensures that we get all the necessary upgrade tasks run before moving to newer versions of Confluence, as some of the upgrade tasks are removed. As before, you can download Confluence 5.0.3 from the Download Archives. Please see Upgrading Confluence for more information.
It's also important to ensure that your database is configured correctly, in accordance with the Database Setup for MySQL for your version of Confluence. After reading the setup notes, it's clear that our Database isn't in UTF8 format as required by the documentation. Let's fix that:
ALTER DATABASE `conf-broken` CHARACTER SET = 'utf8' COLLATE = 'utf8_bin';
Once you've changed the database collation, proceed with Upgrading Confluence to 5.0.3. After the upgrade has completed, check that your page still displays correctly:
Upgrading Confluence to version 5.6.3
Because Confluence 5.6.3 doesn't come with a driver for MySQL, you'll need to obtain the correct driver from the MySQL Website. Please see Database JDBC Drivers for more information.
Now, let's attempt to visit Confluence. Depending on your specific environment, you may see one of two errors.
If you were logged in before the upgrade
Confluence will attempt to log you in. However, you might receive an error similar to the following:
If you weren't logged into Confluence
You'll see a bootstrap failure, similar to the following:
While the failures in both scenarios are completely different, they're caused by the same issue - the collation and character sets for tables in the database are inconsistent
How MySQL Handles Collation and Character Sets
When you create a database, table, or column, you have the ability to specify the character set and collation. If none is explicitly set during creation, the object will use that specified of its parent - or continue along upwards until the explicitly defined character set and collation is found. In this particular case, we created a database with latin1
encoding. In preparation for our upgrade to Confluence 5, we adjusted the collation on the database as suggested by the documentation.
The result was that when new tables were created in Confluence 5, they used the database's character set and encoding - in this case, utf8
. We now have an inconsistent database - some tables use utf8
while others use latin1
.
In Confluence 5.5, we introduced bootstrap checks to ensure that Confluence wouldn't upgrade an inconsistent database, as it can cause problems with incorrectly formatted data, among other things.
Repairing the Database
At this point, we need to ensure that our database is consistent in the character set and collations that it uses. Since utf8 is the recommended character set, we'll be using that.
Since we know our database is using the right collation (we set it earlier), we'll be able to start from MySQL Collation Repair: Table Level Changes. Ensure Confluence has been shut down, and that you have a backup of your database before proceeding.
These results are supplied for reference only, and may differ from your installation
Query | Results |
---|---|
Let's identify the tables that are having problems:
| |
Next, we'll generate our repair script:
Each row is a query we run in bulk. You can export these results to a | |
Now, let's identify any columns that aren't using the correct collation:
| |
Once again, we'll generate our repair script - firstly for
As before, execute each of these queries against your database. | |
We'll also generate our repair script for non
|
At this point, our database is sufficiently repaired, so we can start Confluence. All of the identifying queries should now return zero rows. Let's go ahead and check our sample page:
Optional: Fixing incorrectly encoded content
You may come across content that hasn't been encoded correctly. This is because MySQL can define a character set and collation at the connection level, as well as at the server level. These character set and collation levels are defined in several server variables, and are explained in Appendix A - Character Set and Collation Variables.
Because the location of your incorrectly encoded content may vary, the examples are limited. Please see MySQL Collation Repair: Column Level Encoding Issues for more information about a suitable query to use to re-encode data.
Appendix A: Character Set and Collation variables
character_set_server
and collation_server
:
These are the default character set and collation used by the server - by default, they're set to latin1
and latin1_swedish_ci
, respectively. This means that any database will use these character set and collation, unless specified during the creation process.
character_set_connection
and collation_connection
:
These are the character set and collation that will be used in the connection. If they're not explicitly set, they'll use the same values as defined by character_set_server
and collation_server
respectively.
To determine what settings your server is running, use the following queries:
SHOW VARIABLES LIKE 'collation%';
SHOW VARIABLES LIKE '%char%';
To set these variables permanently, please visit our guide on Configuring Database Character Encoding.