Gerrit Code Review - Backup

A Gerrit Code Review site contains data that needs to be backed up regularly. This document describes best practices for backing up review data.

Data which must be backed up

Git repositories: The bare Git repositories managed by Gerrit are typically stored in the ${SITE}/git directory, although the locations can be customized in ${SITE}/etc/gerrit.config. They contain the history of the respective projects. Since 2.15 with NoteDb enabled, and in all releases from 3.0 onward, they also contain change and review metadata, user accounts and groups.
SQL database: Gerrit releases in the 2.x series store some data in the database you chose when installing Gerrit. If you are using 2.16 and have migrated to NoteDb, only the schema version is stored in the database.

If you are using h2, you need to back up the .db files in the folder ${SITE}/db.

For all other database types refer to their backup documentation.

Gerrit releases 3.0 and newer store all primary data in NoteDb inside the git repositories of the Gerrit site. Only the review flag, which marks in the UI that you have reviewed a changed file, is stored in a relational database. If you are using h2, this database is named account_patch_reviews.h2.db.
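As a rough illustration, the primary data of a site could be collected as follows. This is a minimal sketch, not part of Gerrit: the helper name primary_data_paths and its git_dir parameter are hypothetical, and it assumes the default ${SITE}/git location and an embedded h2 database.

```python
from pathlib import Path


def primary_data_paths(site, git_dir="git"):
    """Return the paths holding primary data for a default site layout.

    Assumptions: repositories live under SITE/git unless git_dir says
    otherwise, and any SQL data is kept in h2 files under SITE/db.
    """
    site = Path(site)
    # An absolute git_dir replaces the site prefix, which covers
    # repositories stored outside the site directory.
    paths = [site / git_dir]
    db_dir = site / "db"
    if db_dir.is_dir():
        # Collect every *.db file found directly in SITE/db.
        paths.extend(sorted(db_dir.glob("*.db")))
    return paths
```

If the repository location has been customized in gerrit.config, pass that location as git_dir instead of relying on the default.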

Data optional to be backed up

Search index: The Lucene search index is stored in the ${SITE}/index folder. It can be recomputed from primary data in the git repositories, but reindexing may take a long time, so backing up the index makes sense for production installations.

If you have chosen to use Elasticsearch for indexing, refer to its backup documentation.

Caches: Gerrit uses many caches which populate automatically. Some of the caches are persisted in the directory ${SITE}/cache to retain the cached data across restarts. Repopulating persistent caches takes time and server resources, so including them in backups avoids unnecessarily high load and degraded performance after a Gerrit site has been restored from backup.

Configuration: Gerrit configuration files are located in the directory ${SITE}/etc and should be backed up or versioned in a git repository. The etc directory also contains secrets which should be handled separately:

secure.config contains passwords and auth.registerEmailPrivateKey
public and private SSH host keys

Consider using the secure-config plugin to encrypt these secrets.
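One way to handle the secrets separately is to split the contents of the etc directory before archiving or versioning it. The following is a minimal sketch under stated assumptions: the helper split_etc is hypothetical, and the pattern ssh_host_* is an assumed naming scheme for the SSH host key files that should be checked against your own site.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumed file name patterns for secrets in SITE/etc; adjust to your site.
SECRET_PATTERNS = ("secure.config", "ssh_host_*")


def split_etc(etc_dir):
    """Split the file names in SITE/etc into (regular_config, secrets)."""
    regular, secrets = [], []
    for path in sorted(Path(etc_dir).iterdir()):
        if not path.is_file():
            continue  # subdirectories are ignored by this sketch
        is_secret = any(fnmatch(path.name, p) for p in SECRET_PATTERNS)
        (secrets if is_secret else regular).append(path.name)
    return regular, secrets
```

The first list could then be committed to a git repository, while the second is routed to whatever secret storage you use.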

Plugin Data: The ${SITE}/data/ directory is used by plugins that store data, e.g. the delete-project and the replication plugins.

Libraries: The ${SITE}/lib/ directory contains libraries that are used as statically loaded plugins or that provide additional dependencies needed by Gerrit plugins.

Plugins: The ${SITE}/plugins/ directory contains the installed Gerrit plugins.

Static Resources: The ${SITE}/static/ directory contains static resources used to customize the Gerrit UI and email templates.

Logs: The ${SITE}/logs/ directory contains Gerrit server log files. Logs can still be written when the server is in read-only mode.

Consistent backups

There are several ways to ensure consistency when backing up primary data.

Filesystem snapshots

Gerrit 3.0 or newer

All primary data is stored in git.
Use a storage layer supporting snapshots, such as lvm, zfs, btrfs or nfs. Create a snapshot and then archive the snapshot.
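Creating the snapshot itself usually comes down to a single command. The sketch below only builds the command line for two of the snapshot-capable options named above and leaves running it to the caller; the function name snapshot_command and its parameters are hypothetical, and the dataset and subvolume names in the usage note are placeholders.

```python
def snapshot_command(kind, source, target):
    """Build the argv for creating a snapshot; the caller runs it.

    kind   -- "zfs" or "btrfs"
    source -- zfs dataset name, or path of the btrfs subvolume
    target -- zfs snapshot name, or destination path for btrfs
    """
    if kind == "zfs":
        # zfs snapshots are addressed as dataset@name
        return ["zfs", "snapshot", f"{source}@{target}"]
    if kind == "btrfs":
        # -r makes the btrfs snapshot read-only
        return ["btrfs", "subvolume", "snapshot", "-r", source, target]
    raise ValueError(f"unsupported snapshot kind: {kind}")
```

For example, snapshot_command("zfs", "tank/gerrit", "nightly") could be handed to subprocess.run(..., check=True), after which the snapshot is archived to other storage.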

Gerrit 2.x

Gerrit 2.16 can use NoteDb to store almost all of this data, which simplifies creating backups. If you migrated to NoteDb, follow the backup procedure for 3.0 and higher and additionally take a backup of the database. The database then only contains the schema version, which changes only during upgrades, so consistency between git and database is no longer critical. If you did not migrate to NoteDb, follow the backup procedure for older 2.x Gerrit versions.

Older 2.x Gerrit versions store change metadata, review comments, votes, accounts and group information in a SQL database. Because there is no integrated transaction handling between git repositories and the SQL database, creating a backup in which both are consistent requires turning the server read-only or shutting it down while the backup is created. Scheduled cron jobs and cron jobs that are currently running (e.g. repacking repositories) which affect the repositories may also need to be stopped. Use a file system supporting snapshots to keep the period where the Gerrit server is read-only or down as short as possible.

Turn primary server read-only for backup

Make the primary server handling write operations read-only before taking the backup. This means read-access is still available from replica servers during backup, because only write operations have to be stopped to ensure consistency. This can be implemented using the readonly plugin.
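Whatever mechanism is used to switch modes, the important property is that write access is restored even when the backup step fails. The following minimal sketch shows that pattern; read_only_window is a hypothetical helper, and the two callables are placeholders for whatever your setup uses to enter and leave read-only mode.

```python
from contextlib import contextmanager


@contextmanager
def read_only_window(enable_read_only, disable_read_only):
    """Keep the server read-only for the duration of the with-block.

    Both arguments are caller-supplied callables (hypothetical hooks);
    how they toggle read-only mode depends on your setup.
    """
    enable_read_only()
    try:
        yield
    finally:
        # Always restore write access, even if the backup step failed.
        disable_read_only()
```

The backup itself then runs inside a with read_only_window(...) block, which keeps the read-only period limited to the backup step.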

Replicate data for backup

Replicating the git repositories can back up the most critical repository data but does not back up repository metadata such as the project description file, reflogs, git configs, and alternate configs.

Replicate all git repositories to another file system using git clone --mirror, the replication plugin or the pull-replication plugin. Ideally, use a file system supporting snapshots to create a backup archive of such a replica.
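For the git clone --mirror variant, a replica needs one command on the first run and a different one on later runs. The sketch below only builds the git command line; mirror_command is a hypothetical helper, and it assumes the mirror was created by git clone --mirror so that its remote is named origin.

```python
from pathlib import Path


def mirror_command(source_url, mirror_dir):
    """Return the git argv that creates or refreshes a mirror clone."""
    mirror_dir = Path(mirror_dir)
    if not mirror_dir.exists():
        # First run: copy all refs into a new bare mirror repository.
        return ["git", "clone", "--mirror", source_url, str(mirror_dir)]
    # Later runs: fetch updates and drop refs that were deleted upstream.
    return ["git", "--git-dir", str(mirror_dir), "fetch", "--prune", "origin"]
```

Because --prune removes refs that no longer exist at the source, deletions propagate to the mirror as well, so the mirror still needs its own backup archive.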

For 2.x Gerrit versions also set up a database replica for the data stored in the SQL database. If you are using 2.16 and migrated to NoteDb, you may skip setting up a database replica and instead take a backup of the database, which in this case only contains the current schema version. In addition, you need to ensure that no write operations are in flight before you take the replica offline. Otherwise the database backup might be inconsistent with the backup of the git repositories.

Do not skip backing up the replica; the replica alone IS NOT a backup. Imagine someone deleted a project by mistake and this deletion got replicated. Replication of repository deletions can be switched off using the server option remote.NAME.replicateProjectDeletions.
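A quick audit can flag remotes for which deletions are replicated. The sketch below is a deliberately simplified check, not a full git-config parser: it assumes the option is set in a git-config-style file with [remote "NAME"] sections (for example the replication plugin's configuration file), and it only recognizes the literal value true. The function name is hypothetical.

```python
import re

REMOTE_SECTION = re.compile(r'^\s*\[remote\s+"(?P<name>[^"]+)"\]\s*$')
ANY_SECTION = re.compile(r'^\s*\[')
OPTION = re.compile(
    r'^\s*replicateProjectDeletions\s*=\s*(?P<value>\S+)', re.IGNORECASE)


def remotes_replicating_deletions(config_text):
    """Return names of remote sections that set the option to 'true'."""
    current, result = None, []
    for line in config_text.splitlines():
        section = REMOTE_SECTION.match(line)
        if section:
            current = section.group("name")
        elif ANY_SECTION.match(line):
            current = None  # left the remote sections
        else:
            option = OPTION.match(line)
            if option and current and option.group("value").lower() == "true":
                result.append(current)
    return result
```

Any remote reported here would also delete its copy of a project that was removed by mistake on the primary.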

If you are using Gerrit replicas to offload read traffic, you can use one of these replicas for creating backups.

Take primary server offline for backup

Shut down the primary server handling write operations before taking a backup. This is simple but means downtime for the users. Scheduled cron jobs and cron jobs that are currently running (e.g. repacking repositories) which affect the repositories may also need to be stopped.

Backup methods

Filesystem snapshots

Filesystems supporting copy-on-write snapshots: btrfs or zfs.
Other storage supporting snapshots: lvm or nfs.

Create a snapshot and then archive the snapshot to another storage.

While snapshots are great for creating high quality backups quickly, they are not ideal as a format for storing backup data. Snapshots typically depend on and reside on the same storage infrastructure as the original disk images. Therefore, it’s crucial that you archive these snapshots and store them elsewhere.

3.0 or newer: Snapshot the complete site directory.
2.x: Similar, but the data of the database should be stored on the very same volume on the same machine, so that the snapshot is taken atomically over both the git data and the database data. Because everything should be ACID, it can safely crash-recover, as if the power had been cut and the server booted up again. (Actually it is safer than that, because the filesystem knows about taking the snapshot and can sync the pending writes.)
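For the 2.x case, a quick sanity check is to compare the device IDs of the git directory and the database directory. This is a minimal sketch with a hypothetical helper name; matching device IDs are only a hint that a single snapshot covers both locations, not proof, so verify against your actual volume layout.

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

A False result for the git data and the database files means one snapshot cannot capture both atomically.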

In addition, filesystem snapshots allow you to:

roll back easily and quickly without having to access remote backup data (e.g. to restore an accidental rm -rf git/ within seconds)
transfer consistent snapshots incrementally
save a lot of storage space while still keeping multiple "known consistent states"

Other backup methods

To ensure consistent backups, these backup methods require turning the server into read-only mode while a backup is running.

create an archive like tar.gz to back up the site
rsync
plain old cp
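The archive variant can be sketched with Python's standard tarfile module. The helper name archive_site is hypothetical, and the sketch assumes the server is already read-only or stopped and that the archive is written outside the site directory so it does not include itself.

```python
import tarfile
from pathlib import Path


def archive_site(site, archive_path):
    """Write the whole site directory into a gzip-compressed tar archive."""
    site = Path(site)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the site's directory name, not its full path.
        tar.add(site, arcname=site.name)
    return Path(archive_path)
```

A call such as archive_site("/srv/gerrit", "/backup/gerrit.tar.gz") uses placeholder paths; substitute your own site and backup locations.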

Test backups

Test backups and run fire drills restoring them to ensure the backups aren’t corrupt or incomplete and that you can restore a backup quickly.
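A cheap first check, which does not replace an actual restore drill, is to confirm that an archive can be read and contains the entries you expect. This is a minimal sketch for tar-based backups; the helper name and the member names passed to it are hypothetical.

```python
import tarfile


def missing_from_archive(archive_path, required):
    """Return the required member names that the archive does not contain."""
    # "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        names = set(tar.getnames())
    return [name for name in required if name not in names]
```

An empty result only shows that the expected entries are present; restoring the archive into a fresh site and starting Gerrit on it remains the real test.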

Disaster recovery

Replicate backup archives

To enable disaster recovery, at least replicate backup archives to another data center, and run fire drills restoring a new site from the backup.
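When copying archives to a second location, it is worth confirming that the copy matches the original. The following is a minimal sketch using checksums; the helper names are hypothetical, and destination_dir stands in for storage that is mounted from the other data center.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(archive, destination_dir):
    """Copy an archive and confirm the copy has the same SHA-256 digest."""
    target = Path(destination_dir) / Path(archive).name
    shutil.copy2(archive, target)
    if sha256_of(archive) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

The same digest can be stored next to the archive so that later restore drills can detect corruption that happened after the copy.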

Multi-site setup

Use the multi-site plugin to install Gerrit with multiple sites in different data centers across different regions. This ensures that in case of a severe problem with one of the sites, the other sites can still serve your repositories.
