Monitoring and Switching Over to Secondary Server

Monitoring the progress of the sync

As mentioned in the details part of the “Setup a standby (secondary) GoCD Server” section, the standby dashboard shows the progress of the sync and refreshes itself every few seconds. An entry shown in red denotes that the sync has not happened, whereas an entry in black denotes that the standby is in sync with the primary server for that item. You should monitor that the Last Config/Plugins Update Time under Primary Details and the Last Successful Sync time under Standby Details do not drift apart by a large gap.

If you need it, this information is also available via a JSON API:

http://standby-go-server:port/go/add-on/business-continuity/admin/dashboard.json
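
If you want to poll this endpoint from a script, a minimal curl sketch could look like the one below. The credentials shown are placeholders; use whatever authentication and port your standby GoCD Server is actually configured with:

    curl -u 'username:password' 'http://standby-go-server:port/go/add-on/business-continuity/admin/dashboard.json'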

The standby GoCD Server dashboard looks like this:

Figure 6: Standby GoCD Server - Dashboard

Disaster strikes - What now?

Switch standby to primary

If the primary GoCD Server goes down, you need to perform the following steps in order (a command sketch for steps 1 and 3 follows this list):

  1. Turn off the primary instances

    If the primary Postgres Server and/or the primary GoCD Server are accessible, turn those services off on the corresponding machines.

  2. Turn off Postgres replication

    The details part of the “Setup a standby Postgres instance for replication” section mentions a trigger_file, a file whose presence allows the standby Postgres instance to become the primary Postgres instance. Create that file now. For instance:

    touch /path/to/postgresql.trigger.5432
    
  3. Switch standby GoCD Server to primary

    As mentioned in the “You need to know that …” section, the standby GoCD Server needs to be restarted before it can become the primary GoCD Server. While doing this, you need to set the go.server.mode system property to the value primary.

    -Dgo.server.mode=primary
    

    This property was originally mentioned in the details part of the “Setup a standby (secondary) GoCD Server” section of this document. You can also remove this property entirely, since its default value is primary.

  4. Switch virtual IP to point to standby GoCD Server

    As mentioned in the details part of “Setup a virtual IP for the agents to use” section, you can now assign the virtual IP to the standby GoCD Server. The command to do that depends on the virtual IP you chose. An example looks like this:

    sudo java -Dinterface=eth0:0 -Dip=192.168.23.23 -Dnetmask=255.255.0.0 -jar "/path/to/go-server/addons/go-business-continuity-VERSION.jar" assign
    

    NOTE: If your primary GoCD Server is still up and has control over this virtual IP (for instance, when you are switching over because only the primary Postgres instance went down), this command will fail to assign the virtual IP to the standby GoCD Server. In that case, first unassign the virtual IP on the primary GoCD Server. Unassignment is very similar to assignment; just remember to run it on the primary GoCD Server. It could look like this:

    sudo java -Dinterface=eth0:0 -Dip=192.168.23.23 -Dnetmask=255.255.0.0 -jar "/path/to/go-server/addons/go-business-continuity-VERSION.jar" unassign
    
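
Steps 1 and 3 do not have fixed commands, since they depend on your operating system and on how GoCD and Postgres were installed. Below is a minimal sketch assuming Linux package installs with systemd-managed services named go-server and postgresql, and a /etc/default/go-server file for system properties; adjust it to your environment:

    # Step 1: on the primary boxes (if they are still reachable), stop the services
    sudo systemctl stop go-server      # run on the primary GoCD Server box
    sudo systemctl stop postgresql     # run on the primary Postgres box

    # Step 3: on the standby GoCD Server box, set the server mode and restart GoCD.
    # With Linux package installs, system properties can be added to
    # GO_SERVER_SYSTEM_PROPERTIES in /etc/default/go-server, for example:
    #   GO_SERVER_SYSTEM_PROPERTIES="-Dgo.server.mode=primary"
    sudo systemctl restart go-server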

Recovery - Back to the primary server

Given that you were able to successfully switch the erstwhile standby GoCD Server over to primary, and the original primary GoCD Server is back in action, this section describes what you need to do to get back to the original primary instances. The main concern during this recovery is syncing the primary and standby Postgres instances. The ancillary concerns are around syncing config files, trust stores, etc.

Please note that, at this time, this requires downtime. This might change in the future.

The steps are largely the same as that of setting up a standby GoCD Server and Postgres instance.

For the purposes of this section:

  - GO1 is the original primary GoCD Server, which is now back up.
  - GO2 is the standby GoCD Server, which has been acting as the primary.
  - PG1 is the original primary Postgres instance.
  - PG2 is the standby Postgres instance, which has been acting as the primary.

The steps are as follows (a command sketch for steps 3 and 5 to 7 follows the list):

  1. Bring down both GoCD Servers, GO1 and GO2.
  2. Unassign the virtual IP from the GO2 box. See the details part of the “Setup a virtual IP for the agents to use” section for more information about this.
  3. Copy over the contents of /etc/go (or at least /etc/go/cruise-config.xml) from GO2 to GO1.
  4. Use pg_basebackup with the -X fetch flag to recreate the database on PG1. This makes sure that all the changes made to the database while GO1 was down are brought back to PG1.

      pg_basebackup -h <ip_address_of_secondary_postgres_server> -U rep -D <empty_data_directory_on_primary> -X fetch
    
  5. On PG2, Postgres will have renamed the recovery.conf file to recovery.done, to show that PG2 is now acting as primary. Rename it back to recovery.conf, remove the trigger file you created earlier (/path/to/postgresql.trigger.5432), and restart Postgres on PG2. This makes sure that PG2 is running in standby mode.

  6. Start PG1. Since it does not have a recovery.conf file, it will start as primary.

  7. Start GO1 now, and ensure that the go.server.mode is either unset or set to primary.

  8. Assign the virtual IP to the GO1 box. See the details part of the “Setup a virtual IP for the agents to use” section for more information about this.
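
Steps 3 and 5 to 7 depend on your environment, so here is a minimal sketch only. It assumes Linux boxes reachable over SSH, systemd-managed services named go-server and postgresql, and placeholder host names and data-directory paths; step 4 uses the pg_basebackup command shown above:

    # Step 3: on GO1, copy the GoCD config over from GO2 (go2.example.com is a placeholder)
    scp -r go2.example.com:/etc/go/ /etc/

    # Step 5: on PG2, flip it back to standby mode
    sudo mv /path/to/postgres/data/recovery.done /path/to/postgres/data/recovery.conf
    sudo rm /path/to/postgresql.trigger.5432
    sudo systemctl restart postgresql

    # Step 6: on PG1, start Postgres; with no recovery.conf present, it comes up as primary
    sudo systemctl start postgresql

    # Step 7: on GO1, ensure go.server.mode is unset (or set to primary), then start GoCD
    sudo systemctl start go-server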

Whether this switchover is done often or not, it is recommended that you automate this process (with a manual trigger to start it). Since it involves up to four different boxes, and communication between them is quite system-specific, automation is not covered as a part of this setup. However, it can be done quite easily and is worth doing.