Failover for Monitoring

You use server and network monitoring software like FrameFlow to make sure your critical systems are up and running, but what happens if your monitoring system is down for maintenance or system upgrades?

FrameFlow’s failover options are designed to handle situations like this elegantly and effortlessly. Let’s take a closer look.

Single Site Mode

In single site mode, your FrameFlow installation monitors all of your local systems, plus all of your public-facing resources such as web sites and FTP servers. To deploy failover monitoring, simply install a clean copy of FrameFlow on a second system somewhere in your network. Then go to the Failover settings page on the new installation and give it the path to your main installation, plus credentials so it can connect.

After a few seconds, the secondary system will pull down your complete monitoring configuration and then show a banner indicating that it has gone into a dormant mode. While dormant, it will regularly check that the primary system is active and synchronize the configuration so it picks up any changes you’ve made on the primary.

Normal operations. Primary handles all monitoring and secondary is dormant.

If the secondary detects that the primary is down, it will automatically take over all monitoring actions. When it detects that the primary has become active again, it will go back to its dormant mode and keep synchronizing.

Primary is down. Secondary becomes active and takes over monitoring.

As you can see, it all happens automatically, without any action required by you or your staff. The primary can go down in the middle of the night and FrameFlow will make sure your monitoring marches on as usual.
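
The dormant/active behavior described above can be pictured as a small state machine. The Python sketch below only illustrates the idea and is not FrameFlow's implementation. The health check (`primary_is_up`), the configuration pull (`fetch_primary_config`) and the three-missed-checks threshold before takeover are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SecondaryMonitor:
    """Hypothetical model of a failover secondary (not FrameFlow's code)."""
    primary_is_up: Callable[[], bool]         # assumed health check on the primary
    fetch_primary_config: Callable[[], Dict]  # assumed config pull from the primary
    failures_before_takeover: int = 3         # assumed threshold to avoid flapping
    config: Dict = field(default_factory=dict)
    active: bool = False
    missed: int = 0

    def tick(self) -> str:
        """Run one check cycle and return the resulting mode."""
        if self.primary_is_up():
            # Primary answered: (re)enter dormant mode and resync its config.
            self.missed = 0
            self.active = False
            self.config = self.fetch_primary_config()
        else:
            # Primary silent: count the miss; take over once the threshold is hit.
            self.missed += 1
            if self.missed >= self.failures_before_takeover:
                self.active = True
        return "active" if self.active else "dormant"


# Simulated primary: up, then down for three checks, then back up.
replies = iter([True, False, False, False, True])
monitor = SecondaryMonitor(primary_is_up=lambda: next(replies),
                           fetch_primary_config=lambda: {"monitors": 12})
print([monitor.tick() for _ in range(5)])
# ['dormant', 'dormant', 'dormant', 'active', 'dormant']
```

Waiting for several consecutive missed checks before taking over is a common way to avoid switching on a single dropped response. That threshold is an assumption of this sketch, not a documented FrameFlow setting.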

Multi-Site Mode

In multi-site mode, things get more interesting. With our multi-site version, you have a master console that is used for dashboards, configuration changes and reporting.

You also have one or more remote nodes that do the monitoring at remote sites and call home to the master with the results. More components means more potential points of failure, but FrameFlow has you covered for all cases.

Master Console Failover

The first question is how to handle the case where the master console is down. For this, you install a second, clean master console on another system. It doesn't have to be at the same location as your primary. It could be in a different data center, in a different country, or pretty much anywhere, as long as your remote nodes can reach it. You use the failover options in the settings to give it the path to the primary master console. The primary master then tells each remote node about the secondary.

Normal operations. Primary master communicates with remote nodes.

From that point onward, the remote nodes will call home to the primary master, but if it can't be contacted, they will try to connect to the secondary.

Primary is down. Remote nodes call home to the secondary.

When the primary comes back online, the remote nodes switch back to it and normal operations continue.
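
From a remote node's point of view, this amounts to trying an ordered list of masters on every call home. Here is a minimal Python sketch of that idea, assuming a hypothetical `send` transport that reports whether the upload was accepted. The host names are placeholders, and this is not FrameFlow's actual protocol.

```python
from typing import Callable, List, Optional


def call_home(results: dict,
              masters: List[str],
              send: Callable[[str, dict], bool]) -> Optional[str]:
    """Deliver results to the first master that accepts them.

    `masters` is assumed to be ordered [primary, secondary]; `send` is a
    hypothetical transport returning True when the upload succeeded.
    """
    for master in masters:
        # The primary is always tried first, so the node returns to it
        # as soon as it is reachable again.
        if send(master, results):
            return master
    return None  # no master reachable; a real node might buffer and retry


# Simulated reachability of the two (placeholder) master consoles.
reachable = {"primary-master": False, "secondary-master": True}
sender = lambda host, payload: reachable[host]
order = ["primary-master", "secondary-master"]

print(call_home({"cpu": 42}, order, sender))  # secondary-master
reachable["primary-master"] = True
print(call_home({"cpu": 42}, order, sender))  # primary-master
```

Because the primary is tried first on every cycle, switching back to it when it returns needs no extra logic.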

Remote Node Failover

The second question is what to do if a remote node goes down. The master console will detect that it hasn't heard from the node and will send alerts, but FrameFlow also offers a failover option for remote nodes.
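
One common way for a master to notice a silent node is to record each node's last check-in time and flag any node that has been quiet longer than a timeout. The Python sketch below illustrates that general technique. The class name, the node names and the 300-second timeout are hypothetical, and FrameFlow's actual detection logic may differ.

```python
import time
from typing import Dict, List, Optional


class NodeTracker:
    """Hypothetical sketch of how a master could spot silent remote nodes."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s              # assumed grace period before alerting
        self.last_seen: Dict[str, float] = {}   # node name -> last check-in time

    def record_checkin(self, node: str, now: Optional[float] = None) -> None:
        # Called whenever a remote node calls home with results.
        self.last_seen[node] = time.monotonic() if now is None else now

    def silent_nodes(self, now: Optional[float] = None) -> List[str]:
        # Nodes whose last check-in is older than the timeout warrant an alert.
        now = time.monotonic() if now is None else now
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


tracker = NodeTracker(timeout_s=300)
tracker.record_checkin("london-node", now=0)
tracker.record_checkin("tokyo-node", now=200)
print(tracker.silent_nodes(now=400))  # ['london-node']
```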

To activate a failover remote node, run the setup program on a second system at the remote site. During the configuration phase, select the option to install as a failover node. This secondary node will synchronize with the master, but it will remain dormant.

If the primary remote node goes down for any reason, the secondary will take over the monitoring at the site until the primary comes back online.

Dual Failover

The failover options for the master console and the remote nodes can be active at the same time, giving you multi-point failover to ensure your monitoring is always up and running.