Security System Availability: Managing Risk
As the Building Internet of Things (IoT) gains momentum, organizations across a spectrum of industries are increasing investment in automation systems for building security due to heightened awareness of both physical and cyber threats. Yet these systems are only effective as long as the servers that support them are up and running. In some cases, these are standalone servers supporting a single security system, collecting dust in the back of an equipment closet somewhere. In other cases, organizations have invested in updating their technology, deploying virtualized servers that efficiently support a range of security systems (or other building systems). Both approaches present a potentially hidden risk, especially as companies look to migrate to the IoT.
Take the example of an access control system that relies on a dedicated server that may date back to the previous decade or even previous century. Out of sight, out of mind — until the day the server reaches its inevitable end of life. Access may be temporarily denied. Or even more critical, the database that supports the card readers at the doors may be lost or corrupted, creating a lapse in security as well as the time-consuming task of attempting to rebuild the database manually.
Ironically, virtualization may actually compound this risk. By consolidating multiple applications on a single server, virtualization can result in a single point of failure that impacts a range of critically important security systems. Imagine losing surveillance, alarm, and access control systems in a single downtime event for a single virtualized server. How will you know there is an emergency without a working alarm system or video cameras? How will you open automated doors in the event of a fire? No security professional or facility manager wants to have to answer those questions.
These risks may be magnified in buildings where there is no on-site IT staff available to deal with such a server failure rapidly in an emergency.
As building automation and security systems become increasingly reliant on server technology, ensuring the availability — or uptime — of the applications running on those servers is absolutely critical. But how much availability is “good enough” in an IoT world? And what’s the best way to achieve that level of availability?
To answer those questions, it’s important to understand the three basic approaches to server availability:
1. Data backups and restores: Having backup, data-replication, and failover procedures in place is the most basic approach to server availability. These procedures help speed the restoration of an application and preserve its data following a server failure. However, if backups run only once a day, up to a day’s worth of data may be lost. At best, this approach delivers approximately 99 percent availability.
That sounds pretty good, but consider that it equates to an average of about 87.6 hours of downtime per year, or roughly 100 minutes of unplanned downtime per week. That clearly falls short of the uptime requirements for building security and life-safety applications.
2. High availability (HA): HA includes both hardware-based and software-based approaches to reducing downtime. HA clusters are systems combining two or more servers running with an identical configuration, using software to keep application data synchronized on all servers. When one fails, another server in the cluster takes over, ideally with little or no disruption. However, HA clusters can be complex to deploy and manage. And you will need to license software on all cluster servers.
HA software, on the other hand, is designed to detect evolving problems proactively and prevent downtime. It uses predictive analytics to automatically identify, report, and handle faults before they cause an outage. The continuous monitoring that this software offers is an advantage over the cluster approach, which only responds after a failure has occurred. Moreover, as a software-based solution, it runs on low-cost commodity hardware.
HA generally provides from 99.95 percent to 99.99 percent (or “four nines”) uptime. On average, that means from about 53 minutes to about 4.4 hours of downtime per year, significantly better than basic backup strategies.
3. Continuous availability (CA): Also called an “always-on” solution, CA’s goal is to reduce downtime to its lowest practical level. Again, this may be achieved either through sophisticated software or through specialized servers.
With a software approach, each application lives on two virtual machines with all data mirrored in real time. If one machine fails, the applications continue to run on the other machine with no interruption or data loss. If a single component fails, a healthy component from the second system takes over automatically.
CA software can also facilitate disaster recovery with multi-site capabilities. If, for example, one server is destroyed by fire or sprinklers, the machine at the other location will take over seamlessly. This software-based approach prevents data loss, is simple to configure and manage, requires no special IT skills, and delivers upwards of 99.999 percent availability (about five minutes of downtime a year), all on standard hardware.
CA server systems rely on specialized servers that are purpose-built to prevent failures and that integrate hardware, software, and services for simplified management. They feature both redundant components and error-detection software running in a virtualized environment. This approach also delivers “five nines” availability, though the specialized hardware required does push up the capital cost.
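The availability percentages quoted for the three approaches can be converted into expected downtime with simple arithmetic. The sketch below is illustrative only: it assumes a 365-day year, and the function and tier names are made up for this example rather than taken from any product.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the expected downtime per year, in minutes."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def format_duration(minutes: float) -> str:
    """Show long outages in hours and short ones in minutes."""
    if minutes >= 60:
        return f"{minutes / 60:.1f} hours"
    return f"{minutes:.1f} minutes"


# Availability figures as quoted in the text for each approach.
TIERS = [
    ("Backup and restore", 99.0),
    ("High availability (low end)", 99.95),
    ("High availability (high end)", 99.99),
    ("Continuous availability", 99.999),
]

for name, percent in TIERS:
    downtime = format_duration(annual_downtime_minutes(percent))
    print(f"{name}: {percent}% -> {downtime} per year")
```

Under these assumptions, 99 percent works out to roughly 87.6 hours of downtime a year, while 99.999 percent comes to a little over five minutes.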
Which of these three general approaches is needed for your building security applications will depend on a range of factors.
How Vulnerable Are You?
First, it’s important to determine the state of your current security automation infrastructure. While your system architecture may be billed as “high availability,” this term is often used to describe a wide range of failover strategies — some more fault-tolerant than others. In the event of a server failure, will there be a lapse in security? Can critical data be lost? Is failover automatic, or does it require intervention?
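One rough way to put numbers on these questions is the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The sketch below uses hypothetical figures chosen purely for illustration; real values would have to come from your own incident history.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical assumption: one server failure roughly every six months.
MTBF_HOURS = 4380.0

# Hypothetical recovery times for three failover arrangements.
SCENARIOS = {
    "automatic failover (about 1 minute)": 1 / 60,
    "manual recovery, IT staff on site (4 hours)": 4.0,
    "manual recovery, no on-site staff (24 hours)": 24.0,
}

for label, mttr_hours in SCENARIOS.items():
    availability = availability_from_mtbf(MTBF_HOURS, mttr_hours)
    print(f"{label}: {availability:.4%}")
```

With the failure rate held constant, the recovery time alone determines which availability class a system lands in, which is why the difference between automatic and manual failover matters so much.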
How Much Is Enough?
So how much availability do you need? Obviously, deploying the highest level of continuous availability (CA) for all of your security applications across the enterprise would be ideal. But the cost of such a strategy could be prohibitive. Moreover, not all security applications require the highest level of uptime.
For example, some applications may be deployed in a multi-tiered approach. With this arrangement, there could be a “master server” in a centralized location controlling a network of site servers, which regularly cache data back to the master server. In this scenario, you might configure the master server as CA but decide that HA is adequate for the site servers, given their workloads. It all depends on the criticality of each server’s function within the security automation architecture.
Carefully assessing your requirements for each security application and planning your infrastructure to provide the appropriate level of availability is the key to balancing your real-world needs with the realities of your budget.
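As a minimal sketch of that assessment, the snippet below picks the least capable tier whose expected annual downtime still fits within a per-application downtime budget. The tier figures come from the ranges quoted earlier (using the conservative 99.95 percent for HA). The application names and budgets are hypothetical, and a real decision would also weigh cost, data-loss tolerance, and recovery time.

```python
from dataclasses import dataclass
from typing import Optional

MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


@dataclass(frozen=True)
class Tier:
    name: str
    availability_percent: float

    @property
    def annual_downtime_minutes(self) -> float:
        return MINUTES_PER_YEAR * (100.0 - self.availability_percent) / 100.0


# Ordered from least to most capable (and typically least to most costly).
TIERS = (
    Tier("backup and restore", 99.0),
    Tier("high availability", 99.95),
    Tier("continuous availability", 99.999),
)


def recommend_tier(max_downtime_minutes_per_year: float) -> Optional[Tier]:
    """Return the first tier that meets the budget, or None if none does."""
    for tier in TIERS:
        if tier.annual_downtime_minutes <= max_downtime_minutes_per_year:
            return tier
    return None


# Hypothetical downtime budgets, in minutes per year.
budgets = {"master server": 10, "site server": 300, "video archive": 6000}
for application, budget in budgets.items():
    tier = recommend_tier(budget)
    print(f"{application}: {tier.name if tier else 'no tier meets this budget'}")
```

In this example the master server lands on continuous availability and the site server on high availability, matching the multi-tiered arrangement described above.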
Need for Visibility
The other limiting factor when planning your security automation infrastructure for the IoT is skilled staff to oversee your environment. As virtual servers proliferate and come to support hundreds or even thousands of security devices and resources, the management task quickly becomes overwhelming.
To stay on top of it all, IT professionals can benefit from a robust, centralized capability to monitor their entire environment. This should include not only security systems but other critical systems, such as HVAC, power, and communications. They also benefit from being alerted to developing problems so they can move proactively to head off a potential outage.
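A centralized monitor of this kind can be sketched very simply: each system reports a heartbeat, and the monitor flags any system whose heartbeat is getting stale before it is declared down. Everything below is a hypothetical illustration. The system names, thresholds, and data structure are assumptions, not the interface of any particular monitoring product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class Status(Enum):
    OK = "ok"
    WARNING = "warning"  # heartbeat is late: a developing problem
    DOWN = "down"        # heartbeat is missing for too long


@dataclass(frozen=True)
class Heartbeat:
    system: str
    last_seen: float  # seconds since the epoch


def classify(heartbeat: Heartbeat, now: float,
             warn_after: float = 30.0, down_after: float = 120.0) -> Status:
    """Classify a system by the age of its most recent heartbeat."""
    age = now - heartbeat.last_seen
    if age >= down_after:
        return Status.DOWN
    if age >= warn_after:
        return Status.WARNING
    return Status.OK


def systems_needing_attention(heartbeats: Iterable[Heartbeat],
                              now: float) -> Dict[str, Status]:
    """Return only the systems that are not healthy."""
    flagged = {}
    for heartbeat in heartbeats:
        status = classify(heartbeat, now)
        if status is not Status.OK:
            flagged[heartbeat.system] = status
    return flagged


# Hypothetical snapshot taken at time 1000.0.
snapshot = [
    Heartbeat("access-control", 995.0),
    Heartbeat("hvac", 950.0),
    Heartbeat("video-surveillance", 800.0),
]
alerts = systems_needing_attention(snapshot, now=1000.0)
for system, status in alerts.items():
    print(f"{system}: {status.value}")
```

The warning state is the important part of the design: it surfaces a late heartbeat while there is still time to act, rather than reporting only after an outage has occurred.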
Planning Your Strategy
As with most security-related issues, planning ahead is critical. Consider the following tips:
Think about server availability as a core requirement, right up front. Planning early can help you avoid the problems that crop up when trying to “tack on” an availability solution later in the architecture and deployment cycle.
Carefully assess the availability requirements of all security applications and determine how much downtime you can afford for each. This will help guide you to the appropriate availability solution needed for each application.
Be wary of non-virtualized cluster systems that require lots of interactions between the security application and the cluster software. Solutions that minimize intrusion into the application space are more flexible and easier to manage.
Consider building automation vendors who are familiar with availability, and who have the knowledge to guide you to solutions that are suitable for your deployment.
Originally published in LinkedIn Pulse.