Four Strategies For Defending Mission-Critical Networks
In this series, Craig Mathias explains why resilience is essential for networks to operate when operational systems are compromised.
Last time we discussed why eliminating single points of failure is critical to network resilience. And the overall message of this series has been: Given the number of things that can go wrong in IT operations, designing and implementing resilient solutions is essential to success. No network – no productivity.
Fortunately, though, building resilience in is getting easier every day, even as the threats we discussed in earlier installments persist – and they will, perhaps forever, so it’s always time to think resilience.
Here are four key strategies for assuring resilient IT operations:
1) Conduct a resilience review – This isn’t necessarily an easy process, since many potential unknowns are involved. But getting a handle on just how resilient a given IT installation really is remains essential.
The first step is to examine how information flows within the organization, and to harden any critical links. Consider all operational procedures, examine costs, and explore new technologies. After-hours exercises, such as actually shutting down key IT elements, can be very revealing.
The most important element here, though, is to identify any potential single points of failure, elements that would lead to a service outage if disrupted. Which leads us to…
2) Eliminate all single points of failure – The classic techniques for accomplishing this have included fault-tolerant hardware and software, overprovisioning, backup systems (hot, warm, and cold), and redundant network links, even on the LAN.
But note that wireless LANs offer one of the best solutions here – when an access point fails, clients simply associate with another AP, usually quickly and with no disruption obvious to the user. Overprovisioning APs to add capacity therefore also improves resilience. Of course, controller-based architectures add another potential single point of failure, necessitating at a minimum one redundant controller. And Ethernet switches can be an issue unless APs are connected to multiple switches or interleaved geographically across switches. As for the network core, check with your vendor, but in many cases redundant routers and redundant backhaul connections can address most potential challenges here.
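As a rough illustration of the single-point-of-failure search described in the first two strategies, here is a minimal Python sketch. It assumes the network can be modeled as a simple graph of elements and links; the topology, the element names, and the helper functions `reachable` and `single_points_of_failure` are all hypothetical, invented for this example rather than taken from any vendor tool. The idea is just to "shut down" each element in turn and see whether a client can still reach the outside world.

```python
from collections import deque


def reachable(links, start, removed):
    """Return the set of elements reachable from `start` while `removed` is down."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(links, source, target):
    """List elements whose failure alone cuts `source` off from `target`."""
    candidates = [n for n in links if n not in (source, target)]
    return sorted(
        n for n in candidates if target not in reachable(links, source, n)
    )


# Hypothetical topology (an assumption for illustration): two APs on two
# switches, but only one controller and one router behind them.
topology = {
    "client": ["ap1", "ap2"],
    "ap1": ["client", "switch1"],
    "ap2": ["client", "switch2"],
    "switch1": ["ap1", "controller"],
    "switch2": ["ap2", "controller"],
    "controller": ["switch1", "switch2", "router"],
    "router": ["controller", "internet"],
    "internet": ["router"],
}

spofs = single_points_of_failure(topology, "client", "internet")
print(spofs)  # ['controller', 'router']
```

In this made-up layout the redundant APs and switches drop out of the list, while the lone controller and router remain, which mirrors the point above about adding a redundant controller and redundant routers. A real review would of course also need to account for power, backhaul, and configuration dependencies that a simple link graph does not capture.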
3) Make sure required policies are in place and communicated pervasively – Many threats to integrity take the form of malware, often inadvertently introduced by end-users. Make sure security, acceptable use, and BYOD policies reflect the serious nature of connecting to the organizational IT infrastructure. (I recently wrote extensively about BYOD policies in this series.)
While classical malware-mitigation techniques remain important, behavior modification can also go a long way. Training materials and regular reinforcement play a key role here – complacency is clearly the enemy of integrity, and resilience is never an exact science.
4) Think – and act – Cloud – Finally, an important development that’s re-shaping IT overall will play a key role in the future of resilience: the Cloud. Cloud-based services can be very cost-effective and easily scalable, and are already hard at work in storage, processing, management services, and more.
I expect this trend to continue with many organizations moving the vast majority of their IT infrastructure to Cloud-based services over the next decade.
Two cautions with respect to resilience here:
1) First, discuss resilience requirements with your Cloud suppliers. Most have already considered the advantages and have good stories to tell. But, even so, having at least two Cloud suppliers provide backup in the event of a catastrophe is essential. So, then: What might a local IT infrastructure look like in the near future? Just access points, switches, routers, and backhaul – with numerous obvious benefits, enhanced resilience being just one.
2) And, finally, don’t forget analytics – expect Cloud-based services to spot potential integrity problems before human operators ever could.
Even with all of the above, there is a cautionary bottom line here – when it comes to resilience, which can be thought of, after all, as a branch of security, you’re never, ever “done”. Every upgrade, improvement, enhancement and plain old change is likely to have an impact on resilience.
And, for that reason, resilience belongs in every IT strategy and plan, and will remain a consideration there – forever.