Business Reporter

IT resilience at scale needs to be more than surviving

Terry Storrar at Leaseweb UK explores building resilience at scale and the actions that business leaders should be taking to implement this in practice

The century-old global Scout movement motto, “Be prepared”, is just as applicable to today’s digital businesses as a prompt to look at resilience planning and strategy. Failure to prepare and to ask the right questions about business impact means that organisations are leaving their operations unnecessarily open to incidents that could have serious and costly repercussions.

Research published last year found that the 1,000 organisations surveyed experienced an average of 86 outages per year. Worryingly, the same report flagged that although 95% of executives were aware of current operational vulnerabilities in their businesses, nearly half admitted they had yet to take action to address these weak spots.

Another report focusing on UK businesses highlighted that 60% had struggled to return to normal operations after the disruption of a major outage. One warning here was that many UK businesses assume they are more resilient than they really are.

For any business, assessing risk exposure and ensuring that recovery strategies are effective enough to handle technology outages and recover quickly is challenging, especially at scale. This presents a strong case for organisations to take their resilience design one step further, using proactive resilience to anticipate and prevent failures before they happen.

Changing resilience mindset

Despite widespread discussion about what effective resilience means in practice, there is still a tendency to default to security as the primary cause of outages. Too often, other causes are overlooked, and any proactive stance on resilience is flawed from the outset unless that assumption changes.

Some of the most notable recent technology resilience incidents were caused by infrastructure faults or failure, software problems, misconfiguration or dependencies on third parties. The Microsoft 365 outage in January 2026 is an example: it caused widespread disruption to global business and government communications for several hours. Rather than a security incident, the outage stemmed from part of the service infrastructure in North America not processing traffic as expected.

The cloud-based infrastructures so typical of the modern digital economy are making problems like this more common, with the risks harder to control and more likely to have a wider impact. In addition, with global AI implementation, scaling resilience to cover all internal and external dependencies is more problematic than ever.

Resilience as a key IT design principle

With outages inevitable, simply fighting resilience fires is not sustainable or cost-effective in the longer term. Any company trying to guarantee against every failure needs to look at what resilience means through a different lens, as 100% uptime is not achievable.
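
To put the point about uptime into numbers, the short sketch below converts an availability target into the downtime it still permits each year. The targets shown are illustrative assumptions rather than figures from the research cited above, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Return the hours of downtime per year permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Illustrative targets only; a real target depends on the business.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime still allows about {hours:.2f} hours of downtime a year")
```

Even a demanding target leaves some downtime to plan for, which is why the design question becomes how the business keeps operating during it.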

A positive first step towards proactive resilience is investing in maintaining operations as and when a failure happens. Today’s digital businesses require a clear view across the whole technology environment, however it is structured, so they can anticipate the likely resilience and recovery scenarios and then implement business-specific plans to address them. Ideally, this should include all layers, systems and resources, from in-house departments through to third-party service providers.

With multi-layer architectures to contend with, there is a risk that, without the right insight across all components, an outage could seem resolved in one area while continuing to affect other systems and users. Good resilience design is also likely to provide for workflows that can function under a partial failure, rather than relying on 100% uptime.

In practice, this means building systems that can operate in a degraded state, rather than falling over completely. At scale, resilience is therefore less about avoiding disruption and more about maintaining continuity despite it.
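
As a rough illustration of operating in a degraded state, the sketch below wraps a dependency so that, when it fails, callers receive the last known good value (or a default) flagged as degraded instead of an error. All names here are hypothetical, and a production design would add timeouts, staleness limits and alerting.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    value: str
    degraded: bool  # True when the value came from the fallback path


class DegradableService:
    """Serve live data when possible, and the last good value when not."""

    def __init__(self, primary: Callable[[], str], default: str) -> None:
        self._primary = primary
        self._default = default
        self._last_good: Optional[str] = None

    def fetch(self) -> Result:
        try:
            value = self._primary()
        except Exception:
            # Broad catch for brevity; real code would catch specific errors.
            if self._last_good is not None:
                return Result(self._last_good, degraded=True)
            return Result(self._default, degraded=True)
        self._last_good = value  # remember the latest successful response
        return Result(value, degraded=False)


# Simulated dependency: succeeds once, then fails.
responses = iter(["live stock levels", ConnectionError("dependency down")])


def flaky_dependency() -> str:
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


service = DegradableService(flaky_dependency, default="stock levels unavailable")
print(service.fetch())  # live value, degraded=False
print(service.fetch())  # last good value, degraded=True
```

The `degraded` flag matters as much as the fallback itself: it lets downstream systems and users know they are seeing a partial service, which supports the cross-component visibility described above.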

Why proactive resilience goes beyond

True proactive resilience needs a mindset shift. It is the opposite of reacting in the event of an outage and starts with defining what a successful outcome looks like for a specific business when there is an incident. The process then works backwards to a design that delivers that outcome.

This inevitably requires more investment upfront, and the cost and complexity will vary depending on the desired end goals. A business heavily reliant on data, like a bank, will have to spend more on proactive resilience measures than a business that relies less on data and uptime. There are simply different priorities and assets to protect, and it is key to use scenario planning tools, trend analysis and stress testing to ensure that resilience capacity is tailored to business needs. This should include business continuity scenarios for outages of different lengths.

Organisations adopting a truly proactive approach should also ensure that there are documented, regularly rehearsed incident response processes and mechanisms in place to encourage continuous learning and adaptation. With tools such as feedback loops, an organisation is constantly adapting to the latest outage scenarios rather than being forced by an incident to change under time pressure.

Why data sovereignty is part of the plan

Data sovereignty, the requirement to store, process and govern data under the laws of the country in which it resides, is also an essential component of resilience strategy planning. This is because organisations need to show not only the physical location of data but also the protocols governing who can access it and from where.

Should data be stored with service providers governed by external jurisdictions, as is the case with data stored with large American cloud providers, there are clear resilience and recovery implications to plan for in the event of an outage. These include additional regulatory controls and cross-border transfer limitations, which add another layer of complexity to any resilience planning.

No matter what the cause of an outage, it is fundamental that businesses can weather disruptions and continue operating under a wide range of conditions. While many businesses address only the symptoms, proactive resilience is a more comprehensive approach that tackles the underlying issues and goes the extra mile towards making a business as prepared as possible to face disruption head on.

Terry Storrar is managing director of Leaseweb UK

Main image courtesy of iStockPhoto.com and ArtemisDiana

© 2025, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543