Resilience After the First Shock: What Fails Second

RESTRAT Labs

When a crisis hits, most organizations handle the immediate shock. But what happens next often causes more damage. Second-order failures - burnout, hidden backlogs, and communication breakdowns - surface after the initial emergency, quietly unraveling operations. These failures aren’t obvious at first but can cripple long-term recovery.

Key takeaways:

Exhaustion weakens leaders: Burned-out decision-makers slow recovery, creating delays and bottlenecks.
Deferred tasks pile up: Skipped maintenance, delayed invoicing, and informal fixes lead to operational debt.
Communication breaks down: Informal crisis habits disrupt normal processes, causing missed handoffs and errors.
Standards erode: Shortcuts taken during the crisis become the new norm, leaving the system fragile.

The solution? Organizations that prioritize recovery - through decompression periods, backlog resets, and decision reviews - are better equipped to handle future disruptions. Resilience isn’t about enduring; it’s about designing systems that prevent collapse after the first shock.

A Rapid Start Approach to Building Organizational Resilience

https://www.youtube.com/watch?v=JoPL9XzOTag

Where Systems Fail After the First Shock

Four Types of Second-Order Failures After Crisis Recovery

Organizations that make it through the initial crisis often stumble in the aftermath. These failures don’t happen during the emergency itself - they surface weeks later, when the crisis feels like it’s over. The breakdowns aren’t random; they’re structural and follow predictable patterns. This is why building resilience requires looking beyond just surviving the first shock.

Decision Maker Fatigue

When leaders are burned out, their decision-making suffers. After a crisis, the people who kept everything afloat are often completely drained. They’ve been working long hours, making rapid decisions under pressure, and absorbing enormous stress - usually without much support. As Beth Lay, Director of Human Performance at Calpine Corp., puts it:

"The more stressed people are, the more things go wrong... things can begin to fail, like a snowball rolling downhill, growing and gathering speed" [4].

Exhausted leaders become less responsive, miss deadlines, or grow short-tempered. Projects stall, schedules slip, and the same small group of people is left holding everything together while the rest of the organization waits for direction. This concentration of decision-making creates bottlenecks. What used to take hours now takes days, slowing recovery. Meanwhile, routine tasks pile up, creating a hidden backlog that worsens over time.

Hidden Backlog Buildup

During a crisis, teams often cut corners - delaying maintenance, skipping quality checks, and postponing invoicing - to keep things running. These deferred tasks don’t disappear; they form an invisible backlog that surfaces later as delays and bottlenecks. The "hidden factory" of workarounds and informal processes, which aren’t accounted for in official plans, can quietly consume significant resources.

Eventually, these deferred tasks overwhelm the system. For example, cash flow issues may emerge weeks after operations resume because invoices weren’t sent out in the chaos. Customer complaints may spike because skipped quality checks led to mistakes. Equipment might fail because routine maintenance was neglected. On the surface, the organization seems stable, but underneath, operational debt is piling up.

Communication and Handoff Breakdowns

As leaders burn out and backlogs grow, communication and coordination often break down. Processes that worked before the crisis may no longer function effectively afterward. Teams that relied on quick, informal communication during the emergency struggle to reestablish formal handoffs. Information gets lost between shifts, departments, or project phases, and responsibilities fall through the cracks as no one is clear on who’s in charge.

A real-world example of this comes from March 2000, when a fire at a Philips plant in Albuquerque disrupted the supply of critical chips for Nokia and Ericsson. Nokia’s systems allowed it to quickly assess the damage and redesign components to work with alternative suppliers. Ericsson, however, struggled with communication gaps and slow responses, ultimately losing an estimated $300 million in market share [2]. The key difference wasn’t the initial shock - it was how each company handled coordination during recovery.

Standards That Erode Under Pressure

Under stress, quality controls and operational standards often take a backseat. During a crisis, survival becomes the priority - getting through the day, meeting deadlines, or fulfilling orders. Teams may skip inspections, leave documentation incomplete, or bypass approval processes to keep things moving.

Once these standards slip, it’s hard to bring them back. Shortcuts become habits, and those habits turn into the new normal. The organization might survive the first crisis, but it’s now operating with weaker safeguards, leaving it more vulnerable to the next disruption. As Benjamin Laker and Yelena Kalyuzhnova from Henley Business School explain:

"The more a team relies on heroic effort, the more fragile it becomes" [1].

Rebuilding these eroded standards is critical to creating recovery systems that can withstand future shocks. True resilience isn’t just about weathering the storm - it’s about designing systems that prevent these predictable failures from taking root in the first place.

Why Recovery Fails

Recovery efforts often fall short, not because organizations lack dedication, but because their systems aren't built to handle the pressures that follow an initial crisis. The underlying issues are deeply rooted in structural flaws, as studies over the years have shown. Key factors like fatigue, cascading deviations, and excessive reliance on individual effort all play a role in undermining recovery.

Fatigue and Deferred Decisions

Many organizations confuse endurance with resilience. Endurance is the ability of a team to push through under relentless pressure - working overtime, skipping breaks, and absorbing stress to keep things afloat. Resilience, however, is about designing systems that prevent people from reaching their breaking point. As Benjamin Laker and Yelena Kalyuzhnova from Henley Business School explain:

"Endurance is about surviving pressure. Resilience is about designing systems so people don't break under it" [1].

When organizations lean too heavily on endurance, they create a hidden crisis. Teams become overworked, critical decisions get delayed, and operations grind to a halt. This approach shifts the burden of stress onto individuals, and when they inevitably hit their limits, the entire system falters.

Cascading Failure and Process Variation

Fatigue isn’t the only issue - systemic weaknesses can amplify risks through process variations. Charles Perrow’s research on "normal accidents" showed that complex, tightly coupled systems are particularly prone to cascading failures, where one problem triggers a chain reaction [2]. W. Edwards Deming also warned about the dangers of process variation. When teams stray from established procedures, even for valid reasons, they introduce small, untracked risks that build up over time.

These deviations, much like hidden backlogs or delayed communications, can quietly consume up to 30% of resources and increase risk [2]. Because they often go unnoticed by management, they aren’t monitored, tested, or improved. Toyota tackles this problem by analyzing and addressing workarounds after they happen - either by incorporating them into standard processes or eliminating them entirely [2]. Unfortunately, many organizations allow these deviations to become routine, weakening their systems and making them more vulnerable to future disruptions.

Fragility vs. Antifragility

Nassim Taleb categorizes systems into three types: fragile (those that break under stress), robust (those that resist pressure), and antifragile (those that grow stronger from stress). The difference lies in more than just surviving the initial shock - it’s about whether the system can handle the next one.

A common mistake is equating flexibility with resilience. While organizations often celebrate their ability to adapt quickly, this agility can come at a cost. Without proper infrastructure - such as adequate budgets, staffing, or decision-making authority - flexibility becomes a weakness. As Laker and Kalyuzhnova note:

"Flexibility without infrastructure is fragility" [1].

For example, a production team that quickly shifts priorities may seem adaptable, but without sufficient resources to sustain these changes, the system becomes unstable. Teams that survive an initial crisis through sheer determination often collapse under the weight of deferred maintenance and declining quality when the next challenge arises. Building antifragile systems requires intentional buffers - extra time, budget, or capacity - to absorb shocks without overburdening individuals.

Recovery ultimately succeeds when organizations stop treating resilience as a personal trait and start designing systems that protect their people. The goal is to build operations that can withstand future shocks without relying on unsustainable efforts from individuals. This shift from reactive endurance to proactive system design is what separates lasting recovery from failure.

Designing Recovery Into Operations

Recovery isn’t something that just happens - it’s something organizations design into their operations. Businesses that maintain stability after a disruption don’t rely on luck; they build systems to help teams reset, manage workloads, and restore efficiency before the next challenge hits. These systems transform recovery from a reactive scramble into a structured process that supports long-term resilience.

Decompression Cycles

Most organizations plan for work, but few plan for recovery. That’s where decompression cycles come in - scheduled breaks that give teams a chance to regroup after major launches, seasonal rushes, or unexpected disruptions. As Benjamin Laker, Professor of Leadership at Henley Business School, puts it:

"Resilience is not about how long people can keep sprinting. It's about how intelligently leaders design the course" [1].

This distinction between endurance and resilience is key. Endurance pushes people to keep going; resilience builds in time to pause and recover. After intense periods like product launches or crisis responses, teams need space to catch up on delayed tasks, clear backlogs, and recharge. Leaders who set boundaries and prioritize recovery send a clear message: recovery matters as much as execution [1][5].

During high-stress times, it’s also smart to delay non-essential tasks. Routine paperwork, low-priority reports, and other administrative work can wait, allowing teams to focus on what’s critical [4]. Keeping an eye out for warning signs - like missed deadlines, increased tension, or stalled progress - helps leaders step in before problems escalate [4].

By weaving recovery periods into the workflow, organizations can systematically tackle accumulated work instead of letting it pile up.

Backlog Reset Mechanisms

Backlog reset mechanisms are tools for identifying and clearing deferred tasks so they don’t overwhelm the system. A study at a GE plant in the 1960s found that about 30% of work was tied up in the "hidden factory" - unofficial workarounds that weren’t part of formal processes [2]. These hidden tasks drain resources and increase risk.

The first step is visibility. Teams need to track not just active projects but also deferred maintenance, delayed communications, and informal workarounds. Once identified, these backlogs can be addressed - either by integrating the workaround into standard procedures or eliminating it entirely. Toyota’s approach to disruptions serves as a great example: they redesign processes to ensure that temporary fixes don’t become recurring problems [2].
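One way to make that visibility concrete is a simple register of deferred work. The sketch below is illustrative only: the `DeferredItem` fields, the `Resolution` options, and the sample entries are assumptions made for this example, not a tool described in the sources above. It lists the items nobody has decided on yet, largest time drain first, and estimates how much of a working week the register absorbs.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    UNDECIDED = "undecided"
    FORMALIZE = "formalize"   # fold the workaround into the standard procedure
    ELIMINATE = "eliminate"   # retire the workaround and return to the standard


@dataclass
class DeferredItem:
    description: str
    kind: str                 # e.g. "maintenance", "communication", "workaround"
    hours_per_week: float     # rough estimate of the effort the item absorbs
    resolution: Resolution = Resolution.UNDECIDED


def unresolved(items):
    """Return items with no decision yet, largest time drain first."""
    pending = [i for i in items if i.resolution is Resolution.UNDECIDED]
    return sorted(pending, key=lambda i: i.hours_per_week, reverse=True)


def hidden_share(items, total_hours_per_week):
    """Fraction of weekly capacity absorbed by everything on the register."""
    return sum(i.hours_per_week for i in items) / total_hours_per_week


# Hypothetical entries, invented for illustration.
register = [
    DeferredItem("Manual re-entry of orders into a spreadsheet", "workaround", 6.0),
    DeferredItem("Skipped monthly equipment inspection", "maintenance", 2.0),
    DeferredItem("Unsent project status updates", "communication", 1.5,
                 Resolution.ELIMINATE),
]

for item in unresolved(register):
    print(f"{item.hours_per_week:>4.1f} h/week  {item.kind:<13} {item.description}")
print(f"Share of a 40-hour week: {hidden_share(register, 40):.0%}")
```

The point of the sketch is the habit rather than the tooling: every workaround gets written down, sized roughly, and marked either "formalize" or "eliminate" instead of lingering undecided.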

Another effective strategy is cross-training employees and rotating responsibilities. This spreads the workload, preventing a few individuals from bearing all the strain and reducing the risk of backlogs spiraling out of control [1]. By distributing responsibilities, organizations can clear accumulated work without slipping into constant crisis mode.

Decision Review Windows

Decisions made in the heat of a crisis often make sense at the time but may not work long-term. Decision review windows - typically scheduled one to two weeks after a crisis stabilizes - give leadership a chance to revisit those choices. This is the time to decide which actions should be reversed, refined, or made permanent.

Crisis decisions often involve decentralizing authority, bypassing approvals, or shifting priorities. While some of these changes can improve systems, others might introduce new risks. A structured review helps separate what’s useful from what’s not, updating processes to avoid turning temporary fixes into permanent weak points.
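A review window is easier to honor when it is on the calendar and the crisis decisions are written down. The following is a minimal sketch under stated assumptions: the `CrisisDecision` record, the three outcome labels, and the sample decisions and date are hypothetical, chosen only to mirror the "reverse, refine, or make permanent" choice described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CrisisDecision:
    summary: str
    outcome: Optional[str] = None  # "reverse", "refine", or "keep" once reviewed


def review_window(stabilized_on: date):
    """Return (earliest, latest) review dates: one to two weeks after stabilization."""
    return stabilized_on + timedelta(weeks=1), stabilized_on + timedelta(weeks=2)


def awaiting_review(decisions):
    """Decisions not yet marked reverse, refine, or keep."""
    return [d for d in decisions if d.outcome is None]


# Hypothetical decisions, invented for illustration.
decisions = [
    CrisisDecision("Site leads may approve purchases without sign-off"),
    CrisisDecision("Daily 15-minute cross-team call", outcome="keep"),
    CrisisDecision("Final inspection skipped on rush orders", outcome="reverse"),
]

start, end = review_window(date(2024, 3, 4))
print(f"Hold the review between {start} and {end}")
for d in awaiting_review(decisions):
    print("Still needs a decision:", d.summary)
```

Anything still listed as awaiting review when the window closes is a temporary fix on its way to becoming permanent by default.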

After Action Reviews (AARs) play a crucial role here. Once the immediate crisis is over, teams can reflect on what worked, what didn’t, and what needs adjustment. These reviews aren’t about assigning blame - they’re about learning and updating the organization’s collective knowledge [4].

This careful evaluation of crisis decisions ensures that the organization is better prepared for future challenges.

Restoring Standards Before Growth

After surviving a crisis, the instinct might be to push forward quickly, but the smarter move is to stabilize first. Restoring standards before growth means resetting quality levels, communication protocols, and operational processes before scaling up again.

Skipping this step leaves organizations vulnerable. Inefficient shortcuts can linger, leading to lower quality, higher rework rates, and customer dissatisfaction down the line. Restoring standards involves re-establishing quality checks, clearing communication backlogs, updating documentation, and addressing informal processes. This isn’t about being rigid - it’s about ensuring the system is strong enough to handle the next disruption without starting from a weakened position.

Organizations that bake recovery into their operations don’t just survive - they’re ready to face future challenges head-on. Recovery becomes a built-in strength, not just a hopeful outcome.

Resilience in Small Business Operations

Designing operations to absorb disruption isn’t only a big-business strategy; it applies to small businesses too, though the challenges take a different shape. Small businesses often lack the buffer of multiple management layers, so when something goes wrong, it directly affects the owner, the team, and the customers. Even after the initial problem is resolved, second-order failures - those that creep up later - can still cause significant damage. These hidden vulnerabilities can unravel operations just as much as the initial disruption.

Surviving the Rush, Failing Afterward

Small businesses may handle busy periods well but often struggle afterward. Imagine a residential remodeling company juggling three kitchen renovations during a six-week crunch. They meet deadlines and keep clients happy, but once the rush is over, problems begin to surface. Invoices are delayed, cash flow tightens, and shortcuts taken under pressure lead to rework. Follow-ups and quality checks get postponed, and the team finds itself drained. These hidden workarounds, created to survive the rush, can quietly sap energy and resources long after the busy period ends, leaving the business vulnerable [2].

Owner Burnout and Operational Overload

Sometimes, small businesses mistake sheer endurance for resilience. As Benjamin Laker and Yelena Kalyuzhnova from Henley Business School put it:

"Flexibility without infrastructure is fragility" [1].

If the business relies on the owner or a few key team members to absorb every challenge, it’s not resilient - it’s fragile. A good way to assess this is by asking:

"Did our systems protect people, or did people protect the system?" [1]

If the answer leans toward people carrying the load, the business is running on borrowed time, not sustainable practices.

Scheduling and Workflow Stabilization

Recovering after a busy period takes intentional planning. Small businesses can adopt decompression cycles - scheduled recovery times after major projects - to give the team a chance to reset. Treating rest as essential, not optional, helps ensure long-term performance. Cross-training employees and rotating roles can also reduce the risk of over-reliance on any one person.

Another step is addressing the “hidden factory” of makeshift processes that pop up during crunch times. These shortcuts - like skipped steps or informal communication - should either be formalized into workflows or eliminated to return to standard procedures [2]. Additionally, maintaining strategic slack - a small reserve of time, budget, or resources - can help the business absorb shocks without spiraling into chaos. This isn’t wasted effort; it’s the backbone of resilience [1]. By embedding these practices, small businesses can shift from merely surviving disruptions to maintaining steady, reliable operations.

Resilience as Durability

Resilience isn’t something you fix once and forget about. It’s not a quick patch for emergencies. Organizations that thrive in unpredictable conditions treat resilience as an operational capability - a core part of how they function day-to-day. As Benjamin Laker from Henley Business School explains:

"Resilience is not a personal trait to cultivate - it's a system to construct." [5]

This mindset shift is crucial. When resilience is baked into everyday operations, decision-making, and recovery processes, the organization itself absorbs pressure, sparing individuals from bearing the brunt. Instead of relying on extraordinary efforts during every crisis, the system is designed to handle strain effectively. This approach lays the foundation for sustainable recovery that extends well beyond the initial shock.

Recovering Beyond the First Shock

The real difference between merely getting through a crisis and lasting beyond it lies in what happens after the initial disruption. Take the Nokia versus Ericsson example: it wasn’t the size of the initial shock but how each company coordinated its recovery that shaped their long-term outcomes.

Organizations that excel at recovery don’t merely return to their previous state - they adapt, improve, and recalibrate. They incorporate decompression periods after intense efforts, maintain strategic slack (those small reserves of time, budget, or capacity), and reward early problem detection rather than last-minute firefighting [1]. Effective recovery doesn’t happen by chance; it requires deliberate planning.

Future Outlook: Organizations Built to Last

By prioritizing robust recovery strategies, businesses can create a foundation for resilience that grows stronger over time. In today’s world, where volatility is constant, the organizations that succeed are those that design for resilience that compounds. Interestingly, 30% of global executives now embrace what’s called "Organizational Offense" - viewing disruption as an opportunity to innovate, break into new markets, and elevate performance [3]. This isn’t about reacting faster; it’s about building systems that thrive under pressure and emerge stronger after each challenge.

Whether it’s a small business or a large enterprise, the principle remains the same: recovery must be woven into the operating model. The goal isn’t to sprint harder every time conditions change - it’s to create a system where sprinting isn’t the default response. The organizations that endure are those that design for the period after the rush, so they’re prepared for whatever comes next.

FAQs

How do I spot second-order failures early?

To spot second-order failures early, pay attention to subtle indicators such as overworked decision-makers, postponed maintenance tasks, growing but untracked backlogs, and breakdowns in communication. These warning signs often appear after the initial disruption has passed. Employing strategies like regular review sessions, decompression periods, and backlog resets can bring hidden problems to light before they grow into larger issues. Adopting a systems-thinking perspective - looking at how components interact rather than just focusing on immediate results - is crucial for identifying these failures early and ensuring stability.

What should a “backlog reset” include in practice?

A "backlog reset" is a systematic approach to tackling delayed tasks after a disruption. It starts with reviewing all pending work, re-prioritizing tasks to align with current priorities, and addressing the most critical items first while postponing less urgent ones. The goal is to reduce the backlog, restore quality standards, and handle essential maintenance before returning to regular workflows. This process also includes reflection and recalibration, helping teams identify and address root causes to avoid repeated setbacks and ensure a smoother recovery.
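The triage step described above can be sketched in a few lines. This is a hedged illustration, not a prescribed method: the `plan_reset` function, the priority numbers, the hour estimates, and the sample tasks are all assumptions made for the example. Tasks are taken in priority order while they still fit the hours available, and everything else is explicitly postponed rather than silently dropped.

```python
def plan_reset(tasks, capacity_hours):
    """Split tasks into (do_now, postponed).

    tasks: list of (name, priority, hours); a lower priority number is more critical.
    Tasks are taken in priority order while they still fit the available hours.
    """
    do_now, postponed = [], []
    remaining = capacity_hours
    for name, priority, hours in sorted(tasks, key=lambda t: t[1]):
        if hours <= remaining:
            do_now.append(name)
            remaining -= hours
        else:
            postponed.append(name)
    return do_now, postponed


# Hypothetical backlog, invented for illustration.
pending = [
    ("Send overdue invoices", 1, 4),
    ("Re-run skipped quality checks", 1, 6),
    ("Service the delivery van", 2, 3),
    ("Archive old project files", 3, 5),
]

do_now, postponed = plan_reset(pending, capacity_hours=12)
print("This week:", do_now)
print("Postponed:", postponed)
```

Keeping the postponed list visible matters as much as the plan itself, since it is the part of the backlog most likely to go hidden again.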

How can a small business add recovery time without losing revenue?

Small businesses can integrate recovery time into their operations by creating systems with planned decompression periods and backlog resets. These practices help avoid burnout, overlooked maintenance, and hidden backlogs that could lead to unexpected breakdowns. Strategies like recovery cycles, periodic decision reviews, and reestablishing standards before scaling up ensure stability. By tackling potential problems early, businesses can maintain resilience and safeguard revenue, minimizing disruptions after peak activity periods.
