From Exceptions to Patterns: Fixing the Work Beneath the Firefighting

RESTRAT Labs
Feb 16

Updated: Apr 10

Firefighting at work - fixing repeated crises instead of preventing them - costs time, money, and efficiency. Why does this cycle persist? Organizations reward quick fixes over long-term solutions, track speed over prevention, and lack clear accountability for recurring problems. This behavior creates hidden costs, like overtime and rework, draining 8–15% of operational budgets and cutting productivity by 25–40%.

The solution? Shift from reacting to redesigning systems. By identifying recurring issues, assigning responsibility, and building feedback loops, businesses can reduce chaos and interruptions. Examples like Intermatic’s jump from under 60% to over 90% on-time shipments show how fixing systems - not just symptoms - leads to stability and better margins. If you’re constantly solving the same problems, it’s time to rethink your approach.

Why Exceptions Keep Repeating

Organizations often find themselves stuck in a cycle of constant firefighting. It’s not a lack of effort - it’s the way systems are set up. Quick fixes are prioritized over long-term solutions, metrics reward speed instead of prevention, and no one takes clear responsibility for fixing recurring issues. Let’s break down why this happens.

Quick Fixes vs. System Redesign

When problems arise, the instinct is to patch things up quickly - it’s faster and seems cheaper than overhauling the system. For example, a late vendor shipment or an incomplete customer order often leads to someone stepping in to resolve the issue manually. However, these "quick fixes" don’t address the root cause.

Over time, these workarounds turn into what experts call "hidden factories" - unofficial processes that compensate for flawed systems. A telling example comes from a BP refinery, where a third of purchase orders were delayed because of missing information. Employees had to create workarounds every single time, as the formal procurement system didn’t align with how work was actually done [5].

The cost of these patches often hides in plain sight. Expenses like overtime, expedited shipping, and rework quietly drain 8–15% of operational budgets [3]. For a mid-sized distributor with $100 million in annual revenue, these hidden costs can add up to $3–5 million every year [3]. It’s a slow bleed that eats into margins without anyone noticing.

Real change happens when leaders focus on fixing the system, not blaming individuals. Take Intermatic, a manufacturer of industrial time switches. In the mid-1990s, their on-time shipments were below 60%. Senior VP Donald Kieffer shifted the focus to daily problem-solving sessions, tackling issues like inventory delays and paperwork bottlenecks. Within just four months, on-time shipments jumped to over 90% [5]. The work didn’t change - the system did.

But these quick fixes don’t just create hidden costs. They also reinforce a culture where metrics and accountability push teams further into reactive practices.

Metrics That Reward Speed, Not Prevention

Most organizations track how quickly problems are resolved, not how often they occur. This creates a strange dynamic: being good at firefighting makes you indispensable, even though it perpetuates the problem.

"One gets a good rating for fighting a fire. The result is visible; can be quantified. If you do it right the first time, you are invisible." - W. Edwards Deming [1]

Deming’s insight explains why firefighting often feels like the "right" thing to do. It’s visible, dramatic, and appreciated. Preventing problems, on the other hand, is invisible - no one celebrates a crisis that never happened.

Consider Harley-Davidson in the early 1990s. Their product development process was described as "late, expensive, and wrong" (LEW). Experienced project managers were even called "firefighting arsonists" because their heroic efforts to resolve crises often ignored the root causes, setting the stage for future problems [2]. By introducing a Project Management Office (PMO) that conducted biweekly status reviews, Harley-Davidson shifted the focus from individual heroics to systemic improvements.

When metrics focus solely on maintaining service level objectives (SLOs), teams work overtime to patch flaws instead of fixing them [6]. The numbers look good, but the system remains fragile. Metrics that reward speed over prevention only reinforce this cycle.

No Clear Owner for System Repairs

Recurring issues persist when no one takes responsibility for fixing them at the system level. Ownership often falls to the person closest to the problem - usually someone without the authority or resources to make lasting changes.

"Because heroism masks systemic problems, the systemic problems are never fixed... The system is broken, and because the team doesn't realize that it's broken, the system never improves." - Alexander Malmberg, Google SRE [6]

Without clear ownership, employees create their own shortcuts - spreadsheets, email threads, or other informal workarounds. These quick fixes solve immediate problems but prevent the organization from learning and implementing permanent solutions. The formal process remains untouched because no one is tasked with fixing it across the board.

This issue is even more pronounced in owner-led businesses, where the owner often becomes the system. They step in to resolve the same problems repeatedly - be it a vendor delay or a scheduling conflict. These issues feel like one-offs but are actually recurring patterns. When the owner becomes the go-to solution, the system itself never evolves.

Common Cause vs. Special Cause: Identifying Patterns

[Figure: Exception Handling vs. Pattern Design: Reactive vs. Systemic Approach]

Problems come in different forms - some are isolated incidents, while others point to deeper flaws in a system's design. Understanding this distinction is crucial. If you treat a systemic issue like a one-time event, you’re essentially guaranteeing it will happen again. Recognizing these patterns helps shift focus from quick fixes to addressing the root causes.

W. Edwards Deming made a clear distinction between two types of variation. Special cause variation refers to anomalies - specific, one-off events that lie outside the usual system and require targeted fixes. Common cause variation, on the other hand, is baked into the system itself. It’s a recurring issue that won’t go away unless the system is redesigned [1]. As Deming famously said:

"Putting out fires is not improvement of the process. Neither is discovery and removal of a special cause... This only puts the process back to where it should have been in the first place" [1].

Misidentifying systemic problems as isolated events often leads to superficial solutions. The core issue remains unresolved, and over time, these quick fixes pile up into a maze of workarounds. For example, at one BP refinery, 33% of purchase orders were delayed because the formal process didn’t provide the information employees needed. These delays weren’t random - they stemmed from structural flaws [5]. By distinguishing between isolated failures and systemic issues, leaders can move from constantly managing crises to improving the overall system.

Deming's Common vs. Special Cause Variation

Deming’s framework offers a practical way to decide where to focus efforts. If a process is stable but a problem keeps repeating, it’s likely a common cause issue. Treating each instance as a unique event only creates more instability [1]. For example, recurring weekly issues are rarely random - they’re symptoms of a flawed design. Without redesigning the system, these problems become the norm.
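One common way to make this distinction concrete is a Shewhart-style individuals (XmR) control chart. The sketch below is an illustration, not a method the article prescribes: it assumes a simple weekly count of late shipments (the numbers are hypothetical) and uses the standard XmR constant 2.66 to set limits. Points outside the limits are treated as special-cause signals; everything inside them is the system's own common-cause variation.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, center, upper) limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Average absolute change between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)  # standard XmR scaling constant
    return center - spread, center, center + spread


def special_cause_points(values):
    """Return the indexes of observations that fall outside the limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical data: late shipments per week (illustrative only).
weekly_late = [7, 9, 8, 10, 7, 9, 8, 24, 9, 8, 10, 7]
print(special_cause_points(weekly_late))  # flags index 7, the week with 24
```

In this made-up series, only the spike to 24 calls for a targeted investigation. The steady 7 to 10 late shipments in every other week is what the system produces by design, so chasing each one individually would not lower it; only a process change would.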

Take Intermatic, for instance. In 1996, the company struggled to ship even 60% of orders on time. Instead of blaming individual failures, Senior VP Donald Kieffer mapped out systemic issues: inventory wasn’t where it needed to be, paperwork was incomplete, and customs delays were predictable but unaddressed. By redesigning workflows, the company improved on-time shipments to over 90% within four months [5]. The work didn’t get easier - the system became more efficient.

For small, owner-led businesses, the pattern can be harder to see because the owner often acts as the workaround. They step in to fix vendor delays, handle scheduling problems, or resolve customer complaints. While these interventions may feel necessary, if they happen every week, they’re not exceptions - they’re part of the system. In this case, the owner’s time becomes the hidden cost of an inefficient process.

Goldratt on Recurring Bottlenecks

Eliyahu Goldratt’s Theory of Constraints aligns with this thinking. A bottleneck that consistently reappears isn’t just a capacity issue - it’s a design flaw. When the same constraint keeps surfacing, the system is signaling where structural changes are needed, not just faster execution [4].

Consider Fannie Mae’s monthly book-closing process, which once took 13 days. Instead of pushing harder, the team mapped every handoff using index cards and string. This made bottlenecks visible: data wasn’t flowing smoothly between departments, and no one was accountable for transitions. By redesigning handoffs and holding daily huddles to address issues early, they cut the timeline to six days [5]. The problem wasn’t speed - it was a lack of clarity.

For smaller organizations, the equivalent might be a recurring scramble to finalize schedules or regular vendor delays that disrupt production. In many cases, recurring issues like these highlight the need for process redesign. If something happens predictably, it’s not a surprise - it’s a signal that the system needs to adapt.

Comparison: Exception Handling vs. Pattern Design

The difference between reacting to exceptions and designing for patterns lies in how work is approached, measured, and improved. The table below highlights the mindset shift needed to move from reactive problem-solving to proactive system design.

Feature	Exception Handling (Reactive)	Pattern Design (Systemic)
Focus	Immediate symptoms and crises [1]	Root causes and system structure [5]
Goal	Restore the system to its previous state [1]	Continuously improve the system [1]
Reward	Celebrates "heroes" and "firefighters" [1][6]	Encourages sustainable improvements [5]
Method	Quick fixes and patches [7]	Iterative redesign and experimentation [5]
Result	Creates hidden inefficiencies [4]	Frees up capacity and reduces stress [4]

Organizations that treat every problem as a special cause remain stuck in a cycle of firefighting. Those that identify and address recurring patterns gradually reduce chaos and unlock capacity without adding more resources. The potential was always there - it was just buried under inefficiencies and workarounds.

Designing Systems That Absorb Recurring Problems

When a problem is systemic rather than a one-off, the solution lies in creating systems that can handle recurring issues automatically. The aim is to move away from constantly reacting to problems and instead redesign the conditions that allow those problems to keep cropping up.

There are three key steps to building such systems: classifying exceptions, assigning ownership for systemic fixes, and establishing feedback loops that transform incidents into opportunities for improvement. Together, these steps create a framework for systems that adapt and improve over time.

Classify Exceptions to Trigger Redesign

Not all problems are created equal. Some are genuine one-time events - a supplier goes out of business, a critical employee leaves unexpectedly, or a natural disaster disrupts operations. Others only seem unique because no one has dug deep enough to identify the underlying pattern.

To separate recurring issues from isolated incidents, organizations need clear criteria. A helpful method comes from exception analysis in complex systems: classify problems by predictability, source, and severity [9][10]. For example:

Predictable errors, like repeated scheduling conflicts or missing details on purchase orders, signal flaws in the system rather than random bad luck.
External issues, such as vendor delays, often call for robust processes to handle them consistently.
Internal errors, like unclear handoffs or incomplete workflows, require better system design to prevent them altogether.

Setting thresholds for redesign is critical. For instance, if the same issue happens three times in a month, it’s no longer an exception - it’s a pattern that demands attention [8]. In one logistics operation, adopting an exception-first approach cut the time spent identifying problematic shipments by 62% [8].
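The classification and threshold described above can be sketched as a small exception log. This is a minimal illustration under stated assumptions: the record fields, the labels, and the sample entries are hypothetical, and the rule simply encodes "three occurrences of the same issue within 30 days" as one reasonable reading of "three times in a month".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ExceptionRecord:
    issue: str         # short label for the problem (hypothetical naming)
    source: str        # "internal" or "external"
    severity: str      # "low", "medium", or "high"
    occurred_on: date


def recurring_patterns(records, threshold=3, window_days=30):
    """Return issues seen at least `threshold` times within `window_days`."""
    by_issue = defaultdict(list)
    for record in records:
        by_issue[record.issue].append(record.occurred_on)

    flagged = set()
    for issue, dates in by_issue.items():
        dates.sort()
        # Check each run of `threshold` consecutive occurrences.
        for i in range(len(dates) - threshold + 1):
            if dates[i + threshold - 1] - dates[i] < timedelta(days=window_days):
                flagged.add(issue)
                break
    return flagged


# Hypothetical log entries (illustrative only).
log = [
    ExceptionRecord("PO missing details", "internal", "medium", date(2024, 3, 3)),
    ExceptionRecord("vendor late delivery", "external", "high", date(2024, 3, 5)),
    ExceptionRecord("PO missing details", "internal", "medium", date(2024, 3, 11)),
    ExceptionRecord("storm closure", "external", "high", date(2024, 3, 15)),
    ExceptionRecord("PO missing details", "internal", "medium", date(2024, 3, 24)),
    ExceptionRecord("vendor late delivery", "external", "high", date(2024, 4, 20)),
]
patterns = recurring_patterns(log)
print(patterns)
```

Here only the purchase-order issue crosses the threshold. The storm closure stays a genuine one-off, and the vendor delay is worth watching but has not yet repeated often enough to count as a pattern under this rule.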

For small businesses, recurring issues often show up as "missing time" in schedules or a growing backlog of unresolved tasks [4]. If the owner keeps stepping in to solve the same problem week after week, that’s a clear sign the system itself needs to be reworked.

Once exceptions are classified, the next step is assigning responsibility to ensure these patterns are addressed.

Assign Ownership for Pattern Fixes

Recurring problems don’t fix themselves. They need someone in charge of addressing the root cause. Unfortunately, firefighting tends to get rewarded because it’s urgent and visible, while system redesign often gets postponed because it’s less immediate and harder to measure.

To close the gap between identifying recurring issues and solving them, organizations must assign specific roles for system-level fixes. During a recurring issue, one person should focus on long-term solutions - documenting what went wrong, filing bug reports, and stabilizing the system after the immediate crisis is handled [11]. This role is distinct from the person managing the immediate fix.

A great example comes from Donald Kieffer, a Senior Vice President at Intermatic in 1996, who personally led his team to the factory floor to identify recurring issues in inventory and transportation. By addressing small problems daily, he helped stabilize the system and improve operations [5].

At a BP refinery, one-third of purchase orders were delayed due to missing information. Instead of overhauling the software immediately, the team implemented a simple email protocol to ensure all necessary details were included in requests. This reduced rework to under 10% [5].

For owner-led businesses, recurring issues often signal the need for systemic change. If the same vendor delay happens every week, the owner shouldn’t be the one making the call each time. The system should either account for delays, switch vendors, or renegotiate terms. The owner’s role is to fix the system - not to serve as its permanent workaround.

Build Feedback Loops for System Learning

Even with clear classification and ownership, recurring problems won’t disappear unless there’s a way to capture what’s happening and use it to improve the system. Feedback loops are essential for turning incidents into actionable insights.

One of the most effective tools is the daily huddle - a short meeting where teams highlight what’s stuck and decide on next steps [5]. Unlike status meetings that focus on past events, huddles identify real-time friction and assign accountability for resolving it.

"The key is to have rapid and routine feedback when workarounds occur, and to then respond accordingly." - John Carrier, Senior Lecturer in System Dynamics, MIT Sloan [4]

For example, Fannie Mae cut the time needed to close its monthly books from 13 days to 6 by mapping out data handoffs with index cards and string and using daily huddles to address bottlenecks as they arose [5].

In smaller operations, this might look like a weekly review of recurring issues, such as schedule changes or vendor delays. The goal isn’t to assign blame but to figure out which problems are repeating and determine if the system itself needs to change. If a problem shows up in three consecutive reviews, it’s time to redesign, not just react.
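The "three consecutive reviews" rule is easy to track with a running count per issue. The sketch below assumes each review is recorded as a set of issue labels, oldest first; the labels and the sample reviews are hypothetical.

```python
def needs_redesign(reviews, streak=3):
    """Return issues that appeared in `streak` consecutive reviews.

    `reviews` is a list of sets of issue labels, ordered oldest first.
    """
    run_length = {}
    flagged = set()
    for issues in reviews:
        # Rebuilding the dict resets the count for any issue absent this week.
        run_length = {issue: run_length.get(issue, 0) + 1 for issue in issues}
        flagged.update(i for i, n in run_length.items() if n >= streak)
    return flagged


# Hypothetical weekly reviews (illustrative only).
weekly_reviews = [
    {"vendor delay", "schedule change"},
    {"vendor delay"},
    {"vendor delay", "schedule change"},
    {"schedule change"},
]
print(needs_redesign(weekly_reviews))
```

In this example the vendor delay shows up three weeks running and is flagged, while the schedule change is interrupted in week two and so has not yet met the rule.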

Feedback loops also depend on a culture where employees feel safe reporting issues immediately. If only good news gets shared, inefficiencies stay hidden. In the early 1960s, Armand Feigenbaum found that 30% of activities in a GE factory were unplanned, unscheduled, and unproductive [4]. These inefficiencies weren’t due to laziness - they were the result of a system that failed to learn from its own mistakes.

Shifting from reactive fixes to proactive prevention takes time. But with classification, ownership, and feedback loops in place, recurring problems can stop draining resources and start driving meaningful improvements.

SMB Studio Example: Stopping Recurring Chaos

By identifying recurring issues and redesigning systems, small and medium-sized businesses (SMBs) can tackle persistent challenges head-on. Here are two examples that demonstrate how this approach can address recurring chaos.

Vendor Delays That Repeat Weekly

A contractor in Central Texas was losing money on expedited freight and last-minute material substitutions due to recurring vendor delays. Initially, these delays were written off as isolated incidents - blamed on bad weather, equipment failures, or other one-off issues. However, when the team started tracking these exceptions, a clear pattern emerged: certain vendors were consistently late, and orders placed later in the week often failed to arrive in time for early-week projects.

The contractor shifted to a systems-based solution rather than relying on reactive fixes. They moved the ordering deadline earlier in the week and required vendors to confirm delivery windows sooner. Vendors unable to meet these new requirements were replaced, or additional lead times were factored in. Within weeks, on-time deliveries improved, and the need for costly expedited shipping diminished. This freed the owner from constant last-minute problem-solving.

This example highlights what’s often called a "hidden factory" - unplanned work that eats into profit margins [4]. The delays weren’t random; they were structural issues. Once identified, the system was redesigned to handle them without constant intervention.
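The tracking step in this example amounts to grouping deliveries and comparing late rates. The sketch below is a hedged illustration, not the contractor's actual records: the order fields, vendor names, and dates are hypothetical, and "late in the week" is assumed to mean orders placed Thursday or later.

```python
from collections import defaultdict
from datetime import date


def late_rate_by(orders, key):
    """Return the share of late orders for each group produced by `key`."""
    totals = defaultdict(int)
    late = defaultdict(int)
    for order in orders:
        group = key(order)
        totals[group] += 1
        if order["arrived"] > order["promised"]:
            late[group] += 1
    return {group: late[group] / totals[group] for group in totals}


def part_of_week(order):
    # date.weekday(): Monday is 0, so Thursday or later is >= 3.
    return "late week" if order["ordered"].weekday() >= 3 else "early week"


# Hypothetical order log (illustrative only).
orders = [
    {"vendor": "A", "ordered": date(2024, 3, 4), "promised": date(2024, 3, 8), "arrived": date(2024, 3, 8)},
    {"vendor": "A", "ordered": date(2024, 3, 7), "promised": date(2024, 3, 11), "arrived": date(2024, 3, 13)},
    {"vendor": "B", "ordered": date(2024, 3, 5), "promised": date(2024, 3, 8), "arrived": date(2024, 3, 7)},
    {"vendor": "B", "ordered": date(2024, 3, 8), "promised": date(2024, 3, 11), "arrived": date(2024, 3, 12)},
    {"vendor": "A", "ordered": date(2024, 3, 14), "promised": date(2024, 3, 18), "arrived": date(2024, 3, 20)},
    {"vendor": "B", "ordered": date(2024, 3, 11), "promised": date(2024, 3, 14), "arrived": date(2024, 3, 14)},
]

print(late_rate_by(orders, lambda o: o["vendor"]))
print(late_rate_by(orders, part_of_week))
```

With this made-up log, every late-week order arrives late and every early-week order arrives on time, which is the kind of signal that would justify moving the ordering deadline rather than expediting each shipment.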

Another recurring challenge for SMBs involves managing frequent schedule changes. Here's how one company tackled this issue.

Last-Minute Schedule Changes: Fixing the System

A residential remodeling company found itself stuck in a cycle of constant rescheduling. Initially, the owner blamed client indecision or unreliable subcontractors. But a deeper analysis revealed that most schedule changes stemmed from jobs running over time - usually because of incomplete material orders or delays in site preparation.

Recognizing this as a systemic issue, the company revamped its workflow. They introduced a process where the start of new jobs was tied to the confirmed completion of prior tasks. This approach is similar to methods used in healthcare operations, where linking task completion to subsequent steps minimizes delays and reduces costs [5]. Additionally, they implemented a pre-job checklist to ensure materials were ready and sites prepared before scheduling new work.

The results were immediate and noticeable. Schedule changes became far less frequent, and the owner spent significantly less time on reactive adjustments. Subcontractors appreciated the improved predictability, and clients enjoyed more reliable service. By redesigning the system to account for predictable exceptions, the company eliminated the need to repeatedly address the same issues.
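The redesigned workflow is essentially a gate: a job is scheduled only when the prior job is confirmed complete and the pre-job checklist passes. A minimal sketch of that gate follows; the class, field names, and checklist items are hypothetical and stand in for whatever the company actually tracks.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    prior_job_complete: bool
    # Pre-job checklist: item description -> whether it is done.
    checklist: dict = field(default_factory=dict)


def blockers(job):
    """Return the reasons a job cannot be scheduled yet (empty means ready)."""
    reasons = []
    if not job.prior_job_complete:
        reasons.append("prior job not confirmed complete")
    reasons += [f"checklist: {item}" for item, done in job.checklist.items() if not done]
    return reasons


# Hypothetical job (illustrative only).
kitchen = Job(
    name="kitchen remodel",
    prior_job_complete=True,
    checklist={"materials delivered": True, "site prepared": False},
)
print(blockers(kitchen))
```

Returning the list of reasons, rather than a plain yes or no, keeps the blocker visible to whoever is scheduling, so the gap gets fixed before the start date instead of being discovered on site.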

"The workarounds are private, and there's no way for others to benefit from it. They solve the problem in the short run but not in the long run." - Nelson Repenning, Faculty Director, MIT Leadership Center [5]

These examples show how rethinking and reengineering processes can transform repeated firefighting into smoother, more predictable operations that protect margins and improve overall efficiency.

From Response to Prevention: Calm as a System Outcome

Converting Exceptions Into Predictable Patterns

Organizations that tackle recurring problems by embedding solutions into their operating model gain a clear edge. Take the case of Intermatic, a company manufacturing industrial time-switches. Back in 1996, their on-time shipment rate was stuck below 60%. Senior VP of Operations Donald Kieffer got the executive team involved in hands-on problem-solving. By streamlining inventory and paperwork processes, they managed to boost on-time shipments to over 90% in just four months [5]. This achievement didn’t just improve performance metrics - it also freed up resources for more strategic initiatives.

Another example comes from a BP refinery, where 33% of purchase orders required rework due to incomplete details. By creating a standardized handoff process and introducing quick huddles for handling nonstandard orders, the team reduced rework to under 10% [5]. These examples highlight how turning recurring exceptions into structured design inputs can eliminate constant firefighting. Instead of being bogged down by interruptions, staff could rely on a system that managed exceptions automatically, reducing manual intervention.

The difference is striking: firefighting only resets a flawed system [1]. True system improvement addresses the root cause, preventing the same issues from cropping up again. In an increasingly volatile world, organizations that transform exceptions into predictable patterns will consistently outperform those that rely on reactive speed and short-term fixes. And these operational changes? They don’t just stabilize the system - they lead to measurable financial benefits, as outlined below.

The Business Case: Margin, Capacity, and Stability

Shifting from reactive responses to proactive system redesign isn’t just operationally smart - it’s financially rewarding. Unplanned activities, often caused by recurring issues, can eat up as much as 30% of capacity [4] and drain 8–15% of operational budgets [3]. Redesigning systems to eliminate these inefficiencies frees up resources. For instance, in healthcare, streamlining processes resulted in a 19% reduction in costs per surgical case [5].

For small and mid-sized businesses (SMBs), the impact is just as real. Costs associated with firefighting - like expedited shipping, overtime, and rework - can add up quickly. A mid-sized distributor generating $100 million in annual revenue could lose $3–5 million every year to these hidden costs, including lost productivity and employee turnover [3].
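The arithmetic behind that figure is simple: $3–5 million on $100 million of revenue is 3–5% of revenue. The sketch below applies that implied range to other revenue levels. Scaling the share linearly to a different business is an assumption made here for illustration, not a claim from the cited source.

```python
def hidden_cost_range(annual_revenue, low_pct=3, high_pct=5):
    """Estimate annual firefighting cost as a percentage range of revenue.

    The 3-5% default is the share implied by the $100M example;
    applying it to other revenue levels is an assumption.
    """
    return annual_revenue * low_pct / 100, annual_revenue * high_pct / 100


low, high = hidden_cost_range(100_000_000)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

A rough range like this is mainly useful for deciding whether tracking exceptions is worth the effort; the actual figure would come from the business's own overtime, freight, and rework records.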

Reducing firefighting doesn’t just save money - it transforms the workplace. High-interruption environments can slash worker productivity by 25–40% compared to periods of focused work [3]. When systems are designed to handle recurring issues, employees can move from constant crisis management to making thoughtful, informed decisions. This shift improves service consistency, which is critical when 68% of customers who switch suppliers do so because of poor service - not price [3].

FAQs

How can I tell if a problem is common-cause or special-cause?

A common-cause problem comes from the usual, expected variability within a stable system. On the other hand, a special-cause problem is tied to an unusual event that can be identified and traced. To tell them apart, consider whether the variation is part of the system's normal patterns (common cause) or linked to a specific, out-of-the-ordinary occurrence (special cause).

What threshold should trigger a system redesign?

A redesign is warranted once an issue stops looking like a one-off and starts repeating. Two simple thresholds from this article work well: the same issue occurring three times in a month [8], or the same problem appearing in three consecutive reviews. Once a threshold is crossed, assign one person to own the system-level fix, and use feedback loops such as daily huddles or weekly reviews to confirm that the change actually reduces recurrence.

Who should own fixing recurring issues in an SMB?

In small and medium-sized businesses (SMBs), each recurring issue needs one named owner for the system-level fix, and that role is distinct from whoever handles the immediate problem. In owner-led businesses this is often the owner, but the job is to change the process - build in lead time, switch vendors, or renegotiate terms - rather than to step in personally every time the problem returns.
