Chapter 21 Common Pitfalls to Avoid

While SPC is a powerful framework for data-driven quality improvement, its effectiveness can be undermined by a range of common missteps. Misinterpretation of data, misuse of run and control charts, and faulty assumptions can lead to wasted effort or missed opportunities, and may ultimately erode trust in SPC itself.

This chapter outlines several common pitfalls to avoid.

21.1 Data issues

SPC charts are only as good as the data behind them. Poor data quality, such as inaccurate measurements, inconsistent definitions, or data collected under conditions that vary over time or between locations, can significantly compromise their reliability.

How the subgroups are formed is especially important. Subgroups consist of the data elements included in each data point. The way data are sampled and grouped can greatly affect the chart’s ability to distinguish common cause variation from special cause variation. If subgrouping is not done properly, the SPC chart may generate false signals or obscure important ones. We have devoted a full chapter (22) to rational subgrouping.

Time spent developing operational indicator definitions and rational subgroups often yields significant returns. Investing effort upfront ensures that data are meaningful and comparable, enabling better decision-making.

21.2 Signal fatigue from over-sensitive SPC rules

Originally, control charts employed only one rule, the 3-sigma rule. However, to increase the chart’s sensitivity to minor sustained shifts and other patterns of non-random variation, a plethora of supplementary rules have since been developed. To name a few, we have the Western Electric rules (four basic rules plus many more for specialised purposes), the Nelson rules (eight rules), and the Westgard rules (six rules). Most rules are based on identifying non-random patterns in the position of data points relative to the one-, two-, or three-sigma limits, each with different diagnostic properties.

Furthermore, several distinct sets of rules for runs analysis, based on identifying non-random patterns relative to the centre line, have been published (Anhøj 2015).

As discussed earlier in this book, it may seem tempting to apply as many rules as possible to increase the chance of detecting any signs of special cause variation. However, this strategy comes at a price. While additional rules do increase the sensitivity of SPC analysis, they also raise the risk of false alarms. This topic is further explored in Appendix C, Two Types of Errors When Using SPC.

Apart from the risk of applying too many rules, the rules themselves may also be either overly sensitive or insufficiently sensitive. A prominent example is a commonly used set of runs rules – including four tests for trends, shifts, and too many or too few runs respectively – adopted by many healthcare organisations (Provost and Murray 2011; Perla, Provost, and Murray 2011). This particular set has been shown to produce a very high rate of false signals. For instance, in a run chart with 24 data points, the expected false positive rate is close to 50% (Anhøj 2015). The main contributor appears to be the trend rule – defined as five or more consecutive data points all increasing or decreasing. Short trends like this are very common, even in sequences of purely random numbers. Moreover, trend rules in general have been thoroughly studied and found to be, at best, unhelpful and, at worst, misleading (Davis and Woodall 1988).
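
To see why the trend rule is so problematic, consider the following simulation – a minimal sketch, not a reproduction of the cited study – which generates purely random series of 24 data points and counts how often at least one “trend” of five or more consecutive increasing or decreasing points appears:

# Estimate how often the trend rule (five or more consecutive data
# points, all increasing or all decreasing) fires on purely random data
set.seed(1)

has_trend <- function(y, len = 5) {
  d <- sign(diff(y))                  # +1 = increase, -1 = decrease
  r <- rle(d)                         # lengths of monotone stretches
  any(r$lengths >= len - 1 & r$values != 0)
}

# Proportion of random 24-point series showing at least one 'trend'
mean(replicate(10000, has_trend(rnorm(24))))

Even though the simulated data contain no real trends, the rule fires in a substantial share of the series.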

To make matters worse, some practitioners deliberately tighten control limits in an attempt to increase the sensitivity of the sigma rule. This behaviour often stems from another common mistake: using the overall standard deviation to calculate control limits (Wheeler 2010, 2016). As emphasised repeatedly throughout this book, the overall variation is not appropriate for this purpose, as it includes both common cause and special cause variation – ultimately inflating the control limits. Inflated limits can, in turn, prompt misguided attempts to narrow them artificially. Instead, control limits, set at ±3-sigma from the centre line, should be based on the estimated within-subgroup standard deviation from rationally formed subgroups. This approach captures common cause variation and provides a sound basis for meaningful process monitoring and improvement.
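
The difference between the two approaches is easy to demonstrate. The following sketch, using made-up data for an individuals (I) chart, contrasts limits computed from the overall standard deviation with limits based on the average moving range, where 1.128 is the standard bias correction constant (d2 for subgroups of size two):

# Illustrative data: a process with a sustained shift halfway through
set.seed(2)
y <- c(rnorm(12, mean = 20), rnorm(12, mean = 24))

# Wrong: the overall standard deviation mixes common cause and special
# cause variation, inflating the limits
mean(y) + c(-3, 3) * sd(y)

# Right: the average moving range estimates within-subgroup (common
# cause) variation only
sigma_hat <- mean(abs(diff(y))) / 1.128
mean(y) + c(-3, 3) * sigma_hat

The moving range based limits are noticeably narrower, because they are largely unaffected by the shift itself.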

To avoid signal fatigue – which, at best, results in wasted effort from chasing false signals – we recommend that practitioners select their rules carefully, focusing on capturing patterns that are most likely to signal special cause variation in their specific context. The three rules suggested in this book – the two runs rules and the 3-sigma rule – have been deliberately designed to detect patterns indicative of one or more of the types of variation most commonly encountered in healthcare improvement: freaks, shifts, trends or cycles, while keeping acceptable false alarm rates.
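
For reference, the two runs rules can be expressed in a few lines of R. The critical values – a longest run exceeding round(log2(n) + 3) and fewer median crossings than the 5th percentile of a binomial distribution – follow Anhøj (2015); the function itself is a minimal sketch:

# Runs analysis: signal if the longest run of data points on the same
# side of the median is unusually long, or if the series crosses the
# median unusually few times (Anhøj 2015)
runs_signal <- function(y) {
  m  <- median(y)
  ab <- y[y != m] > m                      # above/below; points on the
  n  <- length(ab)                         #   median are ignored
  longest   <- max(rle(ab)$lengths)        # longest run
  crossings <- sum(diff(ab) != 0)          # number of median crossings
  longest > round(log2(n) + 3) ||          # unusually long runs, or
    crossings < qbinom(0.05, n - 1, 0.5)   # unusually few crossings
}

set.seed(3)
runs_signal(rnorm(24))                     # random data: usually FALSE
runs_signal(c(rnorm(12), rnorm(12, 2)))    # shifted data: usually TRUE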

21.3 Confusing the voice of the customer with the voice of the process

“Hearing voices” is essential to effective SPC implementation. Two voices demand close attention: the voice of the customer and the voice of the process. Both are important, but confusing one for the other can lead to misguided actions, wasted effort, and poor decision-making.

The voice of the customer refers to the expected or desired outcomes of the process, while the voice of the process reflects its actual capability and stability regardless of what the customer wants.

The voice of the customer is often expressed through specification limits that define the desired level of outcomes. These limits may take the form of an upper or lower acceptance threshold, or an interval specifying the acceptable range for a given outcome. For example, based on literature reviews or results from comparable units, it may be determined that an acute caesarean section should be completed within 30 minutes of the decision to operate. Any procedure taking longer than this would be considered unacceptable.

If we listen only to the voice of the customer, it may be tempting to treat any caesarean section taking longer than 30 minutes as a special cause – and perhaps even initiate a root cause analysis in each case. However, without also considering the voice of the process – which tells us whether the process is stable and what it is actually capable of, as indicated by the control limits – such an approach is flawed.

Unacceptable outcomes may well arise from highly stable and predictable processes – and conversely, acceptable outcomes may result from unstable ones. Before taking action based on specific outcomes, we must first determine whether they are the result of common cause or special cause variation. This distinction is crucial, as each type of variation calls for a fundamentally different approach to improvement.

As detailed in Chapter 1, Understanding Variation, addressing common cause variation requires a thorough redesign of the entire process, while addressing special cause variation involves identifying the root causes of the special cause(s) and either eliminating or reinforcing them, depending on whether they are undesirable or beneficial.

The long-term goal of any improvement initiative is to establish stable processes that consistently deliver acceptable results.

The two voices and the corresponding four appropriate actions are summarised in the table below:

The interplay between the voice of the customer and the voice of the process, with the four appropriate actions for each scenario.

                        Process Stable                            Process Unstable
Customer Satisfied      OK – hands off and continue to monitor    Trouble ahead – stabilise the process
Customer Unsatisfied    Trouble now – revise the process          Chaos – revise and stabilise the process

21.4 Automatic rephasing

Rephasing involves splitting an SPC chart to recalculate the centre line and control limits after a sustained shift has been identified.

Figure 21.1 shows the number of hospital infections (C. diff.) before and after an intervention, which began after month 24.

library(qicharts2)   # qic() and the cdi example data come from qicharts2

qic(month, n,
    data  = cdi,
    chart = 'c',
    title = 'Hospital associated C. diff.-infections',
    ylab  = 'Count',
    xlab  = 'Month')

Figure 21.1: Hospital infections.

Figure 21.2 has been rephased following the start of the intervention and the subsequent shift. This correctly distinguishes two distinct periods, each with stable but differing levels of infections.

qic(month, n,
    data  = cdi,
    chart = 'c',
    part  = period,   # split the chart where the period variable changes
    title = 'Hospital associated C. diff.-infections',
    ylab  = 'Count',
    xlab  = 'Month')

Figure 21.2: Hospital infections – rephasing done right!

Automatic rephasing is when the centre line and control limits are recalculated automatically after the software detects a shift – typically triggered by a prolonged run of data points on one side of the centre line. Some software packages perform automatic rephasing by default (Reading et al. 2021).

Figure 21.3 has been automatically rephased following detection of the shift. Coincidentally, the run that signalled the shift began three months prior to the intervention. However, these months should correctly be considered part of the baseline period. As a result, the control limits and centre lines do not accurately reflect the distinct before-and-after periods. The centre lines are inflated, and a freak data point falsely suggests instability during the baseline period.

qic(month, n,
    data        = cdi,
    chart       = 'c',
    part        = 22,                # split after data point 22, where
    part.labels = c('pre', 'post'),  #   the signalling run began
    title       = 'Hospital associated C. diff.-infections',
    ylab        = 'Count',
    xlab        = 'Month')

Figure 21.3: Hospital infections – rephasing done wrong!

We strongly discourage the use of automatic rephasing, as it can lead to misleading interpretations of process stability. By automating shift detection, automatic rephasing effectively removes the essential human insight into the process. Therefore, rephasing should be a deliberate decision made by individuals with a thorough understanding of the process and any associated interventions.

Rephasing may be appropriate when the following conditions are met:

  • there is a sustained shift in data,
  • the reason for the shift is known,
  • the shift is in the desired direction, and
  • the shift is expected to continue.

If any of these conditions is not met, we should instead seek to understand the nature and causes of the shift, following the strategy outlined in Chapter 4, Using SPC in Healthcare.

21.5 The control chart vs run chart debate

It is a common and persistent misconception that control charts are inherently superior to run charts in detecting special cause variation (e.g. Carey 2002). Similarly, run charts and control charts are often viewed as fundamentally different methods, warranting distinct terminologies (e.g. Perla, Provost, and Murray 2011). We disagree with both notions.

We refer to run charts and control charts collectively as SPC charts – time series charts that use statistical tests to detect signs of special cause variation. These tests – known as rules – employ various techniques to identify patterns in data that are unlikely to arise in stable (i.e., purely random) processes.

All SPC rules share a common basis: they rely on the position of data points relative to the process centre and/or process spread. For a detailed discussion of these concepts, see Appendix C, Basic Statistical Concepts.

Traditionally, control charts use rules based on sigma limits, while run charts apply rules based on the centre line. However, combining rules based on both centre and spread – as we advocate throughout this book – is not a new practice. Consider Rule #4 of the Western Electric rules – a run of eight or more consecutive data points on the same side of the centre line. This rule, which is independent of sigma limits, is grounded in the same statistical principle as the two runs rules for unusually long or few runs.
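
To illustrate the point, Rule #4 needs nothing but the centre line. A minimal sketch, assuming the mean as the centre:

# Western Electric Rule #4: eight or more consecutive data points on
# the same side of the centre line -- no sigma limits involved
we_rule4 <- function(y, centre = mean(y)) {
  side <- sign(y - centre)              # +1 above, -1 below the centre
  r    <- rle(side[side != 0])          # runs on either side
  any(r$lengths >= 8)
}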

Thus, run charts and control charts are two sides of the same coin, with the key difference being the presence or absence of control limits. These limits serve to quickly signal sudden, significant shifts in the data and to provide a visual representation of natural process variation. Runs analysis – using rules for unusually long or few runs (or crossings) – is more effective than control limits for detecting minor to moderate sustained shifts or trends in data. Combining these two principles offers the best of both worlds.

Nonetheless, standalone run charts offer some advantages over control charts. A key benefit is that run charts are far easier to construct with pen and paper: plot the dots and draw a horizontal line – the median – that splits the data in half.

Additionally, using the median rather than the mean as the centre line divides the data symmetrically, making the chart largely independent of distributional assumptions and less prone to misapplication. The latter is particularly important in healthcare, where many practitioners lack formal statistical training.

Finally, because runs analysis is most effective for detecting minor to moderate sustained shifts – which is ultimately what we seek when aiming to improve processes – run charts may be all that is needed during the improvement phase. Once a satisfactory level of improvement has been achieved, control limits can be added – along with switching to the mean as the centre line – to monitor process stability and quickly detect sudden, larger, and possibly transient shifts in the data, as sketched below.
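
With qic(), this transition is a one-argument change. The sketch below reuses the cdi data from Figure 21.1; the run chart uses the median as centre line by default:

# During improvement: a run chart with the median as centre line
qic(month, n,
    data  = cdi,
    chart = 'run',
    title = 'Hospital associated C. diff.-infections (run chart)',
    ylab  = 'Count',
    xlab  = 'Month')

# After improvement has stabilised: switch to a control chart with
# 3-sigma limits and the mean as centre line
qic(month, n,
    data  = cdi,
    chart = 'c',
    title = 'Hospital associated C. diff.-infections (C chart)',
    ylab  = 'Count',
    xlab  = 'Month')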

References

Anhøj, Jacob. 2015. “Diagnostic Value of Run Chart Analysis: Using Likelihood Ratios to Compare Run Chart Rules on Simulated Data Series.” PLoS ONE. https://doi.org/10.1371/journal.pone.0121349.
Carey, Raymond G. 2002. “How Do You Know That Your Care Is Improving? Part 2: Using Control Charts to Learn from Your Data.” J Ambulatory Care Manage 25(2): 78–88.
Davis, Robert B, and William H Woodall. 1988. “Performance of the Control Chart Trend Rule Under Linear Shift.” Journal of Quality Technology 20: 260–62.
Perla, Rocco J, Lloyd P Provost, and Sandy K Murray. 2011. “The Run Chart: A Simple Analytical Tool for Learning from Variation in Healthcare Processes.” BMJ Qual Saf 20: 46–51. https://doi.org/10.1136/bmjqs.2009.037895.
Provost, Lloyd P, and Sandra K Murray. 2011. The Health Care Data Guide: Learning from Data for Improvement. San Francisco: John Wiley & Sons Inc.
Reading, Christopher, Simon Wellesley-Miller, Zoë Turner, Tom Jemmett, Tom Smith, Chris Mainey, and John MacKintosh. 2021. NHSRplotthedots: Draw XmR Charts for NHSE/i ’Making Data Count’ Programme. https://doi.org/10.32614/CRAN.package.NHSRplotthedots.
Wheeler, Donald J. 2010. “The Right and Wrong Ways of Computing Limits.” https://www.spcpress.com/pdf/DJW205.pdf.
Wheeler, Donald J. 2016. “The Levey-Jennings Chart – How to Get the Most Out of Your Measurement System.” http://www.spcpress.com/pdf/DJW290.pdf.