C Two Types of Errors When Using SPC

The primary aim of statistical process control is to distinguish between common cause and special cause variation. In practice, this classification is not perfect. As with any diagnostic method, SPC is subject to two types of error. A useful analogy is a screening test that may indicate disease in a healthy patient (a false positive) or fail to detect disease in a patient who is actually ill (a false negative).

Applied to SPC, the two errors are:

False positive (Type I error): Treating an outcome arising from common cause variation as if it were due to a special cause, and therefore wrongly searching for a special cause when the true source of variation is the underlying process.

False negative (Type II error): Treating an outcome arising from a special cause as if it were due to common cause variation, and therefore wrongly overlooking the special cause.

Either mistake can be costly. If we treat all variation as special cause variation, we maximise the losses associated with false positives. If we treat all variation as common cause variation, we maximise the losses associated with false negatives.

Unfortunately, it is impossible to reduce both kinds of error to zero. Shewhart therefore sought a practical strategy that would make both errors relatively rare. He concluded that this depended largely on the cost of looking unnecessarily for special causes. Drawing on theory, empirical evidence, and pragmatism, he argued that control limits placed three standard deviations above and below the mean provide a reasonable balance between the two kinds of error.

C.1 Quantifying the diagnostic error of SPC charts

C.1.1 Average run length

Traditionally, the performance of SPC charts has been described using the average run length (ARL), that is, the average number of data points before a signal occurs (Montgomery 2020, 186).

The in-control average run length is:

\[ ARL_0=\frac{1}{\alpha} \] where the process is stable and only common cause variation is present.

The out-of-control average run length is:

\[ ARL_1=\frac{1}{1-\beta} \] where a special cause is present.

Here,

\[ \alpha=P\{\text{signal | common cause variation}\}=P\{\text{false positive}\}=P\{\text{type 1 error}\} \]

is the probability of a false positive, and

\[ \beta=P\{\text{no signal | special cause variation}\}=P\{\text{false negative}\}=P\{\text{type 2 error}\} \]

is the probability of a false negative.

For example, in a stable process with normally distributed data, the probability, \(\alpha\), of a point falling outside 3-sigma limits is 0.0027. Thus,

\[ARL_0=1/0.0027\approx370\]

meaning that, on average, we would expect one false alarm every 370 data points.
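This calculation is easy to verify numerically. The sketch below uses only the Python standard library and the assumptions stated above (normally distributed data, 3-sigma limits, signals from single points outside the limits):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Probability of a single point falling outside 3-sigma limits
# when only common cause variation is present.
alpha = 2 * (1 - norm_cdf(3))

# In-control average run length: expected number of points per false alarm.
arl0 = 1 / alpha

print(round(alpha, 4))  # 0.0027
print(round(arl0))      # 370
```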

The value of \(ARL_1\) depends on \(\beta\), which in turn depends on the size of the special cause relative to the natural variation in the process. Large shifts are easier to detect and therefore result in shorter out-of-control run lengths.
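To make this concrete, suppose a special cause shifts the process mean by a given number of standard deviations while the 3-sigma limits stay fixed, and the chart signals only on points outside the limits. Under those assumptions, \(\beta\) is the probability that a point still falls inside the limits after the shift, and \(ARL_1\) follows directly. A minimal sketch:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def arl1(shift):
    """Out-of-control ARL for a mean shift of `shift` SD, using
    only the points-outside-3-sigma rule on a normal process."""
    # Probability that a point remains inside the 3-sigma limits
    # after the mean has shifted (= beta, the false negative rate).
    beta = norm_cdf(3 - shift) - norm_cdf(-3 - shift)
    return 1 / (1 - beta)

for shift in (0.5, 1, 2, 3):
    print(shift, round(arl1(shift), 1))
```

With these assumptions, a 2 SD shift gives \(ARL_1\approx6.3\), while a 0.5 SD shift gives \(ARL_1\approx155\), illustrating how much sooner large shifts are detected by the 3-sigma rule alone.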

In an ideal world, we would like \(ARL_0=\infty\) and \(ARL_1=1\). That is, no false alarms and immediate detection of real signals. In practice, this is impossible, because improving one usually worsens the other.

ARL is closely related to the more familiar concepts of sensitivity and specificity. In the context of SPC:

\[ specificity=P\{\text{no signal | common cause variation}\}=P\{\text{true negative}\}=1-\alpha \]

\[ sensitivity=P\{\text{signal | special cause variation}\}=P\{\text{true positive}\}=1-\beta \]

Specificity tells us how well a chart avoids false alarms. Sensitivity tells us how well it detects real special causes.

C.1.2 Likelihood ratios

Sensitivity and specificity are useful, but they do not directly answer the question that often matters most in practice: If the chart signals, how likely is it that a special cause is truly present?

Likelihood ratios are more helpful for answering this kind of question.

If a chart signals, the signal may be a true positive or a false positive. The positive likelihood ratio tells us how much more likely a signal is to come from a process with special cause variation than from a stable process. Similarly, the negative likelihood ratio tells us how much more likely the absence of a signal is when a special cause is present than when the process is stable:

\[ LR+=\frac{P\{\text{signal | special cause variation}\}}{P\{\text{signal | common cause variation}\}}=\frac{sensitivity}{1-specificity} \]

\[ LR-=\frac{P\{\text{no signal | special cause variation}\}}{P\{\text{no signal | common cause variation}\}}=\frac{1-sensitivity}{specificity} \]

A likelihood ratio greater than 1 supports the presence of special cause variation. A likelihood ratio less than 1 argues against it. The further the value is from 1, the stronger the evidence. As a rule of thumb, a positive likelihood ratio above 10 is considered strong evidence in favour of the condition being tested for, while a negative likelihood ratio below 0.1 is considered strong evidence against it (Deeks and Altman 2004).
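Converting a rule's sensitivity and specificity into likelihood ratios is a one-line calculation. A small helper (hypothetical function name, standard formulas):

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # true positive rate / false positive rate
    lr_neg = (1 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg

# Example: a rule with 88.5% sensitivity and 92.7% specificity.
lr_pos, lr_neg = likelihood_ratios(0.885, 0.927)
print(round(lr_pos, 1), round(lr_neg, 2))  # 12.1 0.12
```

By the rule of thumb above, such a rule would provide strong evidence when it signals (LR+ > 10), and moderately strong evidence against a special cause when it does not (LR− slightly above 0.1).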

Likelihood ratios are therefore useful measures of the diagnostic value of SPC rules (Anhøj 2015). All else being equal, a good rule, or a good combination of rules, has a high positive likelihood ratio and a low negative likelihood ratio.

A worked example is shown below.

Results from runs analyses of 2000 simulated run charts with 24 data points. In half the simulations a shift of 2 SD was introduced in the last 12 subgroups. Shift +/– indicates the presence or absence of true shifts in process mean. Signal +/– indicates the result from the run chart analysis using the two runs rules (Anhøj 2015).

|         | Shift– | Shift+ | Likelihood ratio       |
|---------|--------|--------|------------------------|
| Signal– | 927    | 115    | LR– = 115 / 927 = 0.12 |
| Signal+ | 73     | 885    | LR+ = 885 / 73 = 12    |
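The likelihood ratios in the table can be reproduced from the four cell counts. A sketch assuming the 2×2 layout above, where each column totals 1000 simulated charts (so the ratios of counts equal the ratios of rates):

```python
# Cell counts from the simulation: 1000 charts without and 1000 with a true shift.
tn, fn = 927, 115   # Signal- row: Shift-, Shift+
fp, tp = 73, 885    # Signal+ row: Shift-, Shift+

sensitivity = tp / (tp + fn)   # 885 / 1000
specificity = tn / (tn + fp)   # 927 / 1000

lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(round(sensitivity, 3), round(specificity, 3))  # 0.885 0.927
print(round(lr_pos, 1), round(lr_neg, 2))            # 12.1 0.12
```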

Studies comparing different combinations of rules using likelihood ratios (Anhøj 2015; Anhøj and Wentzel-Larsen 2018) have shown that:

  • The 3-sigma rule is effective in detecting moderate to large, possibly transient, shifts.

  • The 3-sigma rule loses specificity as the number of data points increases.

  • Runs analysis using the two rules proposed in Chapter 3 is effective in detecting minor to moderate sustained shifts, regardless of the number of data points.

  • Combining the 3-sigma rule with the two runs rules, and keeping the number of data points between 20 and 30, provides a good balance between false positive and false negative signals.

C.2 Conclusion: Keeping the balance

SPC charts, like all statistical and medical tests, are imperfect. They may suggest a problem where none exists, or fail to detect a real one. Either type of error can lead to losses: wasted time and effort from chasing false alarms, or missed opportunities and harm from overlooking genuine special causes.

For this reason, the choice of SPC rules matters. Since Shewhart first introduced the control chart in 1924 using only the 3-sigma rule, many supplementary rules have been proposed to increase sensitivity. But increased sensitivity comes at the cost of reduced specificity. Applying too many rules, or overly sensitive rules, can quickly lead to a stream of false alarms and unnecessary investigation.

The aim is therefore not to maximise sensitivity at all costs, but to strike a practical balance between sensitivity and specificity. Likelihood ratios provide a useful way of judging whether a particular rule set achieves that balance.

The three rules recommended in this book – the 3-sigma rule and the two runs rules – have been supported by both empirical studies and practical experience. Together, they offer a useful balance between detecting real signals and avoiding false ones.

References

Anhøj, Jacob. 2015. “Diagnostic Value of Run Chart Analysis: Using Likelihood Ratios to Compare Run Chart Rules on Simulated Data Series.” PLoS ONE, ahead of print. https://doi.org/10.1371/journal.pone.0121349.
Anhøj, Jacob, and Tore Wentzel-Larsen. 2018. “Sense and Sensibility: On the Diagnostic Value of Control Chart Rules for Detection of Shifts in Time Series Data.” BMC Medical Research Methodology, ahead of print. https://doi.org/10.1186/s12874-018-0564-0.
Deeks, Jonathan J, and Douglas G Altman. 2004. “Diagnostic Tests 4: Likelihood Ratios.” BMJ 329: 168–69. https://doi.org/10.1136/bmj.329.7458.168.
Montgomery, Douglas C. 2020. Introduction to Statistical Quality Control. 8th ed. Wiley.