C Two Types of Errors When Using SPC

The primary aim of statistical process control is to distinguish between common cause and special cause variation. In practice, this classification is not perfect. As with any diagnostic method, SPC is subject to two types of error. A useful analogy is a screening test that may indicate disease in a healthy patient (a false positive) or fail to detect disease in a patient who is actually ill (a false negative).

Applied to SPC, the two errors are:

False positive (Type I error): Treating an outcome arising from common cause variation as if it were due to a special cause, and therefore wrongly searching for a special cause when the true source of variation is the underlying process.

False negative (Type II error): Treating an outcome arising from a special cause as if it were due to common cause variation, and therefore wrongly overlooking the special cause.

Either mistake can be costly. If we treat all variation as special cause variation, we maximise the losses associated with false positives. If we treat all variation as common cause variation, we maximise the losses associated with false negatives.

Unfortunately, it is impossible to reduce both kinds of error to zero. Shewhart therefore sought a practical strategy that would make both errors relatively rare. He concluded that this depended largely on the cost of looking unnecessarily for special causes. Drawing on theory, empirical evidence, and pragmatism, he argued that control limits placed three standard deviations above and below the mean provide a reasonable balance between the two kinds of error.

C.1 Quantifying the diagnostic error of SPC charts

C.1.1 Average run length

Traditionally, the performance of SPC charts has been described using the average run length (ARL), that is, the average number of data points before a signal occurs (Montgomery 2020, 186).

The in-control average run length is:

\[ ARL_0=\frac{1}{\alpha} \] where the process is stable and only common cause variation is present.

The out-of-control average run length is:

\[ ARL_1=\frac{1}{1-\beta} \] where a special cause is present.

Here,

\[ \alpha=P\{\text{signal | common cause variation}\}=P\{\text{false positive}\}=P\{\text{type 1 error}\} \]

is the probability of a false positive, and

\[ \beta=P\{\text{no signal | special cause variation}\}=P\{\text{false negative}\}=P\{\text{type 2 error}\} \]

is the probability of a false negative.

For example, in a stable process with normally distributed data, the probability, \(\alpha\), of a point falling outside 3-sigma limits is 0.0027. Thus,

\[ARL_0=1/0.0027\approx370\]

meaning that, on average, we would expect one false alarm every 370 data points.
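This calculation is easy to verify numerically. The sketch below uses only the Python standard library and the assumptions stated above (normally distributed data, 3-sigma limits, signals from single points outside the limits):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Probability of a single point falling outside 3-sigma limits
# when only common cause variation is present.
alpha = 2 * (1 - norm_cdf(3))

# In-control average run length: expected number of points per false alarm.
arl0 = 1 / alpha

print(round(alpha, 4))  # 0.0027
print(round(arl0))      # 370
```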

The value of \(ARL_1\) depends on \(\beta\), which in turn depends on the size of the special cause relative to the natural variation in the process. Large shifts are easier to detect and therefore result in shorter out-of-control run lengths.
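To make this concrete, suppose a special cause shifts the process mean by a given number of standard deviations while the 3-sigma limits stay fixed, and the chart signals only on points outside the limits. Under those assumptions, \(\beta\) is the probability that a point still falls inside the limits after the shift, and \(ARL_1\) follows directly. A minimal sketch:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def arl1(shift):
    """Out-of-control ARL for a mean shift of `shift` SD, using
    only the points-outside-3-sigma rule on a normal process."""
    # Probability that a point remains inside the 3-sigma limits
    # after the mean has shifted (= beta, the false negative rate).
    beta = norm_cdf(3 - shift) - norm_cdf(-3 - shift)
    return 1 / (1 - beta)

for shift in (0.5, 1, 2, 3):
    print(shift, round(arl1(shift), 1))
```

With these assumptions, a 2 SD shift gives \(ARL_1\approx6.3\), while a 0.5 SD shift gives \(ARL_1\approx155\), illustrating how much sooner large shifts are detected by the 3-sigma rule alone.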

In an ideal world, we would like \(ARL_0=\infty\) and \(ARL_1=1\). That is, no false alarms and immediate detection of real signals. In practice, this is impossible, because improving one usually worsens the other.

ARL is closely related to the more familiar concepts of sensitivity and specificity. In the context of SPC:

\[ specificity=P\{\text{no signal | common cause variation}\}=P\{\text{true negative}\}=1-\alpha \]

\[ sensitivity=P\{\text{signal | special cause variation}\}=P\{\text{true positive}\}=1-\beta \]

Specificity tells us how well a chart avoids false alarms. Sensitivity tells us how well it detects real special causes.

C.1.2 Likelihood ratios

Sensitivity and specificity are useful, but they do not directly answer the question that often matters most in practice: If the chart signals, how likely is it that a special cause is truly present?

Likelihood ratios are more helpful for answering this kind of question.

If a chart signals, the signal may be a true positive or a false positive. The positive likelihood ratio tells us how much more likely a signal is to come from a process with special cause variation than from a stable process. Similarly, the negative likelihood ratio tells us how much more likely the absence of a signal is when a special cause is present than when the process is stable:

\[ LR+=\frac{P\{\text{signal | special cause variation}\}}{P\{\text{signal | common cause variation}\}}=\frac{sensitivity}{1-specificity} \]

\[ LR-=\frac{P\{\text{no signal | special cause variation}\}}{P\{\text{no signal | common cause variation}\}}=\frac{1-sensitivity}{specificity} \]

A likelihood ratio greater than 1 supports the presence of special cause variation. A likelihood ratio less than 1 argues against it. The further the value is from 1, the stronger the evidence. As a rule of thumb, a positive likelihood ratio above 10 is considered strong evidence in favour of the condition being tested for, while a negative likelihood ratio below 0.1 is considered strong evidence against it (Deeks and Altman 2004).
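Converting a rule's sensitivity and specificity into likelihood ratios is a one-line calculation. A small helper (hypothetical function name, standard formulas):

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # true positive rate / false positive rate
    lr_neg = (1 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg

# Example: a rule with 88.5% sensitivity and 92.7% specificity.
lr_pos, lr_neg = likelihood_ratios(0.885, 0.927)
print(round(lr_pos, 1), round(lr_neg, 2))  # 12.1 0.12
```

By the rule of thumb above, such a rule would provide strong evidence when it signals (LR+ > 10), and moderately strong evidence against a special cause when it does not (LR− slightly above 0.1).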

Likelihood ratios are therefore useful measures of the diagnostic value of SPC rules (Anhøj 2015). All else being equal, a good rule, or a good combination of rules, has a high positive likelihood ratio and a low negative likelihood ratio.

A worked example is shown below.

Results from runs analyses of 2000 simulated run charts with 24 data points. In half the simulations a shift of 2 SD was introduced in the last 12 subgroups. Shift +/– indicates the presence or absence of true shifts in process mean. Signal +/– indicates the result from the run chart analysis using the two runs rules (Anhøj 2015).

|         | Shift– | Shift+ | Likelihood ratio       |
|---------|--------|--------|------------------------|
| Signal– | 927    | 115    | LR– = 115 / 927 = 0.12 |
| Signal+ | 73     | 885    | LR+ = 885 / 73 = 12    |
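The likelihood ratios in the table can be reproduced from the four cell counts. A sketch assuming the 2×2 layout above, where each column totals 1000 simulated charts (so the ratios of counts equal the ratios of rates):

```python
# Cell counts from the simulation: 1000 charts without and 1000 with a true shift.
tn, fn = 927, 115   # Signal- row: Shift-, Shift+
fp, tp = 73, 885    # Signal+ row: Shift-, Shift+

sensitivity = tp / (tp + fn)   # 885 / 1000
specificity = tn / (tn + fp)   # 927 / 1000

lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(round(sensitivity, 3), round(specificity, 3))  # 0.885 0.927
print(round(lr_pos, 1), round(lr_neg, 2))            # 12.1 0.12
```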

Studies comparing different combinations of rules using likelihood ratios (Anhøj 2015; Anhøj and Wentzel-Larsen 2018) have shown that:

  • The 3-sigma rule is effective in detecting moderate to large, possibly transient, shifts.

  • The 3-sigma rule loses specificity as the number of data points increases.

  • Runs analysis using the two rules proposed in Chapter 3 is effective in detecting minor to moderate sustained shifts, regardless of the number of data points.

  • Combining the 3-sigma rule with the two runs rules, and keeping the number of data points between 20 and 30, provides a good balance between false positive and false negative signals.

C.2 Conclusion: Keeping the balance

SPC charts, like all statistical and medical tests, are imperfect. They may suggest a problem where none exists, or fail to detect a real one. Either type of error can lead to losses: wasted time and effort from chasing false alarms, or missed opportunities and harm from overlooking genuine special causes.

For this reason, the choice of SPC rules matters. Since Shewhart first introduced the control chart in 1924 using only the 3-sigma rule, many supplementary rules have been proposed to increase sensitivity. But increased sensitivity comes at the cost of reduced specificity. Applying too many rules, or overly sensitive rules, can quickly lead to a stream of false alarms and unnecessary investigation.

The aim is therefore not to maximise sensitivity at all costs, but to strike a practical balance between sensitivity and specificity. Likelihood ratios provide a useful way of judging whether a particular rule set achieves that balance.

The three rules recommended in this book – the 3-sigma rule and the two runs rules – have been supported by both empirical studies and practical experience. Together, they offer a useful balance between detecting real signals and avoiding false ones.

References

Anhøj, Jacob. 2015. “Diagnostic Value of Run Chart Analysis: Using Likelihood Ratios to Compare Run Chart Rules on Simulated Data Series.” PLoS ONE, ahead of print. https://doi.org/10.1371/journal.pone.0121349.
Anhøj, Jacob, and Tore Wentzel-Larsen. 2018. “Sense and Sensibility: On the Diagnostic Value of Control Chart Rules for Detection of Shifts in Time Series Data.” BMC Medical Research Methodology, ahead of print. https://doi.org/10.1186/s12874-018-0564-0.
Deeks, Jonathan J, and Douglas G Altman. 2004. “Diagnostic Tests 4: Likelihood Ratios.” BMJ 329: 168–69. https://doi.org/10.1136/bmj.329.7458.168.
Montgomery, Douglas C. 2020. Introduction to Statistical Quality Control. 8th ed. Wiley.