Chapter 5 Your First SPC Charts With Base R
From Part 1 of this book we have a good grasp of what SPC is and how SPC charts work. In this chapter we will start constructing SPC charts using functions from base R. In later chapters we will include functions from ggplot2 and qicharts2 (an R package developed by JA).
In essence, an SPC chart is a (point-and-)line plot of data over time with a horizontal line to represent the data centre and, in the case of a control chart, two lines to represent the estimated upper and lower boundaries of the natural variation in the data.
5.1 A run chart of blood pressure data
Consider the data from Figure 2.1, which show systolic blood pressure measurements (mm Hg) for a patient taken in the morning over 26 consecutive days (Mohammed, Worthington, and Woodall 2008).
systolic <- c(169, 172, 175, 174, 161, 142,
174, 171, 168, 174, 180, 194,
161, 181, 175, 176, 186, 166,
157, 183, 177, 171, 185, 176,
181, 174)
First we plot a simple point-and-line chart without any additional lines (Figure 5.1):
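The plotting call for this first figure is not shown in the text. A minimal sketch, assuming it is the same `plot()` call with `type = 'o'` that is used for the later figures:

```r
# Blood pressure data, repeated here so the snippet runs on its own
systolic <- c(169, 172, 175, 174, 161, 142,
              174, 171, 168, 174, 180, 194,
              161, 181, 175, 176, 186, 166,
              157, 183, 177, 171, 185, 176,
              181, 174)

# Plot the data as points connected by lines ("o" = overplotted)
plot(systolic, type = 'o')
```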
Figure 5.1: Simple run chart
(As a side note, this reminds me (JA) of a manager, who once said to me: “You make such beautiful graphs, but can’t you stop them from going up and down all the time.” 😁)
To guide the runs analysis we will add a horizontal centre line, which in run charts is usually the median of the data points (Figure 5.2):
# Create y-coordinates for the centre line
cl <- median(systolic) # calculate median
cl <- rep(cl, length(systolic)) # repeat to match the length of systolic
# Plot data and add centre line
plot(systolic, type = 'o')
lines(cl)
Figure 5.2: Run chart with centre line
We find that the longest run has 4 data points (for example #14-#17) and that the curve crosses the centre line 9 times. Four data points (#4, #7, #10, #26) lie directly on the centre line, so we have 22 useful observations. With 22 useful observations and using the two runs rules proposed in Chapter 3, the upper limit for the longest run is 7, as is (coincidentally) the lower limit for the number of crossings. Since the longest run (4) does not exceed 7 and the number of crossings (9) is not below 7, there are no signs of sustained shifts or trends in the data over time.
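The counts above can also be checked in code rather than by eye. The following is only a sketch: it assumes that the two runs rules from Chapter 3 use the limits `round(log2(n) + 3)` for the longest run and `qbinom(0.05, n - 1, 0.5)` for the number of crossings, and all variable names are our own.

```r
systolic <- c(169, 172, 175, 174, 161, 142,
              174, 171, 168, 174, 180, 194,
              161, 181, 175, 176, 186, 166,
              157, 183, 177, 171, 185, 176,
              181, 174)

med <- median(systolic)

# Data points on the centre line are not useful for runs analysis
useful   <- systolic[systolic != med]
n_useful <- length(useful)

# Runs are sequences of consecutive points on the same side of the median
runs        <- rle(useful > med)
longest_run <- max(runs$lengths)
n_crossings <- length(runs$lengths) - 1

# Assumed limits for the two runs rules (see lead-in)
longest_run_max <- round(log2(n_useful) + 3)
n_crossings_min <- qbinom(0.05, n_useful - 1, 0.5)

c(n_useful, longest_run, longest_run_max, n_crossings, n_crossings_min)
```

A signal would be present if `longest_run > longest_run_max` or `n_crossings < n_crossings_min`; neither is the case here.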
5.2 Adding control limits to produce a control chart
We use the same technique to add the lower and upper control limits.
Remember that the control limits are usually set to \(CL \pm 3 SD\), where CL is the centre line, usually the mean, and SD is the estimated standard deviation – that is, the standard deviation of the natural variation in data, not the pooled standard deviation that would include both common cause and any special cause variation.
For data consisting of single measurements, we choose the I chart (see Chapter 2). To estimate the common cause standard deviation we use the average moving range divided by a constant, 1.128. The moving ranges are the absolute pairwise differences between consecutive data points. We will talk much more about control limits in Chapter 6.
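Written out, and using \(\overline{MR}\) as our own shorthand for the average moving range, the I chart limits become:

\[
SD = \frac{\overline{MR}}{1.128}, \qquad CL \pm 3\,SD = CL \pm \frac{3\,\overline{MR}}{1.128} \approx CL \pm 2.66\,\overline{MR}
\]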
# Calculate the centre line (mean)
cl <- rep(mean(systolic), length(systolic))
# Calculate the moving ranges of data
mr <- abs(diff(systolic))
# Print the moving ranges for our viewing pleasure
mr
## [1] 3 3 1 13 19 32 3 3 6 6 14 33 20 6 1 10 20 9 26 6 6 14 9 5 7
# Calculate the average moving range
amr <- mean(mr)
# Calculate the process standard deviation
s <- amr / 1.128
# Create y-coordinates for the control limits
lcl <- cl - 3 * s
ucl <- cl + 3 * s
When plotting the data, we need to expand the y-axis limits to make room not only for the data points but also for the control limits (Figure 5.3):
# Plot data while expanding the y-axis to make room for all data and lines
plot(systolic, type = 'o', ylim = range(systolic, lcl, ucl))
# Add lines
lines(ucl)
lines(cl)
lines(lcl)
Figure 5.3: Standardised control chart
One (freak) data point is below the lower control limit, suggesting that this reading has most likely been influenced by something outside the natural process. The control chart itself does not tell us what the special cause was, but it tells us that this data point should be investigated for the purpose of learning and improvement.
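To confirm which reading falls outside the limits without reading it off the plot, we can compare the data to the limits directly. This is a small sketch that repeats the calculations from above so it runs on its own; the name `outside` is our own.

```r
systolic <- c(169, 172, 175, 174, 161, 142,
              174, 171, 168, 174, 180, 194,
              161, 181, 175, 176, 186, 166,
              157, 183, 177, 171, 185, 176,
              181, 174)

# Centre line and control limits as calculated above
cl  <- mean(systolic)
s   <- mean(abs(diff(systolic))) / 1.128
lcl <- cl - 3 * s
ucl <- cl + 3 * s

# Print the limits (roughly 143.9, 173.2 and 202.4)
round(c(lcl = lcl, cl = cl, ucl = ucl), 1)

# Index and value of data points outside the control limits
outside <- which(systolic < lcl | systolic > ucl)
outside
systolic[outside]
```

Only data point #6 (142 mm Hg) is flagged.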
5.3 That’s all, Folks!
So constructing an SPC chart using R may be done using a few lines of code. In fact, most of the code in this chapter went into preparing the data to be plotted. The charts themselves are rather simple and plotting is the same every time: 1. plot the dots; 2. add the lines.
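The two-step recipe can be sketched as a small helper. Note that `plot_with_lines()` is a hypothetical name of our own, not a function from any package, and the helper is far simpler than the function developed in later chapters: it only draws lines that we have already calculated.

```r
# Hypothetical helper (our own sketch): plot the dots, then add the lines
plot_with_lines <- function(y, cl, lcl = NULL, ucl = NULL) {
  n <- length(y)
  # 1. Plot the dots, making room for any control limits
  plot(y, type = 'o', ylim = range(y, cl, lcl, ucl))
  # 2. Add the lines
  lines(rep(cl, n))
  if (!is.null(lcl)) lines(rep(lcl, n))
  if (!is.null(ucl)) lines(rep(ucl, n))
  # Return the line values invisibly for later inspection
  invisible(list(cl = cl, lcl = lcl, ucl = ucl))
}

systolic <- c(169, 172, 175, 174, 161, 142,
              174, 171, 168, 174, 180, 194,
              161, 181, 175, 176, 186, 166,
              157, 183, 177, 171, 185, 176,
              181, 174)

# Run chart: median centre line only
plot_with_lines(systolic, cl = median(systolic))

# I chart: mean centre line and 3-sigma limits from the average moving range
m <- mean(systolic)
s <- mean(abs(diff(systolic))) / 1.128
res <- plot_with_lines(systolic, cl = m, lcl = m - 3 * s, ucl = m + 3 * s)
```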
Later we will wrap all the steps in a function that automates the calculation of centre and control lines, highlights signals of non-random variation in data, and makes plots that are a lot nicer to look at than the rather crude ones we have produced so far.
In the next chapter we will produce SPC charts that are most commonly used in healthcare, “The Magnificent Seven”.