Chapter 6 Calculating Control Limits

In the previous chapter we established the basis for constructing SPC charts with R using the I chart as an example. In this chapter we continue with the rest of The Magnificent Seven control charts and how to construct their control limits.

6.1 Introducing the spc() function

To avoid repeating ourselves, let’s begin by creating a function to automate the plotting for us.

spc <- function(
    x,           # x axis values
    y   = NULL,  # data values
    cl  = NA,    # centre line
    lcl = NA,    # lower control limit
    ucl = NA,    # upper control limit
    ...          # other parameters passed to the plot() function
) {
  # if y is missing, set y to x and make a sequence for x
  if (is.null(y)) {
    y <- x
    x <- seq_along(y)
  }
  
  # repeat line values to match the length of y
  if (length(cl) == 1)
    cl <- rep(cl, length(y))
  
  if (length(lcl) == 1)
    lcl <- rep(lcl, length(y))
  
  if (length(ucl) == 1)
    ucl <- rep(ucl, length(y))
  
  # plot the dots and draw the lines
  plot(x, y, 
       type = 'o',
       ylim = range(y, lcl, ucl, na.rm = TRUE),
       ...)
  lines(x, cl)
  lines(x, lcl)
  lines(x, ucl)
}

The spc() function takes five arguments of which only the first, x, is mandatory. If only x is provided, a simple point-and-line chart will be drawn from the x values. If y is also provided, x values will be used for the x axis. The line arguments (cl, lcl, ucl) are used (if provided) for the centre line and control limits respectively. Line arguments may be given as either single values or vectors of the same length as x. In addition, we may provide additional arguments for the plot() function, e.g. main, xlab, and ylab for title and axis labels.

Let us test it with the blood pressure data from Chapter 5 (Figure 6.1).

# create an x variable, not that is it necessary in this case, just because we can
day <- seq_along(systolic)

# plot data
spc(day, systolic, cl, lcl, ucl)

Figure 6.1: Control chart of systolic blood pressure

With this function we are now able to construct all kinds of control charts. All we need to know is how to calculate the centre line and the control limits.

6.2 Formulas for calculation of control limits

The formulas for calculation control limits for The Magnificent Seven introduced in Chapter 2 are provided in Table 6.1. Don’t be alarmed by the number of strange symbols, we will translate the formulas to R code one by one as we move along.

Table 6.1: Formulas for calculating control limits
Subgroups	Chart type	Control limits	Assumed distribution
Count data
Counts	C	\(\bar{c}\pm3\sqrt{\bar{c}}\)	Poisson
Rates	U	\(\bar{u}\pm3\sqrt{\frac{\bar{u}}{n_{i}}}\)	Poisson
Proportions	P	\(\bar{p}\pm3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_{i}}}\)	Binomial
Measurement data
Individual measurements	I	\(\bar{x}\pm2.66\overline{MR}\)	Normal
Moving ranges of individual measurements	MR	\(3.267\overline{MR}\)	Normal
Averrages of 2 or more measurements	X-bar	\(\bar{\bar{x}}\pm A_{3}\bar{s}\)	Normal
Standard deviations of 2 or more measurements	S	\(B_{3}\bar{s};\ B_{4}\bar{s}\)	Normal

As discussed in Chapter 2, data come in two flavours: count data and measurement data. Counts are positive integers that represent counts of events or cases, for example patient falls, surgical complications, or healthy babies. Measurements are data that are measured on continuous scales and may have decimals, for example blood pressure, height and weight, or waiting times.

6.3 Count data

For count charts in this chapter we will use the bacteremia data set:

# read data from file
bact <- read.csv('data/bacteremia.csv',  # path to data file
                 comment.char = '#',     # ignore lines that start with "#"
                 colClasses   = c(       # specify variable types
                   'Date',
                   'integer',
                   'integer',
                   'integer',
                   'integer'
                 ))

# print the first six rows
head(bact)

##        month ha_infections risk_days deaths patients
## 1 2017-01-01            24     32421     23      100
## 2 2017-02-01            29     29349     22      105
## 3 2017-03-01            26     32981     13       99
## 4 2017-04-01            16     29588     14       85
## 5 2017-05-01            28     30856     17       98
## 6 2017-06-01            16     30544     15       85

The variables are:

month (date): month of infection
ha_infections (numeric): number of hospital acquired infections
risk_days (numeric): number of patient days without infection
deaths (numeric): number of patients who died within 30-day after a hospital or community acquired (all-cause) infection
patients (numeric): number of patients with all-cause infection

We use C charts for event counts and U chart for event rates. For case proportions we use the P chart.

6.3.1 C chart

The C charts (C for counts) is the simplest of all control charts and the easiest to produce. The process standard deviation is simply estimated as the square root of the process mean.

C charts are appropriate when counting events from (nearly) equally big chunks of time or space.

Figure 6.2 shows a C chart of the monthly number of hospital acquired bacteremias.

with(bact, {
  cl  <- mean(ha_infections)
  lcl <- cl - 3 * sqrt(cl)
  ucl <- cl + 3 * sqrt(cl)
  
  # print the limits
  cat('UCL =',  ucl, '\n')
  cat('CL  =',   cl, '\n')
  cat('LCL = ', lcl, '\n')
  
  # plot the chart
  spc(month, ha_infections,
      cl, lcl, ucl,
      ylab = 'Infections', # y-axis label
      xlab = 'Month')      # x-axis label
})

## UCL = 36.94952 
## CL  = 22.66667 
## LCL =  8.38381

Figure 6.2: C control chart of monthly numbers of hospital acquired bacteremias

The average monthly number of cases is 22.7, and all data points are within the control limits ranging from 8.4 to 36.9. So if nothing changes, we should expect future infection counts to be around 23, and we should not be surprised if once in a while, we observe as little as 9 or as many as 36 infections in a single month.

6.3.2 U chart

U charts are useful when events are counted over chunks of time or space that are not equally sized resulting in “unequal area of opportunity” (hence the U). In our case we might want to adjust for for the number of patient days that may vary depending on the time of year or between organisational units. The U chart adjust for this by presenting rates rather than raw counts.

Events are often rare in comparison to their areas of opportunity. So to avoid very small numbers on the y-axis it may be useful to multiply the y-axis by some factor before plotting. In Figure 6.3 we multiply by 10,000 to display infections per 10,000 risk days rather than per day.

with(bact, {
  y   <- ha_infections / risk_days           # rates to plot
  cl  <- sum(ha_infections) / sum(risk_days) # overall mean rate, centre line
  s   <- sqrt(cl / risk_days)                # standard deviation
  lcl <- cl - 3 * s                          # lower control limit
  ucl <- cl + 3 * s                          # upper control limit
  
  # multiply y axis to present infections per 10,000 risk days
  multiply <- 10000
  y        <- y * multiply
  cl       <- cl * multiply
  lcl      <- lcl * multiply
  ucl      <- ucl * multiply
  
  spc(month, y, cl, lcl, ucl, 
      ylab = 'Infections per 10,000 risk days',
      xlab = 'Month')
})

Figure 6.3: U chart of monthly number of infections per 10,000 risk days

The U chart shows that on average we have 7.5 infections per 10,000 risk days, and that all data points are between the control limits ranging from about 3.5 to 12. We see that the control limits vary depending on the denominator (risk days), for each data point. Large denominator \(\rightarrow\) narrow limits; small denominator \(\rightarrow\) wide limits.

In cases like this where the denominator – the area of opportunity – only varies little between subgroups, the U charts adds little compared to the C chart. For pedagogical reasons we may prefer the C chart, because it is a lot easier to relate to 23 infections per month than to 7.5 infections per 10,000 risk days.

6.3.3 P chart

P charts are for proportions (or percentages). Figure 6.4 shows the monthly percentage of patients with bacteremia who died within 30 days.

with(bact, {
  y   <- deaths / patients
  cl  <- sum(deaths) / sum(patients)      # process mean, centre line
  s   <- sqrt((cl * (1 - cl) / patients)) # process standard deviation
  lcl <- cl - 3 * s                       # lower control limit
  ucl <- cl + 3 * s                       # upper control limit
  
  # multiply by 100 to get percentages rather than proportions
  multiply <- 100
  y        <- y * multiply
  cl       <- cl * multiply
  lcl      <- lcl * multiply
  ucl      <- ucl * multiply
  
  spc(month, y, cl, lcl, ucl, 
      ylab = '%', 
      xlab = 'Month')
})

Figure 6.4: P chart of monthly 30-day mortality rates

On average the mortality is 21%, and all data points are within the control limits. As with U charts, the control limits vary depending on the size of the denominator.

6.4 Measurement data

For this section we will use a data set on response times for grade 2 caesarean sections, that is, the time (in minutes) it took from the decision to perform a C-section to the baby was delivered. The goal is to keep the response times below 30 minutes.

The csect data frame contains the date and time, the month, and the number of minutes from decision to delivery for 208 grade 2 section over a two-year period.

# read raw data
csect <- read.csv('data/csection_delay.csv',
                  comment.char = '#',
                  colClasses   = c('POSIXct',
                                   'Date',
                                   'integer'))

csect <- csect[order(csect$datetime), ]

# show the first 6 rows
head(csect)

##              datetime      month delay
## 1 2016-01-06 03:55:40 2016-01-01    22
## 2 2016-01-06 20:52:34 2016-01-01    22
## 3 2016-01-07 02:50:43 2016-01-01    29
## 4 2016-01-07 22:32:27 2016-01-01    28
## 5 2016-01-09 14:56:09 2016-01-01    22
## 6 2016-01-09 21:21:24 2016-01-01    20

6.4.1 I chart (aka X chart)

The “I” in I chart stand for “individuals” because it plots individual values from subgroups of size 1. I charts are also often referred to as X charts.

I charts are useful when measurements come from individual units, for example waiting times or blood pressure measurements for individual patients.

As we will see later, I charts are in fact useful for all kinds of data because they base their estimations on the actual variation that is present in data rather than theoretical parameters from assumed distributions. For this reason, the I chart is considered the Swiss army knife of SPC.

When subgroups consist of single values we use the average absolute difference between neighbouring data points, the average moving range (\(\overline{MR}\)), as an estimate of the within subgroup variation. By dividing this value with 1.128 we get an estimate of the process standard deviation. Alternatively, we may multiply \(\overline{MR}\) by 3 / 1.128 = 2.66 to get 3 \(SD\)s.

Let’s have a look at individual delay times for the latest 60 C-sections (Figure 6.5).

with(tail(csect, 60), {
  xbar <- mean(delay)      # mean value, centre line
  mr   <- abs(diff(delay)) # moving ranges
  amr  <- mean(mr)         # average moving range
  s    <- amr / 1.128      # process standard deviation
  
  spc(delay,
      cl  = xbar,
      lcl = xbar - 3 * s,
      ucl = xbar + 3 * s)
})

Figure 6.5: I-chart

On average, the delay time for these cases is 24 minutes. Three data points are outside the control limits (#1, #4 and #31) suggesting that these cases were special and that it might be useful to have a closer look to find out what went well with case #1 and not so well with cases #4 and #31.

6.4.2 MR chart

The MR chart plots the moving ranges, that is, the absolute pairwise differences of neighbouring data points. It is a companion to the I chart. Since moving ranges can always be zero but never negative, the MR chart has no lower control limit.

with(tail(csect, 60), {
  mr  <- c(NA, abs(diff(delay)))  # add NA in front to match the length of the I-chart
  amr <- mean(mr, na.rm = TRUE)
  
  spc(mr,
      cl  = amr,
      ucl = 3.267 * amr)
})

Figure 6.6: MR-chart

Note that there is one less moving range than individual values. To align the charts, we insert an NA value at the beginning of the MR-chart.

We may plot the two charts together (Figure 6.7):

with(tail(csect, 60), {
  xbar <- mean(delay)
  mr   <- c(NA, abs(diff(delay)))
  amr  <- mean(mr, na.rm = TRUE)
  
  op <- par(mfrow = c(2, 1),           # setting up plotting area
            mar   = c(5, 5, 2, 1))
  spc(delay,
      cl   = xbar,
      lcl  = xbar - 2.66 * amr,
      ucl  = xbar + 2.66 * amr,
      main = 'I-chart',
      ylab = 'Delay (minutes)',
      xlab = '')
  
  spc(mr,
      cl   = amr,
      ucl  = 3.267 * amr,
      main = 'MR-chart',
      ylab = 'Moving range (minutes)',
      xlab = 'C-section #')
  par(op)                              # restoring plotting area
})

Figure 6.7: I- and MR-charts

The MR-chart also finds three data points outside the limits. These coincide with two of the special causes found by the I chart and support our conclusion that these deliveries were special. Note that each data point on the I chart (except the first and last ones) produces two moving ranges on the MR chart.

6.4.3 X-bar chart

The X-bar chart is appropriate when the subgroups consist of samples of two or more measurements.

To plot a control chart of the monthly average delays, we must first aggregate data to find the mean and the standard deviation of delay times and the number of sections per month.

# split data frame by month
csect.agg <- split(csect, csect$month)

# aggregate data by month
csect.agg <- lapply(csect.agg, function(x) {
  data.frame(month = x$month[1],
             mean  = mean(x$delay),
             sd    = sd(x$delay),
             n     = nrow(x))
})

# put everything together again
csect.agg <- do.call(rbind, c(csect.agg, make.row.names = FALSE))

# print the first 6 rows
head(csect.agg)

##        month     mean       sd  n
## 1 2016-01-01 23.85714 3.387653  7
## 2 2016-02-01 24.45455 6.137811 11
## 3 2016-03-01 22.45455 6.638729 11
## 4 2016-04-01 22.66667 3.041381  9
## 5 2016-05-01 22.50000 3.891382  8
## 6 2016-06-01 22.00000 6.204837  5

See section Aggregating data frames in Appendix D for details on the split-apply-combine strategy.

Next, we calculate the centre line and the control limits using the formula in Table 6.1 where \(\bar{\bar{x}}\) (pronounced x bar bar) is the weighted mean of the subgroup means, \(\bar{s}\) (s bar) is the weighted mean of the subgroup standard deviations, and \(A_3\) is a constant that depends on the subgroup size. See the section on chart constants at the end of this chapter for the R code involved in calculating \(A_3\) and other constants for control chart construction.

With the aggregated data we are now able to construct the X-bar chart (Figure 6.8).

with(csect.agg, {
  xbarbar <- weighted.mean(mean, n)  # centre line
  sbar    <- weighted.mean(sd, n)    # process standard deviation
  a3      <- a3(n)                   # A3 constant
  
  spc(x   = month,
      y   = mean,
      cl  = xbarbar,
      lcl = xbarbar - a3 * sbar,
      ucl = xbarbar + a3 * sbar)
})

Figure 6.8: X bar chart

Figure 6.8 shows the average delay time per month. On average the delay time is 23 minutes (= centre line) and all data points fall between the control limits suggesting that the process is stable and predictable.

As with U and P charts the control limits vary from month to month reflecting the varying subgroup sizes – small subgroups \(\rightarrow\) wide limits; large subgroups \(\rightarrow\) narrow limits.

Be careful not to fall for the temptation to conclude that just because no months are above the target of 30 minutes all is well. The 30-minute target applies to the delay time of individual sections, not the averages. Even if the averages are well below the target, individuals may be above, which we already noticed from the I chart above.

6.4.4 S chart

The S chart is usually plotted alongside the X-bar chart and shows the within subgroup variation. It is useful for detecting changes in the spread of data over time.

To calculate the centre and control limits for the S chart we need to know the process standard deviation, \(\bar{S}\) (same as for the X-bar chart), and the two constants \(B_3\) and \(B_4\) From Table 6.1.

with(csect.agg, {
  sbar    <- weighted.mean(sd, n)  # pooled SD, centre line
  b3      <- b3(n)                 # B3 constant
  b4      <- b4(n)                 # B4 constant
  
  spc(x   = month,
      y   = sd,
      cl  = sbar,
      lcl = b3 * sbar,
      ucl = b4 * sbar)
})

Figure 6.9: S chart

Figure 6.9 shows the average standard deviation of delay times per month. On average the standard deviation is 4.7 minutes and all data points fall between the control limits.

We may plot the X-bar and S charts together (Figure 6.10):

with(csect.agg, {
  xbarbar <- weighted.mean(mean, n)  # pooled average
  sbar    <- weighted.mean(sd, n)    # pooled standard deviation
  a3      <- a3(n)                   # A3 constant
  b3      <- b3(n)                   # B3 constant
  b4      <- b4(n)                   # B4 constant
  
  op <- par(mfrow = c(2, 1),
            mar   = c(3, 5, 2, 1))
  spc(month, mean,
      cl   = xbarbar,
      lcl  = xbarbar - a3 * sbar,
      ucl  = xbarbar + a3 * sbar,
      main = 'X-bar Chart',
      xlab = '')
  
  spc(month, sd,
      cl   = sbar,
      lcl  = b3 * sbar,
      ucl  = b4 * sbar,
      main = 'S chart',
      xlab = '')
  par(op)
})

Figure 6.10: X-bar and S charts

6.5 Control limits in short

Control limits attempt to estimate the boundaries of the natural common cause process variation. They are placed 3 standard deviations above and below the centre line, which is the (weighted) mean of the subgroup means.

The procedure for calculating control limits depends on the type of data involved, but the interpretation of charts are the same regardless of data type.

In the next chapter we will improve our plots by adding visual clues to highlight signs of non-random variation.

Control chart constants

a3 <- function(n) {
  3 / (c4(n) * sqrt(n))
}

b3 <- function(n) {
  pmax(0, 1 - 3 * c5(n) / c4(n))
}

b4 <- function(n) {
  1 + 3 * c5(n) / c4(n)
}

c4 <- function(n) {
  n[n <= 1] <- NA
  sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
}

c5 <- function(n) {
  sqrt(1 - c4(n)^2)
}