Chapter 10 Introducing qicharts2
qicharts2 (Quality Improvement Charts, Anhøj (2024), Anhøj (2018)) is an R package for SPC aimed at healthcare data analysts. It is based on the same principles that we have developed in the previous chapters of this book. It contains functions to construct run charts and all of the Magnificent Seven, plus a number of specialised charts including Pareto charts and control charts for rare events data.
To learn everything about qicharts2, visit its website: https://anhoej.github.io/qicharts2/. In this chapter we will concentrate on some key facilities that are missing from the function library we have built so far:
- excluding data points from analysis
- freezing and splitting charts by periods
- multivariate plots (small multiples)
To get started with qicharts2, install it and load it into your working environment:
Remember, you only need to install a package once, but you need to load it every time you want to use it.
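A minimal sketch of these two steps, assuming the package is installed from CRAN with your default repository settings:

```r
# Install once (skipped if the package is already available)
if (!requireNamespace('qicharts2', quietly = TRUE)) {
  install.packages('qicharts2')
}

# Load in every new R session
library(qicharts2)
```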
Next, you may want to read the vignette, vignette('qicharts2'), or you may want to get started right away.
10.1 A simple run chart
The main function of qicharts2 is qic(). It takes the same arguments as the spc() function we built previously plus many more. Check the documentation for a complete list of arguments: ?qic.
To reproduce our first run chart from Chapter 5, run:
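The Chapter 5 data are not repeated in this chapter, so the sketch below uses a simulated stand-in vector, y, which is an assumption for illustration only. It shows the general form of the call: with a single vector and no chart argument, qic() draws a run chart.

```r
library(qicharts2)

# Simulated stand-in data (NOT the data used in Chapter 5)
set.seed(1)
y <- rnorm(24, mean = 10, sd = 2)

# A run chart is the default chart type
p <- qic(y)
p
```

Replace y with the vector from Chapter 5 to obtain the chart shown in the figure.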
Figure 10.1: Run chart produced with the qic() function from the qicharts2 package
There are several things to note in Figure 10.1:
- Default chart title and axis labels are created automatically. These can be changed using the title, ylab, and xlab arguments.
- Data points that fall directly on the centre line are greyed out. These do not count as useful observations in the runs analysis.
- The centre line value is printed on the chart.
10.2 A simple control chart
To produce a control chart, we simply add a chart argument (Figure 10.2):
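As a sketch, again using a simulated stand-in vector y rather than the book's data, an individuals (I) chart is requested with chart = 'i':

```r
library(qicharts2)

# Simulated stand-in data (an assumption for illustration)
set.seed(1)
y <- rnorm(24, mean = 10, sd = 2)

# Same call as for the run chart, plus the chart argument
p <- qic(y, chart = 'i')
p
```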
Figure 10.2: I chart produced with qic()
By default, qic() uses a grey background area to show the natural process limits.
10.3 Excluding data points from analysis
Sometimes it is useful to exclude one or more data points from the calculation of control limits and from the runs analysis. This may be the case when specific data points are known to have been influenced by factors that are not part of the natural process, for example data points that fall outside the control limits. We use the exclude argument to do this:
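The following sketch illustrates the exclude argument on simulated stand-in data. The vector y and the artificially high value placed at position 10 are assumptions made for the example; they are not the data behind the figure.

```r
library(qicharts2)

# Simulated stand-in data with one artificially high value at position 10
set.seed(1)
y <- rnorm(24, mean = 10, sd = 2)
y[10] <- 25

# All data points contribute to the centre line and control limits
p_all <- qic(y, chart = 'i')

# Data point 10 is still plotted, but is left out of the calculations
p_excl <- qic(y, chart = 'i', exclude = 10)
p_excl
```

The exclude argument takes the indices of the data points to leave out, so several points can be excluded by passing a vector, for example exclude = c(10, 15).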
Figure 10.3: I chart with one data point excluded from calculations
Notice how the values for the centre line and control limits in Figure 10.3 changed a little. Specifically, the control limits became slightly narrower.
Excluding data points should be a deliberate decision and not something that is done automatically just because one or more data points are outside the control limits. Exclusion should be based on a thorough understanding of the process and only when the reason(s) for a special cause has been established. Otherwise, the whole idea of SPC as a way of understanding variation and its sources is lost.
10.4 Freezing baseline period
When data have been collected for a long enough period of time to establish the centre line and control limits of a stable and predictable process – that is, a process with only common cause variation – it is often useful to “freeze” the limits and use them for future data.
In production industry this technique is referred to as phase one and phase two studies. In healthcare, freezing is especially useful when we have historical data from before the start of some type of intervention or improvement programme. When future data are plotted against the centre line and control limits from a stable baseline period, shifts and trends will show up faster than if the limits were recalculated with every new data point.
The cdi dataset comes with qicharts2 and contains monthly counts of infections two years before and one year after the initiation of an improvement programme. Check the documentation for details: ?cdi.
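A sketch of a call that freezes the baseline after the first 24 months. It assumes that the cdi data frame has columns named month and n, as described in ?cdi; the title and axis labels are illustrative.

```r
library(qicharts2)

# Run chart with the centre line calculated from months 1 to 24 only
p <- qic(month, n,
         data   = cdi,
         freeze = 24,        # last data point of the baseline period
         title  = 'Infections before and after intervention',
         ylab   = 'Count',
         xlab   = 'Month')
p
```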
Figure 10.4: Infections before and after intervention
Figure 10.4 shows a run chart of data where the centre line has been established from the baseline period (months 1 to 24) using the freeze argument. After the intervention, marked by the dotted vertical line, there is a sustained shift in data in the desired direction.
10.5 Splitting chart by period
When a sustained shift in data has been discovered and the cause is known, it is acceptable to split the graph by periods. We use the part argument for this (Figure 10.5).
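As a sketch, splitting the cdi run chart after data point 24 could look like this (column names month and n assumed as documented in ?cdi):

```r
library(qicharts2)

# Separate centre lines for months 1 to 24 and months 25 to 36
p <- qic(month, n,
         data = cdi,
         part = 24)   # split after data point 24
p
```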
Figure 10.5: Splitting using index
The part argument takes either the indices of the data points to split after, or a categorical variable naming the time periods in which case the periods will be labelled (Figure 10.6).
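A sketch of the second form, assuming that cdi contains a categorical column named period that identifies the time before and after the intervention (see ?cdi):

```r
library(qicharts2)

# Split by a period variable; the parts are labelled with its values
p <- qic(month, n,
         data = cdi,
         part = period)
p
```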
Figure 10.6: Splitting using a period variable
As with excluding data points, splitting should be based on deliberate decisions and thorough understanding of the process. Some SPC applications allow for automatic splitting whenever a shift is detected. We strongly advise against this approach. Splitting may be useful when:
- there is a sustained shift in data
- the reason for the shift is known
- the shift is in the desired direction
- the shift is expected to continue
If any of these conditions is not met, we should instead look for the root cause and, if need be, eliminate it.
Because we now have data from two very different but stable processes we may choose to add control limits to better detect future changes, especially freaks, in process behaviour (Figure 10.7).
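A sketch of how control limits might be added to the split chart. The choice of a C chart is our assumption, made because the data are counts; the column names month, n, and period are assumed as documented in ?cdi.

```r
library(qicharts2)

# Control chart for counts with separate limits for each period
p <- qic(month, n,
         data  = cdi,
         chart = 'c',      # assumed chart type for count data
         part  = period)
p
```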
Figure 10.7: Splitting using a period variable
10.6 Small multiple plots for multivariate data
Process data are all about time. But often data, not least in healthcare, have more dimensions, which are important to understand in order to interpret data correctly.
For example, the hospital_infections dataset, which is also included in the qicharts2 package, contains monthly counts of three types of hospital infections: bacteremia (BAC), C. diff. (CDI), and urinary tract infections (UTI) from six hospitals: AHH, BFH, BOH, HGH, NOH, RGH.
##   hospital infection      month  n     days
## 1      AHH       BAC 2015-01-01 17 17233.67
## 2      AHH       BAC 2015-02-01 18 15308.25
## 3      AHH       BAC 2015-03-01 17 16883.67
## 4      AHH       BAC 2015-04-01 10 15463.83
## 5      AHH       BAC 2015-05-01 13 15788.96
## 6      AHH       BAC 2015-06-01 14 15660.04
In addition to the time dimension, these data have two extra dimensions, infection and hospital.
Figure 10.8 shows the total (aggregated) monthly counts of hospital-associated urinary tract infections from six hospitals.
qic(month, n, days,
    data = subset(hospital_infections,
                  infection == 'UTI'),
    chart = 'u',
    multiply = 10000,
    title = 'Urinary tract infections',
    ylab = 'Count per 10,000 risk days',
    xlab = 'Month')
Figure 10.8: Aggregated U chart of urinary tract infections from six hospitals
Figure 10.9 shows the same data as Figure 10.8, but this time data have been stratified into so-called small multiple plots – one plot per hospital.
qic(month, n, days,
    data = subset(hospital_infections,
                  infection == 'UTI'),
    chart = 'u',
    multiply = 10000,
    facets = ~hospital,  # stratify by hospital
    ncol = 2,            # two-column arrangement of plots
    title = 'Urinary tract infections',
    ylab = 'Count per 10,000 risk days',
    xlab = 'Month')
Figure 10.9: Stratified (small multiple) U charts of urinary tract infections from six hospitals
In the literature, small multiples are also known as trellis, lattice, or grid plots.
Likewise, we may construct a small multiple plot of different infection types from one hospital (Figure 10.10).
qic(month, n, days,
    data = subset(hospital_infections,
                  hospital == 'NOH'),
    chart = 'u',
    multiply = 10000,
    facets = ~infection,  # stratify by infection type
    ncol = 1,
    title = 'Hospital infections',
    ylab = 'Count per 10,000 risk days',
    xlab = 'Month')
Figure 10.10: U charts from one hospital stratified by infection type
By default, qic() uses fixed axis scales. This makes it easy to compare both the indicator levels (y-axis) and the patterns of variation over time (x-axis) between facets. However, sometimes it makes little sense to compare levels of very different indicators, as in this example, where different types of infections occur at very different rates. Because urinary tract infections are much more frequent than the other two types of infection, it is hard to interpret the patterns over time for the latter. In Figure 10.11 we use individual y-axes.
qic(month, n, days,
    data = subset(hospital_infections, hospital == 'NOH'),
    chart = 'u',
    multiply = 10000,
    facets = ~infection,
    scales = 'free_y',  # free y-axes
    ncol = 1,
    title = 'Hospital infections',
    ylab = 'Count per 10,000 risk days',
    xlab = 'Month')
Figure 10.11: Small multiple plot with individual y-axes
The decision about when to use fixed or free axis scales should be taken deliberately depending on the purpose of the plot. Sometimes it may even be useful to display both plots with fixed and free axes side by side.
Finally, we may want to display both dimensions, infection and hospital, at the same time as in Figure 10.12.
qic(month, n, days,
    data = hospital_infections,
    chart = 'u',
    multiply = 10000,
    facets = infection ~ hospital,  # two-dimensional faceting
    scales = 'free_y',
    title = 'Hospital infections',
    ylab = 'Count per 10,000 risk days',
    xlab = 'Month')
Figure 10.12: Hospital infections stratified by infection and hospital
10.6.1 To aggregate or not to aggregate
When data come from multiple organisational units, for example hospitals or hospital departments, we must decide if we want to aggregate or stratify data.
Figures 10.8 and 10.9 demonstrate the difference. Notice how the aggregated data show no signs of special causes while the stratified data show shifts in two hospitals (BOH and NOH). If data, as in this case, move in different directions in two or more units, aggregation has a tendency to mask or cancel out special causes. However, when minor shifts that are too small to produce signals in the small multiple plots move in the same direction, they tend to add up, which gives a better chance of detecting the special cause with aggregated data.
As usual, the decision to aggregate or stratify must be taken deliberately by persons with a thorough understanding of the structures and processes that have produced data. And often it may be useful to produce several plots at different levels of aggregation.
10.7 qicharts2 in short
qicharts2 is an R package for constructing SPC charts aimed specifically at healthcare but useful in many other areas. Besides functions for the most commonly used SPC charts, qicharts2 has a number of specialist charts for rare events data and count data with very large denominators.