Chapter 19 The Forgotten Art of Rational Subgrouping

A subgroup consists of the data elements that make up a single data point, for example, the waiting times used to calculate the average waiting time for a particular period.

Rational subgrouping is one of the most important – and most frequently misunderstood – aspects of statistical process control. Poor subgrouping is a common reason why SPC charts fail in practice.

Rational subgrouping is the intentional and intelligent sampling and grouping of data into data points for SPC charts, with the aim of maximising the chances of detecting special cause variation while minimising the risk of false alarms. In other words, rational subgrouping is about maximising the signal-to-noise ratio.

A rational subgroup consists of a set of measurements or counts that are:

  • produced under conditions that are as similar as possible,
  • taken close together in time and space, and
  • likely to show only common cause variation.

The underlying logic is simple but powerful: when subgroups are rationally formed, variation within subgroups reflects common cause variation, while variation between subgroups that exceeds what is expected from within-subgroup variation indicates the presence of special causes.
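
To make this logic concrete, the following base R sketch computes X-bar limits from within-subgroup variation and flags subgroup means that fall outside them. The data are simulated, all variable names are hypothetical, and the pooled standard deviation is a simplification; qic() may estimate within-subgroup variation somewhat differently, so treat this as an illustration rather than a reproduction of its method.

```r
# lock random number generator for reproducibility
set.seed(1)

# 20 rational subgroups of 5 measurements from a stable process
k <- 20
n <- 5
subgroup <- rep(seq_len(k), each = n)
value    <- rnorm(k * n, mean = 10, sd = 1)

# introduce a special cause: every measurement in subgroup 15 is shifted upwards
value[subgroup == 15] <- value[subgroup == 15] + 5

# within-subgroup variation (pooled standard deviation) estimates common cause variation
s_within <- sqrt(mean(tapply(value, subgroup, var)))

# subgroup means carry the between-subgroup variation
xbar   <- tapply(value, subgroup, mean)
centre <- mean(xbar)

# 3-sigma limits for subgroup means, based on within-subgroup variation only
ucl <- centre + 3 * s_within / sqrt(n)
lcl <- centre - 3 * s_within / sqrt(n)

# subgroups whose means vary more than within-subgroup variation can explain
signals <- which(xbar > ucl | xbar < lcl)
signals
```

Because the shift is applied to a whole subgroup, it leaves the within-subgroup variation untouched and shows up only as between-subgroup variation, which is exactly what the limits are designed to expose.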

In our experience – particularly in healthcare – rational subgrouping is something of a forgotten art. Too often, we rely on whatever data are readily available, typically pre-aggregated into monthly, quarterly, or yearly summaries for administrative purposes rather than for improvement. This often leads to suboptimal charts that obscure meaningful signals.

19.1 Too large subgroups – masking meaningful signals

A common mistake is to form subgroups across overly broad spans of time or space (e.g. large organisational units). This blends common and special cause variation and effectively hides the latter.

As an example, Figures 19.1 and 19.2 display the number of C. diff. infections subgrouped by monthly and two-monthly periods respectively.

qic(month, infections, 
    data  = cdiff, 
    chart = 'c',
    title = 'C. diff. infections',
    ylab  = 'Count',
    xlab  = 'Months')

Figure 19.1: Monthly C. diff. infections.

qic(month, infections, 
    data     = cdiff, 
    chart    = 'c',
    x.period = '2 months',
    title    = 'C. diff. infections',
    ylab     = 'Count',
    xlab     = 'Two-months')

Figure 19.2: Two-monthly C. diff. infections.

By using larger subgroups, we mask two important signals – a freak value and a sustained shift – the latter of which suggests that the process is improving. While this shift may eventually become visible even with coarser data, detection is delayed, postponing learning and potential intervention.
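
The masking effect on a freak value can be reproduced with a few lines of base R. The counts below are made up for illustration (they are not the cdiff data), the helper c_chart_signals() is hypothetical, and only the 3-sigma rule for a C chart is applied; shift detection is not covered by this sketch.

```r
# hypothetical monthly counts: stable around 5, with one freak value in month 11
monthly <- c(rep(c(5, 4, 6), 3), 5, 15, 3, rep(c(5, 4, 6), 4))

# 3-sigma limits for a C chart: mean +/- 3 * sqrt(mean), lower limit truncated at 0
c_chart_signals <- function(counts) {
  centre <- mean(counts)
  ucl    <- centre + 3 * sqrt(centre)
  lcl    <- max(0, centre - 3 * sqrt(centre))
  which(counts > ucl | counts < lcl)
}

# monthly subgroups: month 11 is flagged
monthly_signals <- c_chart_signals(monthly)
monthly_signals

# aggregate into two-monthly subgroups by summing consecutive pairs of months
period    <- ceiling(seq_along(monthly) / 2)
bimonthly <- tapply(monthly, period, sum)

# two-monthly subgroups: no subgroup is flagged
bimonthly_signals <- c_chart_signals(bimonthly)
bimonthly_signals
```

In this constructed example the freak month is paired with an unremarkable one, so the two-monthly total stays inside limits that have themselves widened with the larger counts.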

A useful rule of thumb is: Collect data at a resolution at least as fine as the rate of change you want to detect.

High-resolution data can always be aggregated later if needed. Low-resolution data cannot be disaggregated.

Notice that the qic() function from qicharts2 includes an argument, x.period, which allows us to aggregate data into larger subgroups as demonstrated in the code producing Figure 19.2.

19.2 Too small subgroups – revealing unimportant noise

A less common – but equally important – mistake is to use subgroups that are too small. In this case, natural process variation may be misinterpreted as special cause variation.

Imagine weighing yourself three times each morning for a couple of weeks and plotting the results on an X-bar chart, using within-day variation to calculate control limits.

Figure 19.3 illustrates this with simulated data.

# lock random number generator for reproducibility
set.seed(5)

# subgroups, 12 subgroups of three values
x <- rep(1:12, each = 3)

# random values, each repeated three times
y <- rep(rnorm(12, mean = 80, sd = 0.5), each = 3)

# add a bit of random noise within subgroups
y <- jitter(y, amount = 0.2)

# plot Xbar-chart
qic(x, y, chart = 'xbar')

Figure 19.3: Xbar-chart from too small subgroups.

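Here, within-subgroup variation is very small, while between-subgroup variation appears large – suggesting instability. Because the control limits are calculated from within-day variation, which here reflects little more than repeated readings on the same morning, they become far too narrow for judging day-to-day changes. In reality, the between-subgroup variation reflects normal day-to-day physiological variation, not a meaningful signal.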
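
The mismatch between the two sources of variation can be checked numerically. The sketch below repeats the simulation under separate, hypothetical variable names (day, weight) and uses a pooled standard deviation as a simplified stand-in for the within-subgroup estimate; it is not meant to reproduce the exact limits drawn by qic().

```r
# same simulation as above, stored under separate names
set.seed(5)
day    <- rep(1:12, each = 3)
weight <- rep(rnorm(12, mean = 80, sd = 0.5), each = 3)
weight <- jitter(weight, amount = 0.2)

# daily averages (the plotted points)
daily_mean <- tapply(weight, day, mean)

# within-day variation: repeated weighings on the same morning
sd_within <- sqrt(mean(tapply(weight, day, var)))

# between-day variation: natural day-to-day changes in body weight
sd_between <- sd(daily_mean)

# half-width of 3-sigma limits for means of three weighings,
# when based on within-day variation only
half_width <- 3 * sd_within / sqrt(3)

# number of daily averages falling outside such limits
outside <- sum(abs(daily_mean - mean(daily_mean)) > half_width)

round(c(sd_within = sd_within, sd_between = sd_between, half_width = half_width), 2)
outside
```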

A better approach is to use an I chart of daily averages as in Figure 19.4:

qic(x, y, chart = 'i')
## Subgroup size > 1. Data have been aggregated using mean().

Figure 19.4: I-chart of average daily body weight.
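
The reason the I chart behaves better is that its limits are derived from the variation between consecutive daily values rather than from within-day variation. A minimal sketch of the conventional calculation, using made-up daily averages and hypothetical variable names, might look as follows; 2.66 is the standard constant for individuals charts, and qic() may apply further refinements to the moving ranges, so its limits can differ slightly.

```r
# hypothetical daily average body weights (kg)
daily_avg <- c(80.2, 79.6, 80.5, 79.9, 80.1, 79.4,
               80.3, 80.0, 79.7, 80.6, 79.8, 80.2)

# moving ranges: absolute differences between consecutive days
mr      <- abs(diff(daily_avg))
mean_mr <- mean(mr)

# conventional I chart limits: centre +/- 2.66 * average moving range
centre <- mean(daily_avg)
ucl    <- centre + 2.66 * mean_mr
lcl    <- centre - 2.66 * mean_mr

# days outside the limits (none for these data)
signals <- which(daily_avg > ucl | daily_avg < lcl)

round(c(lcl = lcl, centre = centre, ucl = ucl), 2)
signals
```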

Alternatively, we could define subgroups differently – for example, weekly averages – depending on our goal.

A practical guideline:

  • Monitoring stability → larger subgroups (e.g. weekly)
  • Detecting change → smaller subgroups (e.g. daily)
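
Switching between daily and weekly subgroups is mostly a matter of how the time variable is grouped before plotting. The following base R sketch uses made-up daily weights with a small upward shift after two weeks; the dates, values, and variable names are assumptions chosen for illustration.

```r
# four weeks of hypothetical daily weights, starting on a Monday
dates  <- seq(as.Date("2024-01-01"), by = "day", length.out = 28)
weight <- 80 +
  rep(c(0.3, -0.2, 0.1, -0.4, 0.2, 0, 0), times = 4) +  # day-to-day variation
  rep(c(0, 0, 0.5, 0.5), each = 7)                      # shift from week 3

# daily subgroups: one data point per day (finer resolution, more noise)
daily <- tapply(weight, dates, mean)

# weekly subgroups: cut() assigns each date to the week it belongs to
week   <- cut(dates, breaks = "week")
weekly <- tapply(weight, week, mean)

length(daily)
weekly
```

The weekly averages smooth out the day-to-day pattern and show the shift as a step between weeks two and three, at the cost of having only four data points.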

19.3 Striking the balance

Rational subgrouping is both an art and a science. It presents a fundamental challenge:

To understand a process, we need rational subgroups – but to form rational subgroups, we need to understand the process.

In practice, subgrouping is often iterative. We experiment with different timeframes, organisational units, or groupings until the data begin to reflect meaningful process behaviour.

This challenge is particularly pronounced in healthcare. Unlike manufacturing, where inputs and processes can be tightly controlled, healthcare processes are inherently complex. Outcomes are influenced by patient characteristics, clinical decisions, staffing, and system-level factors.

However, limited control does not mean no control. By carefully selecting:

  • patient groups
  • clinical pathways
  • organisational units
  • time intervals

we can still design subgrouping strategies that reduce noise and improve signal detection.

19.4 A practical approach for rational subgrouping

There is no single correct method for defining rational subgroups. In practice, subgrouping is best approached as an iterative process – often aligned with Plan-Do-Study-Act cycles.

The following principles can guide your decisions:

  • Work with process experts
    Rational subgrouping is not purely statistical – it requires contextual understanding.

  • Map the process
    Identify inputs, outputs, and sources of variation.

  • Clarify your purpose
    Are you monitoring stability or trying to detect improvement?

  • Choose appropriate granularity
    Decide whether to aggregate or stratify by time, location, or population.

  • Match sampling to expected change
    Sample more frequently than the rate at which change is expected.

  • Ensure sufficient data per subgroup
    Count data: aim for ≥5 events per subgroup.
    Proportion data: ensure a sufficiently large denominator per subgroup.

  • Be prepared to adapt
    Revise your approach as you learn more about the process.
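
The guideline on events per subgroup can be checked before committing to a subgroup width. The sketch below takes made-up weekly event counts and reports the average number of events per subgroup at several candidate widths; the data, the candidate widths, and the variable names are hypothetical, while the threshold of 5 is the one suggested above.

```r
# hypothetical weekly event counts for one year
weekly_events <- rep(c(1, 2, 1, 2), times = 13)

# candidate subgroup widths in weeks, from finest to coarsest
widths <- c(1, 2, 4, 13)

# average number of events per subgroup at each width
mean_events <- sapply(widths, function(w) {
  period <- ceiling(seq_along(weekly_events) / w)
  mean(tapply(weekly_events, period, sum))
})
names(mean_events) <- paste0(widths, "-week")
mean_events

# finest width that averages at least 5 events per subgroup
finest_ok <- widths[which(mean_events >= 5)[1]]
finest_ok
```

A check like this only addresses data sufficiency; whether the resulting subgroups are rational still depends on the purpose of the chart and on knowledge of the process.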

These principles often involve trade-offs. For example, increasing temporal resolution may reduce subgroup size. In such cases, alternative approaches – such as different chart types (e.g. rare event charts) – may be more appropriate.
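
One family of rare event charts plots the time between events instead of the number of events per period, so that each event becomes its own data point. As a minimal sketch of the data preparation only, with made-up event dates and hypothetical variable names:

```r
# hypothetical dates on which a rare event occurred
event_dates <- as.Date(c("2024-01-03", "2024-01-20", "2024-02-28",
                         "2024-03-05", "2024-05-11"))

# days between consecutive events: one value per event after the first
days_between <- as.numeric(diff(event_dates))
days_between
```

Longer gaps between events then indicate improvement when the events are unwanted. How such a series is charted depends on the software; the chart type to use should be checked against the package documentation.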

19.5 Rational subgrouping in summary

Rational subgrouping is one of the most challenging aspects of SPC – and one of the most important. There is no universal recipe. It requires judgement, experimentation, and a deepening understanding of the process over time.

But one thing is clear: Poor subgrouping produces misleading charts. Good subgrouping reveals the truth about the process.

Getting subgrouping right is not optional – it is what separates SPC charts that inform improvement from those that obscure it.