Mean, Median, Mode, & CIs

Statistical analysis is the backbone of Quality Control (QC) in the Hematology laboratory. While the laboratory scientist does not need to be a statistician, they must possess a working knowledge of descriptive statistics to interpret QC charts, establish reference ranges, and determine if an instrument is performing within acceptable limits. These parameters define the “Central Tendency” (where the data clusters) and the “Dispersion” (how spread out the data is)

Measures of Central Tendency

These values attempt to identify the “center” of a data set, representing the target value for control materials or the typical value for a patient population

Mean (\(\bar{x}\))

The mean is the arithmetic average of a data set. In the laboratory, the mean of a Quality Control material represents the Target Value. It is the single most important statistic for assessing Accuracy (how close the result is to the true value)

  • Formula: Sum of all values (\(\sum x\)) divided by the number of values (\(n\)) \[ \bar{x} = \frac{\sum x}{n} \]
  • Application: When setting up a new lot of Hematology control blood, the lab runs the control 20 times. The average of these 20 runs becomes the Mean for the Levey-Jennings chart. A sustained shift of daily results away from this mean suggests systematic error (bias)
  • Sensitivity: The mean is heavily influenced by outliers. One extremely high result (e.g., due to a clot) will pull the mean significantly upward
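As a quick sketch of how a QC target is set, using Python's standard `statistics` module (all values here are hypothetical Hgb control results):

```python
import statistics

# Hypothetical Hgb results (g/dL) from 20 runs of a new control lot
runs = [14.1, 13.9, 14.0, 14.2, 13.8, 14.1, 14.0, 13.9, 14.2, 14.0,
        13.8, 14.1, 14.0, 14.3, 13.9, 14.0, 14.1, 13.8, 14.0, 14.2]

# The mean of the 20 runs becomes the Levey-Jennings target value
target = statistics.mean(runs)
print(round(target, 2))  # 14.02

# A single gross outlier (e.g., a clotted sample) pulls the mean upward
with_outlier = runs + [18.5]
print(round(statistics.mean(with_outlier), 2))
```

Note how one aberrant result out of 21 shifts the mean by more than 0.2 g/dL, which is why gross outliers are excluded before a target is set.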

Median

The median is the middle value when the data points are arranged in numerical order. It divides the distribution into two equal halves (50% above, 50% below)

  • Application: While less used in daily QC, the median is crucial for establishing Reference Intervals (Normal Ranges) for patient populations that are not normally distributed (skewed). For example, Ferritin levels in the general population are often skewed; the median provides a better representation of the “typical” patient than the mean
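A small illustration of why the median resists skew, using hypothetical ferritin values and the standard `statistics` module:

```python
import statistics

# Hypothetical ferritin results (ng/mL); note the long high tail
ferritin = [20, 35, 40, 55, 60, 75, 90, 120, 250, 900]

print(statistics.median(ferritin))  # 67.5 -> the "typical" patient
print(statistics.mean(ferritin))    # 164.5 -> dragged upward by the tail
```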

Mode

The mode is the value that appears most frequently in the data set

  • Application: In Hematology, the mode is visually represented on the Histogram. For example, the RBC Histogram shows a bell curve of cell sizes. The peak of this curve, the most frequently occurring cell size, is the Mode
  • MCV vs. Mode: Interestingly, the MCV (Mean Corpuscular Volume) reported by the analyzer is the mean size, but the peak of the curve is the mode. In a perfectly symmetrical (Gaussian) distribution, the Mean, Median, and Mode are identical
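The mean-versus-peak distinction can be sketched with hypothetical binned RBC volumes:

```python
import statistics

# Hypothetical RBC volumes (fL), as if read off a histogram
volumes = [85, 88, 90, 90, 90, 92, 92, 95, 100]

print(statistics.mode(volumes))            # 90 -> the histogram peak (Mode)
print(round(statistics.mean(volumes), 1))  # the mean size, MCV-style
```

In a perfectly symmetrical distribution the two prints would match; here the slight high tail pulls the mean above the mode.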

Measures of Dispersion & Confidence Intervals

Knowing the mean is not enough; the lab must know how much “scatter” or variation is considered normal. This is defined by the Standard Deviation and Confidence Intervals

Standard Deviation (\(s\) or SD)

The Standard Deviation quantifies the amount of variation or dispersion of a set of data values. In the laboratory, SD is the measure of Precision (Reproducibility). A low SD indicates that repeated runs of the same sample cluster tightly around the mean

  • The Concept: It measures the typical distance of each data point from the mean (formally, the root-mean-square deviation)
  • Calculation (Sample SD) \[ s = \sqrt{\frac{\sum(x - \bar{x})^2}{n - 1}} \]
    • Ideally, the SD for a hematology parameter should be very small compared to the mean
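The formula above can be checked directly against `statistics.stdev`, which also uses the \(n - 1\) (sample) denominator; the WBC control values are hypothetical:

```python
import math
import statistics

# Hypothetical WBC control results (x10^9/L)
qc = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0]

mean = statistics.mean(qc)
# Sample SD with the n - 1 denominator, as in the formula above
sd = math.sqrt(sum((x - mean) ** 2 for x in qc) / (len(qc) - 1))

print(round(sd, 3))                           # 0.141
assert abs(sd - statistics.stdev(qc)) < 1e-9  # library agrees
```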

Coefficient of Variation (CV)

Because SD is expressed in the units of the analyte (e.g., g/dL for Hgb), it is difficult to compare the precision of two different tests (e.g., Platelets vs. Hgb). The CV converts the SD into a percentage, standardizing it

  • Formula \[ CV\% = \frac{SD}{\text{Mean}} \times 100 \]
  • Benchmark: In Hematology, Hgb and RBC counts are very precise (CVs of roughly 1-2% or less). Platelet counts are less precise (CVs up to roughly 5-10%) due to the difficulty of counting smaller, less numerous particles
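A sketch of the comparison the CV makes possible, with hypothetical runs of two analytes that have very different units and magnitudes:

```python
import statistics

def cv_percent(values):
    # CV% = SD / mean x 100
    return statistics.stdev(values) / statistics.mean(values) * 100

hgb = [14.0, 14.1, 13.9, 14.0, 14.2, 13.8]  # g/dL
plt = [250, 240, 265, 230, 255, 270]         # x10^9/L

print(round(cv_percent(hgb), 1))  # ~1.0 -> tighter precision
print(round(cv_percent(plt), 1))  # ~6.0 -> more scatter
```

The raw SDs (about 0.14 g/dL vs. about 15 x10^9/L) are not comparable, but the CVs are.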

Gaussian Distribution & Confidence Intervals

Laboratory statistics rely on the assumption that repeated measurements of a control (or biological data of a population) follow a Gaussian (Normal) Distribution, forming a bell-shaped curve. This allows us to predict the probability of a result falling within a certain range

  • The 68-95-99.7 Rule
    • \(\pm\) 1 SD: Contains approximately 68% of the data
    • \(\pm\) 2 SD: Contains approximately 95.5% of the data. This is the standard Confidence Interval used for laboratory QC
    • \(\pm\) 3 SD: Contains approximately 99.7% of the data
  • Confidence Interval (CI): This is the range of values within which we are “confident” the true value lies. In QC, the “Acceptable Range” for a control is typically defined as the Mean \(\pm\) 2 SD
    • Calculation: If the Mean WBC control is 5.0 and the SD is 0.2:
      • 2 SD = \(0.2 \times 2 = 0.4\)
      • Range = \(5.0 \pm 0.4\)
      • Acceptable Limits = 4.6 to 5.4
    • Interpretation: If a daily control run yields a result of 5.3, it is “in control” (within the 95% confidence interval). If it yields 5.5, it is “out of control” (outside the 2 SD limit), likely indicating error, since there is less than a 5% chance of a result falling this far out by random variation alone
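The worked example above, written out in Python (same illustrative mean and SD):

```python
mean, sd = 5.0, 0.2  # WBC control: target and SD
low, high = mean - 2 * sd, mean + 2 * sd

def in_control(result):
    # "In control" = within the mean +/- 2 SD acceptable range
    return low <= result <= high

print(round(low, 1), round(high, 1))  # 4.6 5.4
print(in_control(5.3))                # True  -> accept the run
print(in_control(5.5))                # False -> reject and investigate
```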

Reference Intervals (Normal Ranges)

The confidence interval concept is also applied to patient populations to determine the “Normal Range” printed on the lab report

  • Method: The lab tests 120+ healthy individuals. They calculate the Mean and SD of this healthy group
  • Definition: The Reference Interval is typically defined as Mean \(\pm\) 2 SD of the healthy population
  • Implication: By definition, this range includes only 95% of healthy people. This means 5% of perfectly healthy people will fall outside the normal range (2.5% slightly high, 2.5% slightly low) purely by statistical chance
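A sketch of the reference-interval calculation under the same Mean \(\pm\) 2 SD convention; the healthy-donor WBC counts are hypothetical, and a real study would use 120+ individuals:

```python
import statistics

# Hypothetical WBC counts (x10^9/L) from healthy donors
healthy = [6.1, 7.0, 5.5, 8.2, 6.8, 7.4, 5.9, 6.5, 7.8, 6.2,
           7.1, 6.0, 6.9, 7.5, 5.8, 6.4, 7.2, 6.6, 6.3, 7.7]

mean = statistics.mean(healthy)
sd = statistics.stdev(healthy)
low, high = mean - 2 * sd, mean + 2 * sd

# ~95% of healthy donors fall inside; ~5% fall outside by chance alone
print(f"Reference interval: {low:.1f} - {high:.1f}")
```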