1. The Hierarchical Model
The MAP (Meta-Analytic Predictive) prior framework treats each historical study as an exchangeable draw from a common distribution. The model is:
yi | θi ~ N(θi, σi² / ni)
θi | μ, τ ~ N(μ, τ²)
Here, yi is the observed mean in study i, θi is the true study-specific mean, μ is the grand mean, and τ is the between-study standard deviation (heterogeneity). The key insight is that τ controls how much information flows from historical studies to the new study:
- When τ = 0: all studies are identical, full borrowing
- When τ is large: studies differ substantially, minimal borrowing
This automatic adjustment is the core advantage over ad-hoc methods. The model borrows exactly as much as the data warrant.
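The two-level structure can be illustrated with a short simulation. This is a toy sketch, not part of the tool; the function names and the parameter values used in the test are hypothetical.

```javascript
// Standard normal draw via the Box-Muller transform.
function randn() {
  const u = 1 - Math.random(); // in (0, 1], so Math.log is safe
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Simulate k historical studies from the two-level model:
// theta_i ~ N(mu, tau^2), y_i | theta_i ~ N(theta_i, sigma^2 / n).
function simulateStudies(k, mu, tau, sigma, n) {
  return Array.from({ length: k }, () => {
    const theta = mu + tau * randn();                    // true study-specific mean
    const y = theta + (sigma / Math.sqrt(n)) * randn();  // observed study mean
    return { theta, y };
  });
}
```

With τ = 0 every θi equals μ and the observed means scatter only by σ/√n; increasing τ spreads the study means apart, which is exactly what the borrowing mechanism reacts to.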
2. Heterogeneity Estimation
This tool uses the DerSimonian-Laird (DL) estimator for τ², the standard method-of-moments estimator from meta-analysis. Given k studies with inverse-variance weights wi = ni / si²:
Q = Σ wi(yi - ŷ)²
τ²DL = max(0, (Q - (k-1)) / C)
where C = Σwi - (Σwi²)/(Σwi) and ŷ is the weighted mean. The I² statistic expresses heterogeneity as a percentage: I² = max(0, (Q - (k-1))/Q) × 100%.
Limitation: DL can underestimate τ when k is small (< 10 studies). For regulatory submissions, consider full Bayesian estimation, which places a proper prior on τ and yields credible intervals for it.
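The DL computation above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the tool's actual source:

```javascript
// DerSimonian-Laird estimator of tau^2 plus Q and I^2.
// y: study means, s: within-study SDs, n: study sizes (parallel arrays).
function dlTau2(y, s, n) {
  const k = y.length;
  const w = y.map((_, i) => n[i] / (s[i] * s[i]));              // inverse-variance weights
  const sumW = w.reduce((a, b) => a + b, 0);
  const yBar = w.reduce((a, wi, i) => a + wi * y[i], 0) / sumW; // weighted mean
  const Q = w.reduce((a, wi, i) => a + wi * (y[i] - yBar) ** 2, 0);
  const C = sumW - w.reduce((a, wi) => a + wi * wi, 0) / sumW;
  const tau2 = Math.max(0, (Q - (k - 1)) / C);
  const i2 = Math.max(0, (Q - (k - 1)) / Q) * 100;              // I^2 in percent
  return { tau2, i2, Q };
}
```

Note the truncation at zero: negative moment estimates of τ² are clipped, which is one reason the estimator behaves poorly with few studies.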
3. Effective Sample Size (ESS)
The effective sample size quantifies how many concurrent control animals the historical data are equivalent to. Under the normal-normal hierarchical model:
ESS = σ² / (σ² / Nhist + τ²)
This formula captures the discount: as τ increases, the denominator grows and ESS shrinks. When τ = 0, ESS = Nhist (full information). The formula comes from the predictive variance of a new study mean under the hierarchical model (Neuenschwander et al., 2010).
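The ESS formula above is a one-liner; the function name is hypothetical:

```javascript
// ESS of historical controls under the normal-normal model
// (Neuenschwander et al., 2010). sigma and tau on the same scale.
function ess(sigma, tau, nHist) {
  return (sigma * sigma) / ((sigma * sigma) / nHist + tau * tau);
}
```

Setting τ = 0 recovers Nhist exactly, and even a modest τ relative to σ discounts the historical information sharply.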
4. Robustification
Following Schmidli et al. (2014), the MAP prior is mixed with a vague (uninformative) component:
πrobust = (1 - w) · πMAP + w · πvague
The default weight w = 0.2 places 20% of the prior mass on the vague component. This protects against prior-data conflict, the scenario where the current controls differ from the historical ones by more than the heterogeneity model predicts. The practical effect is a further reduction of the ESS:
ESSrobust = (1 - w) × ESS
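The linear ESS discount above is trivial to compute; the function name and default are illustrative (the default w = 0.2 matches the text):

```javascript
// Robust ESS under the mixture-prior heuristic: scale by (1 - w).
function robustEss(ess, w = 0.2) {
  return (1 - w) * ess;
}
```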
5. Sample Size Calculation
The ESS is independent of the experimental design. It quantifies the information content of historical controls regardless of whether the new study is a t-test, ANOVA, or Dunnett design. The design only determines the classical ncontrol that the ESS is subtracted from.
Two-group (t-test)
n = ⌈ 2 · ((zα/2 + zβ) · σ / δ)² ⌉
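The two-group formula can be evaluated directly. A minimal sketch with a hypothetical name; the z-quantiles are supplied by the caller (for two-sided α = 0.05 and 80% power, zα/2 ≈ 1.96 and zβ ≈ 0.8416):

```javascript
// Per-group n for a two-sided two-sample z-approximation:
// n = ceil(2 * ((zAlpha2 + zBeta) * sigma / delta)^2)
function twoGroupN(zAlpha2, zBeta, sigma, delta) {
  return Math.ceil(2 * (((zAlpha2 + zBeta) * sigma) / delta) ** 2);
}
```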
Multi-group (Dunnett: 1 control vs. k treatments)
For k many-to-one comparisons, the per-comparison α is Bonferroni-corrected to α/k.
The optimal control group allocation is ncontrol = ntreatment × √k (Dunnett, 1955).
ntreatment = ⌈ ((zα/(2k) + zβ) / d)² ⌉, where d = δ/σ is the standardized effect size
ncontrol = ⌈ ntreatment × √k ⌉
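The Dunnett-with-Bonferroni sizing above can be sketched as follows. Names are hypothetical; the Bonferroni-corrected quantile is passed in precomputed (for α = 0.05 and k = 3, zα/(2k) ≈ 2.394):

```javascript
// Treatment-group n from the Bonferroni-corrected z-formula,
// control-group n from the square-root allocation rule.
function dunnettSizes(zAlpha2k, zBeta, d, k) {
  const nTreat = Math.ceil(((zAlpha2k + zBeta) / d) ** 2);
  const nControl = Math.ceil(nTreat * Math.sqrt(k));
  return { nTreat, nControl };
}
```

As the text notes, this is conservative relative to exact Dunnett critical values.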
Custom / other designs
For ANOVA, factorial, dose-response, or other designs: compute ncontrol with your preferred tool (e.g. G*Power) and enter it directly in "Custom" mode. The ESS reduction applies to the control group only.
Reduction formula (all designs)
nconcurrent = max(ncontrol,classic - ESSrobust, nmin)
Treatment group sizes are never reduced — historical data only inform the control condition. The floor nmin (default: 5) ensures that a concurrent control group is always present for:
- Prior-data conflict detection (do current controls match historical?)
- Time-effect monitoring (has something changed since the last study?)
- Regulatory compliance (BfR: concurrent controls are essential)
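The reduction formula above, including the floor, is a single expression. A sketch with hypothetical naming; rounding the difference up to a whole animal is an assumption here, the tool may round differently:

```javascript
// Concurrent control n after subtracting the robust ESS,
// never below the floor nMin (default 5, per the text).
function concurrentControls(nControlClassic, essRobust, nMin = 5) {
  return Math.max(Math.ceil(nControlClassic - essRobust), nMin);
}
```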
6. Why Naive Pooling Fails
"Naive pooling" means treating all historical control animals as if they were concurrent — simply adding them to the current control group. This ignores the between-study variance τ². The consequence:
SEnaive = σ / √Nhist   vs.   SEtrue = √(σ²/Nhist + τ²)
The naive SE is always smaller than the true SE (when τ > 0), leading to falsely narrow confidence intervals and an inflated type I error rate. Pocock (1976) demonstrated this in clinical trials; Sacks et al. (1982) showed that historically controlled studies systematically overestimate treatment effects.
Example: With τ = 5 and 89 historical animals, naive pooling would give SE = 1.59, but the true SE is 5.13 — a 3.2× underestimation. The nominal α = 5% becomes an actual α of ~33%. One in three experiments would show a "significant" result by chance alone.
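The two standard errors are easy to compare numerically. A sketch with hypothetical names (σ is illustrative, not taken from the example above):

```javascript
// Naive SE ignores between-study variance; the true SE includes tau^2.
function seNaive(sigma, nHist) {
  return sigma / Math.sqrt(nHist);
}
function seTrue(sigma, nHist, tau) {
  return Math.sqrt((sigma * sigma) / nHist + tau * tau);
}
```

Because τ² enters the true SE as an additive floor, piling on more historical animals drives the naive SE toward zero while the true SE never drops below τ.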
7. Limitations & Assumptions
- Group comparisons only. This tool handles the most common case: comparing group means (control vs. treatment) at a single time point. More complex designs — longitudinal trajectories, factorial designs, survival endpoints, or multivariate outcomes — require extensions of the MAP prior framework that go beyond this calculator.
- Normal endpoints only. The ESS formula assumes normally distributed data. For binary outcomes (e.g., tumor incidence), a beta-binomial variant is needed but is not supported by this calculator.
- DerSimonian-Laird estimator. The between-study heterogeneity τ is estimated using the DL method-of-moments estimator, which is known to underestimate τ when few studies are available (k < 10; Sidik & Jonkman, 2007). With k = 3–5 studies, τ may be underestimated by 20–50%. For regulatory submissions, full Bayesian estimation with proper priors on τ is preferable.
- ESS is an approximation. The ESS formula assumes a single global σ (user-specified) and posterior normality. It does not account for heterogeneous within-study variances across historical studies. For planning purposes this is adequate; for regulatory claims, validate against the full MAP prior ESS from Bayesian computation.
- Robustification is a heuristic. The robust ESS is computed as (1 − w) × ESS, a linear scaling that approximates the effect of a prior mixture (Schmidli et al., 2014). This is a first-order approximation, not the exact posterior variance under the mixture.
- Path A MDE tolerance is configurable. Path A (frequentist) reduces the control group by accepting a slightly larger minimum detectable effect. The tolerance (default: 15%) is user-configurable. The power for the original effect size is reduced accordingly and is reported transparently. This approach is not derived from a formal optimality criterion; it is a practical heuristic.
- Stability zone thresholds are approximate. The classification into stable (τ/σ < 15%), moderate (15–40%), and unstable (≥ 40%) is informed by type I error analysis and corresponds approximately to established I² thresholds (Higgins et al., 2003). These are decision aids, not sharp statistical boundaries.
- Dunnett design uses Bonferroni correction. The multi-group (Dunnett) sample size calculation uses the Bonferroni-corrected α/k, which is conservative compared to the exact Dunnett critical values. Sample sizes may be 5–10% larger than necessary.
- No time-trend modeling. The model assumes exchangeability across studies. If there is a known temporal drift (e.g., genetic drift in a colony), the model may be misspecified.
- Single endpoint. Each analysis applies to one endpoint. If your study has multiple primary endpoints, compute ESS for each separately, using the endpoint with the highest between-study variation for the most conservative planning.
- Independence. Historical studies are assumed independent. If they share animals (e.g., repeated measures on the same cohort), the effective k is reduced.
8. Software & Validation
This tool implements the analytical ESS approximation in JavaScript for browser-based computation. For regulatory submissions or publications, we recommend validating results against a full Bayesian implementation performed by a statistician.
9. References
- BfR (2026). Verwendung historischer Kontrollen. Empfehlung 013/2026 des Nationalen Ausschusses. Bundesinstitut für Risikobewertung, 20 March 2026. DOI: 10.17590/20260320-131048-0
- Bonapersona V, Hoijtink H, RELACS Consortium, Sarabdjitsingh RA, Joëls M (2021). Increasing the statistical power of animal experiments with historical control data. Nature Neuroscience, 24(4), 470-477. DOI: 10.1038/s41593-020-00792-3
- Chen M-H, Ibrahim JG (2000). Power prior distributions for regression models. Statistical Science, 15(1). DOI: 10.1214/ss/1009212673
- Coja T et al. (2025). Use and reporting of historical control data for regulatory studies. EFSA Journal, 23(8). DOI: 10.2903/j.efsa.2025.9576
- DerSimonian R, Laird N (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177-188. DOI: 10.1016/0197-2456(86)90046-2
- Kramer M, Font E (2017). Reducing sample size in experiments with animals: historical controls and related strategies. Biological Reviews, 92(1), 431-445. DOI: 10.1111/brv.12237
- Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7(1), 5-18. DOI: 10.1177/1740774509356002
- Pocock SJ (1976). The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases, 29(3), 175-188. DOI: 10.1016/0021-9681(76)90044-8
- Sacks H, Chalmers TC, Smith H Jr (1982). Randomized versus historical controls for clinical trials. American Journal of Medicine, 72(2), 233-240. DOI: 10.1016/0002-9343(82)90815-4
- Schmidli H, Gsteiger S, Roychoudhury S, O'Hagan A, Spiegelhalter D, Neuenschwander B (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032. DOI: 10.1111/biom.12242
- Viele K, Berry S, Neuenschwander B et al. (2014). Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics, 13(1), 41-54. DOI: 10.1002/pst.1589