1. The Hierarchical Model
The MAP (Meta-Analytic Predictive) prior framework treats each historical study as an exchangeable draw from a common distribution. The model is:
yi | θi ~ N(θi, σi² / ni)
θi | μ, τ ~ N(μ, τ²)
Here, yi is the observed mean in study i, θi is the true study-specific mean, μ is the grand mean, and τ is the between-study standard deviation (heterogeneity). The key insight is that τ controls how much information flows from historical studies to the new study:
- When τ = 0: all studies are identical, full borrowing
- When τ is large: studies differ substantially, minimal borrowing
This automatic adjustment is the core advantage over ad-hoc methods. The model borrows exactly as much as the data warrant.
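The two-level structure can be illustrated with a short simulation. This is a toy sketch, not part of the tool; the function names and the parameter values used in the test are hypothetical.

```javascript
// Standard normal draw via the Box-Muller transform.
function randn() {
  const u = 1 - Math.random(); // in (0, 1], so Math.log is safe
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Simulate k historical studies from the two-level model:
// theta_i ~ N(mu, tau^2), y_i | theta_i ~ N(theta_i, sigma^2 / n).
function simulateStudies(k, mu, tau, sigma, n) {
  return Array.from({ length: k }, () => {
    const theta = mu + tau * randn();                    // true study-specific mean
    const y = theta + (sigma / Math.sqrt(n)) * randn();  // observed study mean
    return { theta, y };
  });
}
```

With τ = 0 every θi equals μ and the observed means scatter only by σ/√n; increasing τ spreads the study means apart, which is exactly what the borrowing mechanism reacts to.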
2. Heterogeneity Estimation
This tool uses the DerSimonian-Laird (DL) estimator for τ², the standard method-of-moments estimator from meta-analysis. Given k studies with inverse-variance weights wi = ni / si²:
Q = Σ wi(yi - ŷ)²
τ²DL = max(0, (Q - (k-1)) / C)
where C = Σwi - (Σwi²)/(Σwi) and ŷ is the weighted mean. The I² statistic expresses heterogeneity as a percentage: I² = max(0, (Q - (k-1))/Q) × 100%.
Limitation: DL can underestimate τ when k is small (< 10 studies). For regulatory submissions, consider full Bayesian estimation, which places a proper prior on τ and yields credible intervals for it.
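The DL computation above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the tool's actual source:

```javascript
// DerSimonian-Laird estimator of tau^2 plus Q and I^2.
// y: study means, s: within-study SDs, n: study sizes (parallel arrays).
function dlTau2(y, s, n) {
  const k = y.length;
  const w = y.map((_, i) => n[i] / (s[i] * s[i]));              // inverse-variance weights
  const sumW = w.reduce((a, b) => a + b, 0);
  const yBar = w.reduce((a, wi, i) => a + wi * y[i], 0) / sumW; // weighted mean
  const Q = w.reduce((a, wi, i) => a + wi * (y[i] - yBar) ** 2, 0);
  const C = sumW - w.reduce((a, wi) => a + wi * wi, 0) / sumW;
  const tau2 = Math.max(0, (Q - (k - 1)) / C);
  const i2 = Math.max(0, (Q - (k - 1)) / Q) * 100;              // I^2 in percent
  return { tau2, i2, Q };
}
```

Note the truncation at zero: negative moment estimates of τ² are clipped, which is one reason the estimator behaves poorly with few studies.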
3. Effective Sample Size (ESS)
The effective sample size quantifies how many concurrent control animals the historical data are equivalent to. Under the normal-normal hierarchical model:
ESS = σ² / (σ² / Nhist + τ²)
This formula captures the discount: as τ increases, the denominator grows and ESS shrinks. When τ = 0, ESS = Nhist (full information). The formula comes from the predictive variance of a new study mean under the hierarchical model (Neuenschwander et al., 2010).
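The ESS formula above is a one-liner; the function name is hypothetical:

```javascript
// ESS of historical controls under the normal-normal model
// (Neuenschwander et al., 2010). sigma and tau on the same scale.
function ess(sigma, tau, nHist) {
  return (sigma * sigma) / ((sigma * sigma) / nHist + tau * tau);
}
```

Setting τ = 0 recovers Nhist exactly, and even a modest τ relative to σ discounts the historical information sharply.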
4. Robustification
Following Schmidli et al. (2014), the MAP prior is mixed with a vague (uninformative) component:
πrobust = (1 - w) · πMAP + w · πvague
The default weight w = 0.2 places 20% of the prior mass on the vague component. This protects against prior-data conflict, the scenario where the current controls differ from the historical ones by more than the heterogeneity model predicts. The practical effect is a further reduction of the ESS:
ESSrobust = (1 - w) × ESS
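The linear ESS discount above is trivial to compute; the function name and default are illustrative (the default w = 0.2 matches the text):

```javascript
// Robust ESS under the mixture-prior heuristic: scale by (1 - w).
function robustEss(ess, w = 0.2) {
  return (1 - w) * ess;
}
```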
5. Sample Size Calculation
The ESS is independent of the experimental design. It quantifies the information content of historical controls regardless of whether the new study is a t-test, ANOVA, or Dunnett design. The design only determines the classical ncontrol that the ESS is subtracted from.
Two-group (t-test)
n = ⌈ 2 · ((zα/2 + zβ) · σ / δ)² ⌉
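The two-group formula can be evaluated directly. A minimal sketch with a hypothetical name; the z-quantiles are supplied by the caller (for two-sided α = 0.05 and 80% power, zα/2 ≈ 1.96 and zβ ≈ 0.8416):

```javascript
// Per-group n for a two-sided two-sample z-approximation:
// n = ceil(2 * ((zAlpha2 + zBeta) * sigma / delta)^2)
function twoGroupN(zAlpha2, zBeta, sigma, delta) {
  return Math.ceil(2 * (((zAlpha2 + zBeta) * sigma) / delta) ** 2);
}
```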
Multi-group (Dunnett: 1 control vs. k treatments)
For k many-to-one comparisons, the per-comparison α is Bonferroni-corrected to α/k.
The optimal control group allocation is ncontrol = ntreatment × √k (Dunnett, 1955).
ntreatment = ⌈ ((zα/(2k) + zβ) / d)² ⌉, where d = δ/σ is the standardized effect size
ncontrol = ⌈ ntreatment × √k ⌉
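The Dunnett-with-Bonferroni sizing above can be sketched as follows. Names are hypothetical; the Bonferroni-corrected quantile is passed in precomputed (for α = 0.05 and k = 3, zα/(2k) ≈ 2.394):

```javascript
// Treatment-group n from the Bonferroni-corrected z-formula,
// control-group n from the square-root allocation rule.
function dunnettSizes(zAlpha2k, zBeta, d, k) {
  const nTreat = Math.ceil(((zAlpha2k + zBeta) / d) ** 2);
  const nControl = Math.ceil(nTreat * Math.sqrt(k));
  return { nTreat, nControl };
}
```

As the text notes, this is conservative relative to exact Dunnett critical values.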
Custom / other designs
For ANOVA, factorial, dose-response, or other designs: compute ncontrol with your preferred tool (e.g. G*Power) and enter it directly in "Custom" mode. The ESS reduction applies to the control group only.
Reduction formula (all designs)
nconcurrent = max(ncontrol,classic - ESSrobust, nmin)
Treatment group sizes are never reduced — historical data only inform the control condition. The floor nmin (default: 5) ensures that a concurrent control group is always present for:
- Prior-data conflict detection (do current controls match historical?)
- Time-effect monitoring (has something changed since the last study?)
- Regulatory compliance (BfR: concurrent controls are essential)
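The reduction formula above, including the floor, is a single expression. A sketch with hypothetical naming; rounding the difference up to a whole animal is an assumption here, the tool may round differently:

```javascript
// Concurrent control n after subtracting the robust ESS,
// never below the floor nMin (default 5, per the text).
function concurrentControls(nControlClassic, essRobust, nMin = 5) {
  return Math.max(Math.ceil(nControlClassic - essRobust), nMin);
}
```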
6. Why Naive Pooling Fails
"Naive pooling" means treating all historical control animals as if they were concurrent — simply adding them to the current control group. This ignores the between-study variance τ². The consequence:
SEnaive = σ / √Nhist   vs.   SEtrue = √(σ²/Nhist + τ²)
The naive SE is always smaller than the true SE (when τ > 0), leading to falsely narrow confidence intervals and an inflated type I error rate. Pocock (1976) demonstrated this in clinical trials; Sacks et al. (1982) showed that historically controlled studies systematically overestimate treatment effects.
Example: With τ = 5 and 89 historical animals, naive pooling would give SE = 1.59, but the true SE is 5.13 — a 3.2× underestimation. The nominal α = 5% becomes an actual α of ~33%. One in three experiments would show a "significant" result by chance alone.
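The two standard errors are easy to compare numerically. A sketch with hypothetical names (σ is illustrative, not taken from the example above):

```javascript
// Naive SE ignores between-study variance; the true SE includes tau^2.
function seNaive(sigma, nHist) {
  return sigma / Math.sqrt(nHist);
}
function seTrue(sigma, nHist, tau) {
  return Math.sqrt((sigma * sigma) / nHist + tau * tau);
}
```

Because τ² enters the true SE as an additive floor, piling on more historical animals drives the naive SE toward zero while the true SE never drops below τ.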
7. Limitations & Assumptions
- Group comparisons only. This tool handles the most common case: comparing group means (control vs. treatment) at a single time point. More complex designs — longitudinal trajectories, factorial designs, survival endpoints, or multivariate outcomes — require extensions of the MAP prior framework that go beyond this calculator.
- Normal endpoints only. The ESS formula assumes normally distributed data. For binary outcomes (e.g., tumor incidence), a beta-binomial variant is needed but is not supported by this calculator.
- DerSimonian-Laird estimator. The between-study heterogeneity τ is estimated using the DL method-of-moments estimator, which is known to underestimate τ when few studies are available (k < 10; Sidik & Jonkman, 2007). With k = 3–5 studies, τ may be underestimated by 20–50%. For regulatory submissions, full Bayesian estimation with proper priors on τ is preferable.
- ESS is an approximation. The ESS formula assumes a single global σ (user-specified) and posterior normality. It does not account for heterogeneous within-study variances across historical studies. For planning purposes this is adequate; for regulatory claims, validate against the full MAP prior ESS from Bayesian computation.
- Robustification is a heuristic. The robust ESS is computed as (1 − w) × ESS, a linear scaling that approximates the effect of a prior mixture (Schmidli et al., 2014). This is a first-order approximation, not the exact posterior variance under the mixture.
- Path A MDE tolerance is configurable. Path A (frequentist) reduces the control group by accepting a slightly larger minimum detectable effect. The tolerance (default: 15%) is user-configurable. The power for the original effect size is reduced accordingly and is reported transparently. This approach is not derived from a formal optimality criterion; it is a practical heuristic.
- Stability zone thresholds are approximate. The classification into stable (τ/σ < 15%), moderate (15–40%), and unstable (≥ 40%) is informed by type I error analysis and corresponds approximately to established I² thresholds (Higgins et al., 2003). These are decision aids, not sharp statistical boundaries.
- Dunnett design uses Bonferroni correction. The multi-group (Dunnett) sample size calculation uses the Bonferroni-corrected α/k, which is conservative compared to the exact Dunnett critical values. Sample sizes may be 5–10% larger than necessary.
- No time-trend modeling. The model assumes exchangeability across studies. If there is a known temporal drift (e.g., genetic drift in a colony), the model may be misspecified.
- Single endpoint. Each analysis applies to one endpoint. If your study has multiple primary endpoints, compute ESS for each separately, using the endpoint with the highest between-study variation for the most conservative planning.
- Independence. Historical studies are assumed independent. If they share animals (e.g., repeated measures on the same cohort), the effective k is reduced.
8. Software & Validation
This tool implements the analytical ESS approximation in JavaScript for browser-based computation. For regulatory submissions or publications, we recommend validating results against a full Bayesian implementation performed by a statistician.
9. References
- BfR (2026). Verwendung historischer Kontrollen. Empfehlung 013/2026 des Nationalen Ausschusses. Bundesinstitut für Risikobewertung, 20 March 2026. DOI: 10.17590/20260320-131048-0
- Bonapersona V, Hoijtink H, RELACS Consortium, Sarabdjitsingh RA, Joëls M (2021). Increasing the statistical power of animal experiments with historical control data. Nature Neuroscience, 24(4), 470-477. DOI: 10.1038/s41593-020-00792-3
- Chen M-H, Ibrahim JG (2000). Power prior distributions for regression models. Statistical Science, 15(1). DOI: 10.1214/ss/1009212673
- Coja T et al. (2025). Use and reporting of historical control data for regulatory studies. EFSA Journal, 23(8). DOI: 10.2903/j.efsa.2025.9576
- DerSimonian R, Laird N (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177-188. DOI: 10.1016/0197-2456(86)90046-2
- Kramer M, Font E (2017). Reducing sample size in experiments with animals: historical controls and related strategies. Biological Reviews, 92(1), 431-445. DOI: 10.1111/brv.12237
- Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7(1), 5-18. DOI: 10.1177/1740774509356002
- Pocock SJ (1976). The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases, 29(3), 175-188. DOI: 10.1016/0021-9681(76)90044-8
- Sacks H, Chalmers TC, Smith H Jr (1982). Randomized versus historical controls for clinical trials. American Journal of Medicine, 72(2), 233-240. DOI: 10.1016/0002-9343(82)90815-4
- Schmidli H, Gsteiger S, Roychoudhury S, O'Hagan A, Spiegelhalter D, Neuenschwander B (2014). Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032. DOI: 10.1111/biom.12242
- Viele K, Berry S, Neuenschwander B et al. (2014). Use of historical control data for assessing treatment effects in clinical trials. Pharmaceutical Statistics, 13(1), 41-54. DOI: 10.1002/pst.1589