License: arXiv.org perpetual non-exclusive license
arXiv:2604.20611v1 [stat.AP] 22 Apr 2026

Bayesian Inference for Incomplete $2\times 2$ Diagnostic Tables

Sara Antonijevic, Danielle Sitalo, and Brani Vidakovic
(Department of Statistics, Texas A&M University, College Station, TX 77843, USA)
Abstract

Incomplete reporting of diagnostic accuracy data remains a persistent problem in medical research. In many studies, only part of the $2\times 2$ diagnostic table is reported, leaving denominators for diseased and non-diseased groups unknown and preventing direct calculation of sensitivity, specificity, predictive values, and related operating characteristics. To address this limitation, we develop hierarchical Bayesian models for reconstructing incomplete $2\times 2$ diagnostic tables from such partial information. Two motivating scenarios are considered: one in which only a single test-outcome row is observed, and another in which true positives, false positives, and the total sample size are reported but the remaining cells are missing. The proposed models are illustrated on a benchmark breast MRI study with complete counts, treated as partially observed in order to assess reconstruction performance under controlled missingness. The framework yields posterior inference for the missing cell counts and associated diagnostic measures, together with uncertainty quantification in weakly identified settings.

Introduction

Incomplete or partially reported diagnostic accuracy data remain a common obstacle to interpretation, reproducibility, and evidence synthesis. In many applied studies, authors report only fragments of the $2\times 2$ diagnostic table, for example a single cell count, one observed row, or a pair of summary measures, without providing the full denominators for diseased and non-diseased groups. As a result, readers may be unable to reconstruct the full table, verify internal consistency, or derive clinically relevant quantities such as predictive values, false discovery rates, or expected error counts (U.S. Food and Drug Administration, 2007; Macaskill et al., 2010).

This problem is not merely clerical. Complete cross-tabulation is central to transparent reporting of diagnostic accuracy studies, and the STARD initiative was developed precisely to improve such reporting. The STARD 2015 revision explicitly recommends reporting both the cross-tabulation of index test results against the reference standard and a participant flow diagram showing how the study denominators were obtained (Bossuyt et al., 2015; Cohen et al., 2016; EQUATOR Network, 2015). When these elements are omitted, clinically important operating characteristics may no longer be recoverable from the published record.

Empirical assessments suggest that such omissions remain common. Earlier audits showed that incomplete reporting often prevented reconstruction of the full $2\times 2$ table (Smidt et al., 2005; Wilczynski et al., 2008), and more recent work indicates that adherence to STARD recommendations remains uneven even in contemporary medical imaging diagnostic accuracy studies (White et al., 2025). For AI-centered diagnostic accuracy studies, the recent STARD-AI extension further underscores the need for clear reporting of dataset construction, model evaluation, and clinical applicability (Sounderajah et al., 2025). The practical consequence is that studies may be difficult to check, compare, or incorporate into evidence syntheses, even when the underlying clinical question is important.

A related methodological literature addresses situations in which the reference standard is applied only to a subset of participants, creating verification problems and the potential for work-up bias (de Groot et al., 2011; Buzoianu and Kadane, 2008; Umemneku Chikere et al., 2019). That literature is important for the present paper because one of our motivating examples arises from precisely such a setting. At the same time, incomplete reporting can also occur in studies where the reference standard is not obviously missing for design reasons, but the published article still omits enough cell counts that the full diagnostic table cannot be recovered. Our focus is on this reporting and reconstruction problem.

In practical terms, if only sensitivity and specificity are reported, a full $2\times 2$ table cannot usually be reconstructed unless additional information is available, such as the total sample size, the number of diseased subjects, or an externally justified prevalence estimate. Because many evidence-synthesis frameworks require study-level $2\times 2$ tables, missing denominators directly limit downstream meta-analytic use (Macaskill et al., 2010).

This paper develops statistical models for reconstructing incomplete $2\times 2$ diagnostic tables from partially reported data. Our motivation comes from real examples in the diagnostic accuracy literature where key cells are unobserved in the published report. We consider two representative scenarios. In the first, based on Svirsky et al. (2002), only the test-positive subgroup is reported in detail, leaving both cells of the test-negative row unobserved. This case is naturally connected to the literature on partial verification because the reference standard was applied only to a selected subgroup. In the second, based on Wismueller et al. (2020), counts for true positives and false positives are reported and the total sample size is known, but the negative-class counts are omitted. This second setting is better viewed as a constrained incomplete-table problem, since the known total sample size links the missing denominators.

Our aim is not to replace the broader verification-bias literature, nor to claim that incomplete diagnostic tables can always be uniquely recovered from sparse summaries alone. Rather, we develop Bayesian reconstruction strategies for settings in which deterministic recovery is impossible, but principled posterior inference on the missing denominators and derived operating characteristics remains feasible under clearly stated modeling assumptions.

The remainder of the paper is organized as follows. We first review notation for diagnostic $2\times 2$ tables and summarize the binomial-$n$ problem that underlies our reconstruction strategy. We then present the two motivating incomplete-table scenarios, develop the corresponding models, and illustrate their performance on a benchmark example with complete counts treated as partially observed.

Review and Notation for $2\times 2$ Diagnostic Accuracy Tables and the Binomial $n$ Problem

A $2\times 2$ diagnostic table, also called a confusion table, is the standard framework for summarizing the performance of an index test against a reference standard. It records the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), and serves as the basis for the usual measures of diagnostic performance.

| | Disease Present | Disease Absent | Total |
|---|---|---|---|
| Test Positive | TP | FP | $n_{+}=\mathrm{TP}+\mathrm{FP}$ |
| Test Negative | FN | TN | $n_{-}=\mathrm{FN}+\mathrm{TN}$ |
| Total | $n_{1}=\mathrm{TP}+\mathrm{FN}$ | $n_{2}=\mathrm{FP}+\mathrm{TN}$ | $N$ |

Table 1: Standard $2\times 2$ diagnostic table showing true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), with row totals $n_{+}$, $n_{-}$, column totals $n_{1}$, $n_{2}$, and overall sample size $N$.

From these counts one obtains the familiar diagnostic accuracy measures:

\begin{align}
\mathrm{Se} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, & \mathrm{Sp} &= \frac{\mathrm{TN}}{\mathrm{FP}+\mathrm{TN}}, \tag{1}\\
\mathrm{PPV} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, & \mathrm{NPV} &= \frac{\mathrm{TN}}{\mathrm{FN}+\mathrm{TN}}, \tag{2}\\
\mathrm{Accuracy} &= \frac{\mathrm{TP}+\mathrm{TN}}{N}, \tag{3}
\end{align}

(Langlotz, 2003; Eusebi, 2013; Vidakovic, 2017).

We write $n_{1}=\mathrm{TP}+\mathrm{FN}$ for the total number of diseased individuals, $n_{2}=\mathrm{FP}+\mathrm{TN}$ for the total number of non-diseased individuals, and $N=n_{1}+n_{2}$ for the total sample size. When all four cell counts are available, all standard operating characteristics can be computed directly. When only partial information is reported, for example TP and FP without the corresponding denominators, the full table cannot be reconstructed from the published data alone. In that setting, sensitivity, specificity, and NPV are not identified without additional information or modeling assumptions.
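As a small illustration of (1)-(3), the following Python sketch computes the five measures from a complete table. The function name `diagnostic_measures` is our own choice for illustration, and the counts are those of the breast MRI example in Table 4; exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def diagnostic_measures(tp, fp, fn, tn):
    """Return Se, Sp, PPV, NPV, and accuracy for a complete 2x2 table."""
    n_total = tp + fp + fn + tn
    return {
        "Se": Fraction(tp, tp + fn),         # TP / n1
        "Sp": Fraction(tn, fp + tn),         # TN / n2
        "PPV": Fraction(tp, tp + fp),        # TP / n+
        "NPV": Fraction(tn, fn + tn),        # TN / n-
        "Accuracy": Fraction(tp + tn, n_total),
    }


# Complete counts from the breast MRI example (Table 4).
measures = diagnostic_measures(tp=71, fp=28, fn=3, tn=80)
for name, value in measures.items():
    print(f"{name}: {value} = {float(value):.3f}")
```

If FN and TN are unknown, only the PPV entry can be evaluated, which is the situation addressed in the rest of the paper.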

Bayesian Approaches to the Binomial $n$ Problem

A central ingredient in our reconstruction problem is the classical binomial-$n$ problem: infer the number of trials $n$ in a $\mathrm{Bin}(n,p)$ model when $n$ is unknown and $p$ may also be unknown. This problem is well known to be difficult, especially when only a single observation, or very limited data, is available. Classical estimators based on moments or maximum likelihood can be unstable, particularly when the sample variance is close to the sample mean, and the difficulty becomes more pronounced when $p$ is small or $n$ is large (Haldane, 1945; Olkin and Petkau, 1993; DasGupta and Rubin, 2005).
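The instability of the moment estimator can be seen in a short Python sketch. It uses the standard moment equations $\mathbb{E}[X]=np$ and $\mathrm{Var}(X)=np(1-p)$, which give $\hat{n}=m^{2}/(m-s^{2})$ for sample mean $m$ and sample variance $s^{2}$. The two samples below are hypothetical counts chosen only because their variance is close to their mean; they are not data from this paper.

```python
from statistics import mean, variance


def moment_estimate_n(counts):
    """Method-of-moments estimate of n for i.i.d. Bin(n, p) counts.

    From E[X] = n p and Var(X) = n p (1 - p): p_hat = 1 - s^2 / m and
    n_hat = m / p_hat = m^2 / (m - s^2).
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance with divisor len(counts) - 1
    return m * m / (m - s2)


# Hypothetical counts whose sample variance is close to the sample mean.
sample_a = [16, 18, 22, 25, 27]
sample_b = [16, 18, 22, 25, 28]  # same data with the last count raised by one

n_hat_a = moment_estimate_n(sample_a)
n_hat_b = moment_estimate_n(sample_b)
print(f"sample_a: n_hat = {n_hat_a:.1f}")
print(f"sample_b: n_hat = {n_hat_b:.1f}")
```

In the first sample the denominator $m-s^{2}$ is positive but very small, so the estimate is far larger than any observed count. Raising a single observation by one makes $s^{2}$ exceed $m$, and the estimate becomes negative.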

Bayesian methods provide a natural way to regularize this problem by introducing prior information on both $n$ and $p$. Early work considered simple prior structures such as $n\sim\mathrm{Uniform}(1,N)$ and $p\sim\mathrm{Beta}(\alpha,\beta)$, leading to posterior inference for $n$ through either posterior modes or posterior means under the chosen loss function (Draper and Guttman, 1978; Carroll and Lombard, 1985). Rubin's empirical Bayes treatment of the problem was especially influential in motivating later hierarchical formulations (Rubin, 1978).

Subsequent developments introduced more flexible prior models. For example, Raftery (1988) considered a predictive framework with $n\sim\mathrm{Poisson}(\mu)$ and $p\sim\mathrm{Uniform}(0,1)$. Other extensions include beta priors for $p$ together with truncated Poisson priors for $n$ (Bayoud, 2011), as well as continuous gamma approximations for $n$ that simplify computation (Günel and Chilko, 2000). These approaches reduce the instability of purely classical procedures and allow substantive prior information about plausible ranges of $n$ and $p$ to enter the analysis.

A practically attractive alternative is the empirical Bayes or integrated-likelihood approach, which estimates $n$ jointly with beta hyperparameters for the prior on $p$ by maximizing the beta-binomial likelihood (Carroll and Lombard, 1985; DasGupta and Rubin, 2005). Such methods are often computationally convenient and can perform well in small-sample settings where direct likelihood-based inference is erratic.

Because our incomplete-table problem involves missing denominators rather than merely missing cell probabilities, the binomial-$n$ literature provides a natural modeling foundation. In particular, the unknown stratum totals $n_{1}$ and $n_{2}$ can be viewed as latent trial counts that must be inferred from partially observed binomial information under suitable prior structure.

A recent and comprehensive review of estimation in the binomial-$n$ problem, including classical, Bayesian, empirical Bayes, and computational aspects, is given by Georgieva and Vidakovic (2025).

Incomplete Diagnostic Tables

Published diagnostic studies do not always report the full set of cell counts needed to reconstruct the $2\times 2$ table. When only a subset of the cells is available, quantities such as sensitivity, specificity, negative predictive value, and overall accuracy may no longer be identified from the published data alone. In some cases, positive predictive value can still be computed from the reported true positives and false positives, but the absence of information on false negatives and true negatives prevents recovery of the complete diagnostic table.

The two examples below illustrate that incomplete $2\times 2$ tables may arise in more than one way. In the first case, the missingness is linked to study design: only test-positive subjects underwent verification with the reference standard, so the test-negative row is unobserved. This setting is naturally connected to the literature on partial verification and work-up bias (de Groot et al., 2011; Buzoianu and Kadane, 2008; Cronin and Vickers, 2008; Kohn, 2022; Umemneku Chikere et al., 2019). In the second case, the total sample size is known and the published report provides counts for true positives and false positives, but the negative-class counts are omitted. This is better viewed as an incomplete reporting problem with a known total, which leads to a different reconstruction strategy. Together, these two cases motivate the models developed in the next section.

Case 1: Partial Verification

Svirsky et al. (2002) compared computer-assisted oral brush biopsy results with follow-up scalpel biopsy and histology in order to estimate the positive predictive value of an abnormal brush-biopsy result. Among 243 patients with abnormal brush-biopsy findings who then underwent scalpel biopsy, 93 were confirmed as dysplasia or carcinoma by histology and 150 were histology negative. Thus, PPV can be calculated directly within the test-positive subgroup as $93/243\approx 0.38$.

The difficulty is that only patients with abnormal brush-biopsy results underwent the reference standard. Patients with normal brush-biopsy results were not verified by histology, so the entire test-negative row is unobserved. As a consequence, the numbers of false negatives and true negatives are unknown, and the full $2\times 2$ table cannot be reconstructed from the published report. Sensitivity, specificity, negative predictive value, and overall accuracy therefore remain unidentified. In this sense, Case 1 is not merely an incomplete-table problem. It is also a partial-verification design, because the reference standard was applied only to a selected subgroup.

| | Histology Positive | Histology Negative | Total |
|---|---|---|---|
| Brush biopsy abnormal (test positive) | 93 (TP) | 150 (FP) | 243 |
| Brush biopsy normal (test negative) | ? (FN) | ? (TN) | ? |
| Total | $n_{1}=?$ | $n_{2}=?$ | $N=?$ |

Table 2: Available and missing information from Svirsky et al. (2002). Only patients with abnormal brush-biopsy results underwent scalpel biopsy and histology, so the test-negative row and all column and overall totals are unobserved.

Although the system has at times been described in later discussions as AI-assisted, the technology in Svirsky et al. (2002) is more accurately viewed as an early rule-based computer-assisted diagnostic tool rather than artificial intelligence in the contemporary sense.

Case 2: Incomplete Reporting with Known $N$

Wismueller et al. (2020) evaluated an AI-based system for detecting intracranial hemorrhage on emergent head CT scans. The paper reports that 105 of 122 AI-positive cases were true positives, so PPV can be calculated as $105/122\approx 0.86$. The study also reports the total number of scans, namely $N=620$.

However, the total number of actual hemorrhage cases, $n_{1}=\mathrm{TP}+\mathrm{FN}$, is not reported, and neither are the counts of false negatives and true negatives. Consequently, sensitivity and specificity cannot be computed directly, and the complete $2\times 2$ table cannot be reconstructed from the published data alone.

| | ICH Present | ICH Absent | Total |
|---|---|---|---|
| AI positive (flagged as ICH) | 105 (TP) | 17 (FP) | 122 |
| AI negative (not flagged) | ? (FN) | ? (TN) | ? |
| Total | $n_{1}=?$ | $n_{2}=?$ | $N=620$ |

Table 3: Partially observed $2\times 2$ table implied by Wismueller et al. (2020). The publication reports the AI-positive counts and the total sample size $N=620$, but the negative-class counts and diseased/non-diseased totals remain unknown.

Case 2 differs from Case 1 in an important way. Here the main obstacle is not selective verification of the reference standard, but incomplete reporting despite a known total sample size. Because $N$ is available, the missing diseased and non-diseased totals are linked through the identity $n_{1}+n_{2}=N$. This makes Case 2 a constrained reconstruction problem rather than a partial-verification design. From a modeling point of view, that structural constraint provides information that is absent in Case 1.

These two examples therefore represent distinct forms of incomplete diagnostic reporting. Case 1 combines incomplete reporting with selective verification, whereas Case 2 is an incomplete-table problem with known total sample size. The distinction is important because it determines how much structural information is available for reconstruction and, consequently, what type of model is appropriate.

Models

We consider two settings for reconstructing incomplete $2\times 2$ diagnostic tables. The first arises when only one row of the table is observed, typically the test-positive row, and the corresponding denominators are unreported. The second arises when $\mathrm{TP}$, $\mathrm{FP}$, and the total sample size $N$ are known, so that the missing cells are linked through the constraint $n_{1}+n_{2}=N$.

Independent Binomial-$n$ Reconstruction for a Single Observed Row

Suppose that only one test-outcome row is reported, with observed counts in the diseased and non-diseased columns. Let $y$ denote the observed count in one column of that row, and let $n$ denote the corresponding unreported denominator. The interpretation of $p$ depends on the column under analysis. If $y$ is the number of true positives among diseased subjects, then $p$ is sensitivity and $n=n_{1}$. If $y$ is the number of false positives among non-diseased subjects, then $p$ is the false positive rate and $n=n_{2}$.

We model the observed count as

\[
y\mid n,p \sim \mathrm{Bin}(n,p),\qquad n\geq y. \tag{4}
\]

The success probability is assigned a beta prior

\[
p \sim \mathrm{Beta}(\alpha,\beta), \tag{5}
\]

with hyperparameters chosen to reflect plausible values of sensitivity or false positive rate, depending on the column.

To regularize the unknown denominator, we assign a truncated negative-binomial prior in the WinBUGS parameterization,

\[
n\mid p^{\star},r \sim \mathrm{NegBin}(p^{\star},r)\,\mathbf{1}\{n\geq y\}, \tag{6}
\]

where $\mathrm{NegBin}(p^{\star},r)$ denotes the distribution of the number of failures before $r$ successes with success probability $p^{\star}$. Before truncation,

\[
\mathbb{E}[n]=\frac{r(1-p^{\star})}{p^{\star}},\qquad \mathrm{Var}(n)=\frac{r(1-p^{\star})}{(p^{\star})^{2}}.
\]
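The two moment formulas can be checked numerically by summing the untruncated probability mass function. The Python sketch below assumes the failures-before-$r$-successes form stated above; the values $p^{\star}=0.1$ and $r=5$ and the summation cutoff `n_max` are hypothetical choices made only for this check.

```python
import math


def negbin_log_pmf(n, p_star, r):
    """log P(n) for the number of failures n before r successes."""
    return (
        math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(p_star) + n * math.log1p(-p_star)
    )


def negbin_moments(p_star, r, n_max=5000):
    """Mean and variance by direct summation of the pmf up to n_max."""
    probs = [math.exp(negbin_log_pmf(n, p_star, r)) for n in range(n_max + 1)]
    mean = sum(n * q for n, q in enumerate(probs))
    second = sum(n * n * q for n, q in enumerate(probs))
    return mean, second - mean ** 2


p_star, r = 0.1, 5  # hypothetical values chosen only for illustration
mean, var = negbin_moments(p_star, r)
print(mean, r * (1 - p_star) / p_star)        # numerical vs closed-form mean
print(var, r * (1 - p_star) / p_star ** 2)    # numerical vs closed-form variance
```

Small values of $p^{\star}$ push the prior mean of $n$ upward, which is why the prior on $p^{\star}$ controls how much mass is placed on large denominators.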

We complete the hierarchy with

\begin{align}
r\mid\lambda &\sim \mathrm{Poisson}(\lambda), \tag{7}\\
\lambda &\sim \mathrm{Gamma}(a,b), \tag{8}\\
p^{\star} &\sim \mathrm{Beta}(\alpha^{\star},\beta^{\star}). \tag{9}
\end{align}

Posterior inference for $(n,p)$ follows from (4)-(9). If the observed count is $\mathrm{TP}$, then the missing cell is

\[
\mathrm{FN}=n-\mathrm{TP}.
\]

If the observed count is $\mathrm{FP}$, then the missing cell is

\[
\mathrm{TN}=n-\mathrm{FP}.
\]

When both columns of the reported row are available, we fit this model separately to the diseased and non-diseased strata. This yields posterior inference for $n_{1}$ and $n_{2}$, and hence for the missing cells.
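To make the structure of (4)-(6) concrete, the following Python sketch computes the posterior of $n$ on a grid. It is a simplification and not the WinBUGS implementation: $p^{\star}$ and $r$ are held fixed at hypothetical values instead of receiving the priors (7)-(9), $p$ is integrated out analytically (which turns the binomial likelihood into a beta-binomial), and the grid is cut off at an assumed `n_max`.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(y, n, alpha, beta):
    """log P(y | n) for y ~ Bin(n, p) with p ~ Beta(alpha, beta) integrated out."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta)


def log_negbin(n, p_star, r):
    """log pmf of the number of failures n before r successes."""
    return (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
            + r * math.log(p_star) + n * math.log1p(-p_star))


def posterior_n(y, alpha, beta, p_star, r, n_max=2000):
    """Posterior pmf of n on {y, ..., n_max} for fixed p_star and r."""
    support = range(y, n_max + 1)  # truncation n >= y, as in (6)
    logw = [log_negbin(n, p_star, r) + log_betabinom(y, n, alpha, beta)
            for n in support]
    top = max(logw)                # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return {n: wi / total for n, wi in zip(support, w)}


# y = TP = 71 with a Beta(2, 1) prior on p; p_star and r are hypothetical.
post = posterior_n(y=71, alpha=2.0, beta=1.0, p_star=0.1, r=10)
post_mean = sum(n * q for n, q in post.items())
print(f"posterior mean of n: {post_mean:.1f}")
print(f"implied FN = n - TP: {post_mean - 71:.1f}")
```

Because the hyperparameters are fixed here, the output should not be expected to match results from the full hierarchy; the sketch only shows how the prior on $n$ and the marginal likelihood combine.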

Identifiability and prior sensitivity.

With only a single binomial observation, $n$ and $p$ are only weakly identified from the likelihood. The beta prior on $p$ and the negative-binomial hierarchy on $n$ provide the regularization needed for posterior inference. In practice, sensitivity analyses over $(\alpha,\beta)$ and $(\alpha^{\star},\beta^{\star},a,b)$ are important and can be summarized through posterior intervals for $n$ and the derived missing counts.
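The weak identification can be illustrated with the profile likelihood of $n$, obtained by maximizing the $\mathrm{Bin}(n,p)$ likelihood over $p$ for each candidate $n$ (the maximizer is $p=y/n$). The Python sketch below uses $y=71$, the true-positive count of the breast MRI example; the upper grid limit of 300 is an arbitrary choice.

```python
import math


def profile_likelihood(y, n):
    """max over p of the Bin(n, p) likelihood at y, attained at p = y / n."""
    p_hat = y / n
    return math.comb(n, y) * p_hat ** y * (1 - p_hat) ** (n - y)


y = 71  # observed true positives in the breast MRI example
profile = {n: profile_likelihood(y, n) for n in range(y, 301)}
n_best = max(profile, key=profile.get)
print(n_best, profile[n_best])      # the maximizer sits on the boundary n = y
print(profile[100], profile[300])   # large n is penalized only mildly
```

Without a prior, the likelihood is maximized at the degenerate solution $n=y$, $p=1$, and it stays well above zero for much larger $n$, so the data alone neither give a sensible point estimate nor bound $n$ from above.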

The WinBUGS/OpenBUGS implementation of this single-column model is provided in the Supplemental File and can be used twice in Case 1 type applications, once for the diseased column and once for the non-diseased column. For the diseased stratum, $(\alpha,\beta)$ may encode plausible sensitivity values; for the non-diseased stratum, it may encode plausible false positive rates.

Reconstruction of the Full $2\times 2$ Table Given $\mathrm{TP}$, $\mathrm{FP}$, and $N$

We now consider the setting in which $\mathrm{TP}$, $\mathrm{FP}$, and the total sample size $N$ are reported. Let

\[
n_{1}=\mathrm{TP}+\mathrm{FN},\qquad n_{2}=\mathrm{FP}+\mathrm{TN},\qquad n_{1}+n_{2}=N.
\]

Once $n_{1}$ is inferred, the remaining quantities follow from

\[
n_{2}=N-n_{1},\qquad \mathrm{FN}=n_{1}-\mathrm{TP},\qquad \mathrm{TN}=n_{2}-\mathrm{FP}.
\]
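These identities translate directly into code. The Python sketch below fills in the table for a candidate $n_{1}$ and lists the values of $n_{1}$ that are logically compatible with the reported counts. The reported values are those of Case 2 (Table 3); the candidate $n_{1}=120$ and the helper name `complete_table` are hypothetical and used only for illustration.

```python
def complete_table(tp, fp, n_total, n1):
    """Fill in n2, FN, and TN from TP, FP, N, and a candidate n1."""
    n2 = n_total - n1
    fn = n1 - tp
    tn = n2 - fp
    if fn < 0 or tn < 0:
        raise ValueError("n1 is incompatible with the reported TP, FP, and N")
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "n1": n1, "n2": n2}


# Reported values from Case 2 (Table 3); n1 = 120 is a hypothetical candidate.
table = complete_table(tp=105, fp=17, n_total=620, n1=120)
print(table)

# Every n1 with FN >= 0 and TN >= 0 is logically possible.
feasible = [n1 for n1 in range(0, 621) if n1 >= 105 and 620 - n1 >= 17]
print(feasible[0], feasible[-1])
```

The feasible range is wide, which is why a prior on $n_{1}$ is needed to obtain useful inference.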

The likelihood is

\begin{align}
\mathrm{TP}\mid n_{1},p_{1} &\sim \mathrm{Bin}(n_{1},p_{1}), \tag{10}\\
\mathrm{FP}\mid n_{2},p_{2} &\sim \mathrm{Bin}(n_{2},p_{2}), \tag{11}
\end{align}

where $p_{1}$ is sensitivity and $1-p_{2}$ is specificity. The three models below share this same likelihood and differ only in the prior assigned to $n_{1}$.

Model 1: Discrete uniform prior.

A non-informative baseline model assigns equal prior mass to each feasible value of $n_{1}$:

\begin{align}
n_{1} &\sim \mathrm{Uniform}\{1,\dots,N-1\}, \tag{12}\\
n_{2} &= N-n_{1}, \tag{13}
\end{align}

with independent beta priors

\begin{align}
p_{1} &\sim \mathrm{Beta}(a_{1},b_{1}), \tag{14}\\
p_{2} &\sim \mathrm{Beta}(a_{2},b_{2}). \tag{15}
\end{align}

Model 2: Truncated Poisson prior.

To favor moderate values of $n_{1}$, we replace the discrete uniform prior by a truncated Poisson prior:

\begin{align}
n_{1}\mid\lambda &\sim \mathrm{Poisson}(\lambda)\,\mathbf{1}\{1\leq n_{1}\leq N-1\}, \tag{16}\\
n_{2} &= N-n_{1}, \tag{17}\\
\lambda &\sim \mathrm{Gamma}(a_{\lambda},b_{\lambda}), \tag{18}
\end{align}

again with independent beta priors on $p_{1}$ and $p_{2}$.

Model 3: Truncated negative-binomial prior.

To allow additional dispersion in the diseased stratum size, we use a truncated negative-binomial prior:

\begin{align}
n_{1}\mid p_{3},r &\sim \mathrm{NegBin}(p_{3},r)\,\mathbf{1}\{\mathrm{TP}\leq n_{1}\leq N-1\}, \tag{19}\\
n_{2} &= N-n_{1}, \tag{20}\\
p_{3} &\sim \mathrm{Beta}(a_{3},b_{3}), \tag{21}\\
r &\sim \mathrm{Gamma}(a_{r},b_{r}), \tag{22}
\end{align}

together with independent beta priors on $p_{1}$ and $p_{2}$.

Comparison of the three priors.

Model 1 provides a flat baseline over the feasible values of $n_{1}$. Model 2 introduces mild regularization through the Poisson mean $\lambda$. Model 3 allows heavier tails and greater dispersion through $(p_{3},r)$, and is therefore more flexible when disease prevalence is uncertain or substantial imbalance between strata is plausible.

In all three cases, posterior inference for $n_{1}$ determines $n_{2}$, $\mathrm{FN}$, and $\mathrm{TN}$, thereby yielding a reconstructed $2\times 2$ table and allowing calculation of sensitivity, specificity, predictive values, and accuracy.

WinBUGS code for all three variants is provided in the Supplemental File.
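Independently of the WinBUGS code, the known-$N$ models can be explored with a grid computation, because $n_{1}$ takes finitely many values and $p_{1}$, $p_{2}$ can be integrated out analytically. The Python sketch below is our own simplification, not the supplemental implementation: it covers Model 1 and a Model 2 variant in which $\lambda$ is fixed at a hypothetical value rather than given the Gamma prior (18). The data are the first row and total of Table 4, and the Beta$(2,1)$ and Beta$(2,5)$ priors are assumed for illustration.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(y, n, a, b):
    """log P(y | n) for y ~ Bin(n, p) with p ~ Beta(a, b) integrated out."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def posterior_n1(tp, fp, n_total, prior1, prior2, log_prior=lambda n1: 0.0):
    """Posterior pmf of n1 on {1, ..., N-1}; log_prior defaults to Model 1 (flat)."""
    logw = {}
    for n1 in range(1, n_total):
        n2 = n_total - n1
        if n1 < tp or n2 < fp:
            continue  # the binomial likelihood is zero for these values
        logw[n1] = (log_prior(n1)
                    + log_betabinom(tp, n1, *prior1)
                    + log_betabinom(fp, n2, *prior2))
    top = max(logw.values())
    total = sum(math.exp(v - top) for v in logw.values())
    return {n1: math.exp(v - top) / total for n1, v in logw.items()}


def poisson_log_prior(lam):
    """Unnormalized Poisson(lam) log-pmf; lam is held fixed (no Gamma hyperprior)."""
    return lambda n1: n1 * math.log(lam) - math.lgamma(n1 + 1)


tp, fp, n_total = 71, 28, 182               # Table 4, treated as partially observed
prior1, prior2 = (2.0, 1.0), (2.0, 5.0)     # assumed Beta priors for p1 and p2

for label, log_prior in [("uniform", lambda n1: 0.0),
                         ("Poisson, lambda=80", poisson_log_prior(80.0))]:
    post = posterior_n1(tp, fp, n_total, prior1, prior2, log_prior)
    n1_mean = sum(n1 * q for n1, q in post.items())
    print(label, round(n1_mean, 1), "FN ~", round(n1_mean - tp, 1),
          "TN ~", round(n_total - n1_mean - fp, 1))
```

Passing a different `log_prior` is enough to switch between prior families, which mirrors the statement that the three models share one likelihood and differ only in the prior on $n_{1}$.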

Empirical Application

To evaluate the proposed models, we applied them to a complete contingency table from a breast MRI study (Langlotz, 2003). The dataset consists of 182 women with clinically or mammographically suspicious lesions, all of whom underwent biopsy, taken here as the reference standard. A true positive (TP) denotes an MRI-positive case with malignancy confirmed on biopsy, a false positive (FP) an MRI-positive case with benign biopsy, a false negative (FN) an MRI-negative case with malignancy on biopsy, and a true negative (TN) an MRI-negative case with benign biopsy.

Table 4 gives the complete $2\times 2$ table. Because the full table is known, this example permits direct comparison between reconstructed and true counts.

Table 4: Patient data from the breast MRI study (Langlotz, 2003).

| MRI Result | Malignant | Benign | Total |
|---|---|---|---|
| Positive | 71 | 28 | 99 |
| Negative | 3 | 80 | 83 |
| Total | 74 | 108 | 182 |

Single-Row Reconstruction

The next two subsections apply the single-row model separately to the diseased and non-diseased strata. We assume that only the first row of Table 4 is available, namely $\mathrm{TP}=71$ malignant and $\mathrm{FP}=28$ benign cases among MRI-positive patients. The stratum totals $n_{1}$ and $n_{2}$ are then treated as unknown and estimated from the corresponding single-row models.

Diseased Stratum

For the diseased stratum, we observe $\mathrm{TP}=71$ and model

\[
\mathrm{TP}\mid n_{1},p \sim \mathrm{Bin}(n_{1},p), \tag{23}
\]

where $p$ is the sensitivity of MRI. The denominator $n_{1}$ is assigned the truncated negative-binomial prior

\[
n_{1}\mid p^{\star},r \sim \mathrm{NegBin}(p^{\star},r)\,\mathbf{1}\{n_{1}\geq\mathrm{TP}\}, \tag{24}
\]

with hierarchical priors

\begin{align}
r\mid\lambda &\sim \mathrm{Poisson}(\lambda),\qquad \lambda\sim\mathrm{Gamma}(a,b), \tag{25}\\
p^{\star} &\sim \mathrm{Beta}(\alpha^{\star},\beta^{\star}), \tag{26}\\
p &\sim \mathrm{Beta}(\alpha,\beta). \tag{27}
\end{align}

For this example we use

\[
a=1,\;b=0.1,\qquad \alpha=2,\;\beta=1,\qquad \alpha^{\star}=1,\;\beta^{\star}=1. \tag{28}
\]

The prior $\mathrm{Beta}(2,1)$ on $p$ reflects the expectation of moderate to high sensitivity without being overly restrictive. The prior on $\lambda$ is diffuse, and the uniform $\mathrm{Beta}(1,1)$ prior on $p^{\star}$ allows the negative-binomial prior on $n_{1}$ to adapt to the data.

MCMC sampling is initialized at $n_{1}=100$, $r=70$, $\lambda=70$, and $p=0.5$.

Table 5: Posterior summary: diseased stratum, single-row model (100,000 MCMC iterations).

| Parameter | Mean | SD | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|
| $\lambda$ | 16.25 | 11.59 | 1.49 | 13.74 | 45.32 |
| $n_{1}$ | 76.57 | 8.73 | 71.0 | 74.0 | 99.0 |
| $p$ | 0.926 | 0.080 | 0.705 | 0.951 | 0.998 |
| $p^{\star}$ | 0.178 | 0.104 | 0.019 | 0.165 | 0.410 |
| $r$ | 16.88 | 12.05 | 1.0 | 14.0 | 47.0 |

The posterior mean of $n_{1}$ is 76.6, close to the true value of 74, and the 95% credible interval $(71,99)$ contains the truth. The posterior mean of $p$ is 0.93, consistent with high MRI sensitivity. The implied estimate $\mathrm{FN}=n_{1}-\mathrm{TP}\approx 6$ is reasonably close to the true value of 3. Together with the observed $\mathrm{FP}=28$, this yields a plausible near-complete reconstruction of the diagnostic table.

Non-Diseased Stratum

For the benign stratum, we observe $\mathrm{FP}=28$ and estimate the unreported denominator $n_{2}$ together with the false positive rate $p$. The model is

\begin{align}
\mathrm{FP}\mid n_{2},p &\sim \mathrm{Bin}(n_{2},p), \tag{29}\\
n_{2}\mid p^{\star},r &\sim \mathrm{NegBin}(p^{\star},r)\,\mathbf{1}\{n_{2}\geq\mathrm{FP}\}, \tag{30}\\
r\mid\lambda &\sim \mathrm{Poisson}(\lambda),\qquad \lambda\sim\mathrm{Gamma}(a,b), \tag{31}\\
p^{\star} &\sim \mathrm{Beta}(\alpha^{\star},\beta^{\star}),\qquad p\sim\mathrm{Beta}(\alpha,\beta). \tag{32}
\end{align}

Here we set

\[
a=2,\;b=1,\qquad \alpha=2,\;\beta=5,\qquad \alpha^{\star}=1,\;\beta^{\star}=50. \tag{33}
\]

The prior $\mathrm{Beta}(2,5)$ reflects the expectation that the false positive rate is below $0.5$ while remaining flexible. The small prior mean of $p^{\star}$ places substantial mass on larger values of $n_{2}$, consistent with the expectation that the non-diseased stratum may be larger than the diseased stratum.

Sampling is initialized at $n_{2}=100$, $r=70$, $\lambda=70$, and $p=0.5$.

Table 6: Posterior summary: non-diseased stratum, single-row model (100,000 MCMC iterations).

| Parameter | Mean | SD | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|
| $\lambda$ | 1.712 | 1.205 | 0.200 | 1.439 | 4.723 |
| $n_{2}$ | 106.0 | 75.3 | 40.0 | 86.0 | 295.0 |
| $p$ | 0.336 | 0.149 | 0.094 | 0.319 | 0.664 |
| $p^{\star}$ | 0.016 | 0.015 | 0.0004 | 0.011 | 0.054 |
| $r$ | 1.428 | 1.556 | 0.0 | 1.0 | 5.0 |

The posterior mean of $n_{2}$ is 106, close to the true total of 108, although the credible interval is wide. This reflects the limited information contained in a single observed cell together with the intentionally overdispersed prior. The posterior mean of $p$ is 0.34, consistent with a moderate false positive rate.

The resulting estimate implies $\mathrm{TN}=n_{2}-\mathrm{FP}\approx 78$, close to the true value of 80. Combining this with the estimated diseased total produces the reconstructed table in Table 7.

Table 7: Reconstructed $2\times 2$ table from the single-row model.

| | Malignant | Benign |
|---|---|---|
| MRI positive | 71 | 28 |
| MRI negative | 6 | 78 |

Even with only one observed row, the hierarchical Bayesian model recovers plausible denominators and yields a reasonable approximation to the full diagnostic structure, albeit with substantial uncertainty in the non-diseased stratum.
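As a worked check of what this reconstruction implies for the operating characteristics, the measures in (1) and (2) can be evaluated from the rounded counts in Table 7 and set beside the values from the complete Table 4. This is plain arithmetic on posterior-mean point estimates and does not reflect the posterior uncertainty discussed above.

\[
\begin{aligned}
\text{Reconstructed (Table 7):}\quad & \mathrm{Se}=\frac{71}{77}\approx 0.922, && \mathrm{Sp}=\frac{78}{106}\approx 0.736, && \mathrm{NPV}=\frac{78}{84}\approx 0.929,\\
\text{Complete (Table 4):}\quad & \mathrm{Se}=\frac{71}{74}\approx 0.959, && \mathrm{Sp}=\frac{80}{108}\approx 0.741, && \mathrm{NPV}=\frac{80}{83}\approx 0.964.
\end{aligned}
\]

The PPV is $71/99\approx 0.717$ in both tables, since it depends only on the observed test-positive row.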

Single-Stratum Models with Known $N$

We next examine the same strata when the total sample size is treated as known. This adds the design constraint that each stratum size cannot exceed $N$, which changes the posterior behavior, especially for the non-diseased group.

Diseased Stratum with Known $N$

For the diseased stratum we use

\begin{align}
\mathrm{TP}\mid n_{1},p &\sim \mathrm{Bin}(n_{1},p), \tag{34}\\
n_{1}\mid p^{\star},r &\sim \mathrm{NegBin}(p^{\star},r)\,\mathbf{1}\{\mathrm{TP}\leq n_{1}\leq N\}, \tag{35}\\
r\mid\lambda &\sim \mathrm{Poisson}(\lambda),\qquad \lambda\sim\mathrm{Gamma}(a,b), \tag{36}\\
p^{\star} &\sim \mathrm{Beta}(\alpha^{\star},\beta^{\star}),\qquad p\sim\mathrm{Beta}(\alpha,\beta). \tag{37}
\end{align}

The truncation $\mathrm{TP}\leq n_{1}\leq N$ enforces compatibility with both the observed true positives and the known study size.

We again set

\[
a=1,\;b=0.1,\qquad \alpha=2,\;\beta=1,\qquad \alpha^{\star}=1,\;\beta^{\star}=1. \tag{38}
\]

These choices favor moderate to high sensitivity while keeping the prior on $n_{1}$ fairly diffuse. MCMC sampling is initialized at $n_{1}=100$, $r=70$, $\lambda=70$, $p=0.5$, and $p^{\star}=0.5$.

Table 8: Posterior summary: diseased stratum, known $N$ (900,000 MCMC iterations).

| Parameter | Mean | SD | 2.5% | Median | 97.5% |
|---|---|---|---|---|---|
| $\lambda$ | 16.68 | 12.02 | 1.963 | 13.89 | 46.98 |
| $n_{1}$ | 76.34 | 7.792 | 71.0 | 74.0 | 97.0 |
| $p$ | 0.927 | 0.076 | 0.716 | 0.952 | 0.998 |
| $p^{\star}$ | 0.182 | 0.106 | 0.026 | 0.167 | 0.420 |
| $r$ | 17.35 | 12.50 | 2.082 | 14.45 | 48.84 |

The posterior mean of n1n_{1} is 76.3, again close to the true diseased count of 74 and contained within the 95% credible interval. This estimate is nearly identical to that obtained under the unconstrained single-row model, suggesting that inference on n1n_{1} is driven mainly by the observed true positives rather than by the upper-bound constraint. The posterior for pp remains concentrated near high sensitivity.

Non-Diseased Stratum with Known $N$

For the non-diseased stratum, the observed count is $\mathrm{FP}=28$ and the inferential targets are the number of benign cases $n_{2}\leq N$ and the false positive rate $p$. The model is

\begin{align}
\mathrm{FP}\mid n_{2},p &\sim \mathrm{Binomial}(n_{2},p), \tag{39}\\
n_{2}\mid p^{\star},r &\sim \mathrm{Negative\text{-}Binomial}(p^{\star},r)\,\mathbf{1}\{\mathrm{FP}\leq n_{2}\leq N\}, \tag{40}\\
r\mid\lambda &\sim \mathrm{Poisson}(\lambda),\qquad \lambda\sim\mathrm{Gamma}(a,b), \tag{41}\\
p^{\star} &\sim \mathrm{Beta}(\alpha^{\star},\beta^{\star}),\qquad p\sim\mathrm{Beta}(\alpha,\beta). \tag{42}
\end{align}

We use

\begin{equation}
a=2,\;b=1,\qquad \alpha=2,\;\beta=5,\qquad \alpha^{\star}=1,\;\beta^{\star}=50. \tag{43}
\end{equation}

The prior on $p$ again reflects the expectation of a modest false positive rate, while the prior on $p^{\star}$ favors larger values of $n_{2}$ without allowing arbitrarily large realizations once the upper bound $N$ is imposed.

Sampling is initialized at $n_{2}=100$, $r=70$, $\lambda=70$, $p=0.5$, and $p^{\star}=0.5$.
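The effect of the upper bound on the prior for $n_{2}$ can be sketched numerically. The Python example below is a minimal illustration, not the authors' implementation. It fixes $p^{\star}$ at the mean of its $\mathrm{Beta}(1,50)$ prior, $1/51$, and fixes $r=2$, the mean of $\lambda$ when $\mathrm{Gamma}(2,1)$ is read as shape and rate. It also assumes the negative binomial counts failures before the $r$-th success with success probability $p^{\star}$. These plug-in values, the parameterization, and the helper names `truncated_negbin_pmf` and `pmf_mean` are assumptions made for illustration.

```python
import math


def truncated_negbin_pmf(p_star, r, lower, upper):
    """pmf of a negative binomial restricted to lower <= n <= upper.

    Assumed parameterization: n counts failures before the r-th success,
    each trial succeeding with probability p_star (r > 0, possibly
    non-integer).
    """
    if not (0.0 < p_star < 1.0) or r <= 0 or lower > upper:
        raise ValueError("need 0 < p_star < 1, r > 0 and lower <= upper")
    support = range(lower, upper + 1)
    log_w = [math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
             + r * math.log(p_star) + n * math.log1p(-p_star)
             for n in support]
    # Subtract the maximum before exponentiating, then renormalize so the
    # probabilities sum to one on the truncated support.
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}


def pmf_mean(pmf):
    """Mean of a pmf stored as {value: probability}."""
    return sum(n * prob for n, prob in pmf.items())


# Illustrative plug-in values: prior means of p_star and lambda,
# with FP = 28 as the lower bound and N = 182 as the upper bound.
p_star, r = 1.0 / 51.0, 2.0
bounded = truncated_negbin_pmf(p_star, r, lower=28, upper=182)
untruncated_mean = r * (1.0 - p_star) / p_star
print(f"untruncated mean of n2: {untruncated_mean:.1f}")
print(f"mean of n2 on [28, 182]: {pmf_mean(bounded):.1f}")
```

Comparing the two means shows how much of the prior's location depends on values outside $[\mathrm{FP}, N]$. The hierarchical model additionally averages over $p^{\star}$ and $r$, which this sketch holds fixed.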

Table 9: Posterior summary for the non-diseased stratum with known $N$ (900,000 MCMC iterations).
Parameter    Mean    SD      2.5%    Median  97.5%
$\lambda$    2.092   1.249   0.387   1.854   5.147
$n_{2}$      80.36   32.39   38.0    73.0    162.0
$p$          0.388   0.142   0.165   0.373   0.697
$p^{\star}$  0.025   0.018   0.002   0.021   0.072
$r$          2.186   1.438   0.304   1.888   5.749

Imposing $N$ as an upper bound substantially concentrates the posterior for $n_{2}$ relative to the unconstrained single-row model. In the unconstrained analysis, the heavy right tail pushed the posterior mean toward the true value of 108, albeit with considerable uncertainty. Under the bounded model, those extreme values are removed, producing a posterior mean of 80.4 and a median of 73. Thus, the known-$N$ constraint improves stability and interpretability, but in this example it reduces point-estimation accuracy for the non-diseased stratum.

Joint $\mathrm{TP}/\mathrm{FP}$ Model with Fixed $N$

We finally consider a joint model in which $\mathrm{TP}=71$ and $\mathrm{FP}=28$ are analyzed simultaneously under the fixed total $N=182$. The inferential targets are $n_{1}$, $n_{2}=N-n_{1}$, sensitivity $p_{1}$, and false positive rate $p_{2}$.

The model is

\begin{align}
\mathrm{TP}\mid n_{1},p_{1} &\sim \mathrm{Binomial}(n_{1},p_{1}), \tag{44}\\
\mathrm{FP}\mid n_{2},p_{2} &\sim \mathrm{Binomial}(n_{2},p_{2}), \tag{45}\\
n_{1}\mid p_{3},r &\sim \mathrm{Negative\text{-}Binomial}(p_{3},r)\,\mathbf{1}\{\mathrm{TP}\leq n_{1}\leq N-1\}, \tag{46}\\
n_{2} &= N-n_{1}, \tag{47}\\
p_{1} &\sim \mathrm{Beta}(a_{1},b_{1}),\qquad p_{2}\sim\mathrm{Beta}(a_{2},b_{2}),\qquad p_{3}\sim\mathrm{Beta}(a_{3},b_{3}), \tag{48}\\
r &\sim \mathrm{Gamma}(0.1,0.01). \tag{49}
\end{align}

The fixed-$N$ constraint couples the two strata and enforces coherence across the reconstructed table.
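Because $n_{2}=N-n_{1}$, any candidate value of $n_{1}$ determines the whole table once $\mathrm{TP}$, $\mathrm{FP}$, and $N$ are fixed. The Python sketch below shows this bookkeeping. The function name `reconstruct_table` and the dictionary keys are hypothetical, and the example evaluates the table at the benchmark's true diseased count of 74 only as an illustration. In a posterior analysis the same function would be applied to each sampled $n_{1}$.

```python
def reconstruct_table(tp, fp, n_total, n1):
    """Complete the 2x2 table implied by a candidate diseased count n1."""
    n2 = n_total - n1                 # fixed-N coupling of the two strata
    if n1 < max(tp, 1) or n2 < max(fp, 1):
        raise ValueError("n1 is incompatible with TP, FP and N")
    fn = n1 - tp                      # diseased cases the test missed
    tn = n2 - fp                      # non-diseased cases correctly negative

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den > 0 else float("nan")

    return {
        "n1": n1, "n2": n2, "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "sensitivity": ratio(tp, n1),
        "specificity": ratio(tn, n2),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
    }


# Example: TP = 71, FP = 28, N = 182, evaluated at n1 = 74.
table = reconstruct_table(tp=71, fp=28, n_total=182, n1=74)
for key, value in table.items():
    print(key, value)
```

The positive predictive value depends only on the observed row ($\mathrm{TP}$ and $\mathrm{FP}$), so it is the same for every candidate $n_{1}$. Sensitivity, specificity, and the negative predictive value carry the uncertainty in $n_{1}$.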

We set

\begin{equation}
a_{1}=1,\;b_{1}=0.1,\qquad a_{2}=0.1,\;b_{2}=1,\qquad a_{3}=0.1,\;b_{3}=0.5, \tag{50}
\end{equation}

with $r\sim\mathrm{Gamma}(0.1,0.01)$. The prior on $p_{1}$ places substantial mass near one, the prior on $p_{2}$ favors smaller false positive rates, and the prior on $p_{3}$ controls dispersion in the negative-binomial prior for $n_{1}$. MCMC is initialized at $n_{1}=80$, $r=20$, and $p_{1}=p_{2}=p_{3}=0.5$.
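For intuition about how the two likelihood terms interact, the posterior of $n_{1}$ can be enumerated exactly when $r$ and $p_{3}$ are held fixed and $p_{1}$, $p_{2}$ are integrated out analytically, which turns each binomial term into a beta-binomial. This is a different computational route from the MCMC used in the paper and is offered only as a sketch. The fixed values $r=20$ and $p_{3}=0.5$ are the MCMC starting values, used here purely for illustration, and the negative binomial is again assumed to count failures before the $r$-th success. All function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_beta_binomial(k, n, a, b):
    """log P(k | n) when k ~ Binomial(n, p) and p ~ Beta(a, b)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def log_negbin(n, p3, r):
    """Negative-binomial log-pmf (assumed failures-before-r-successes form)."""
    return (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
            + r * math.log(p3) + n * math.log1p(-p3))


def joint_n1_posterior(tp, fp, n_total, p3, r, a1, b1, a2, b2):
    """pmf of n1 given fixed p3 and r, with p1 and p2 integrated out."""
    # n2 = N - n1 must be at least FP, so the support ends at N - FP.
    support = range(tp, n_total - fp + 1)
    log_w = [log_negbin(n1, p3, r)
             + log_beta_binomial(tp, n1, a1, b1)
             + log_beta_binomial(fp, n_total - n1, a2, b2)
             for n1 in support]
    shift = max(log_w)                 # stabilize the exponentials
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return {n1: w / total for n1, w in zip(support, weights)}


# Hyperparameters from Eq. (50); r and p3 fixed for illustration only.
post = joint_n1_posterior(tp=71, fp=28, n_total=182, p3=0.5, r=20.0,
                          a1=1.0, b1=0.1, a2=0.1, b2=1.0)
post_mean = sum(n1 * prob for n1, prob in post.items())
print(f"conditional posterior mean of n1: {post_mean:.2f}")
```

The truncation in Eq. (46) allows $n_{1}$ up to $N-1$, but values with $n_{2}<\mathrm{FP}$ have zero likelihood under Eq. (45), so the sketch restricts the support to $n_{1}\leq N-\mathrm{FP}$. The result depends strongly on the fixed $r$ and $p_{3}$ and should not be compared directly with Table 10.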

Table 10: Posterior summary for the joint $\mathrm{TP}/\mathrm{FP}$ model with fixed $N$ (900,000 MCMC iterations).
Parameter  Mean    SD      2.5%        Median      97.5%
$n_{1}$    73.04   8.205   71.0        71.0        93.0
$n_{2}$    109.0   8.205   89.0        111.0       111.0
$p_{1}$    0.978   0.067   0.768       1.000       1.000
$p_{2}$    0.089   0.197   $<10^{-3}$  $<10^{-3}$  0.782
$p_{3}$    0.144   0.187   0.014       0.059       0.673
$r$        19.64   38.84   0.039       4.435       143.1

Under this joint specification, posterior inference for the stratum totals is both concentrated and internally consistent: the posterior means of $n_{1}$ and $n_{2}$ are 73.0 and 109.0, each within about one individual of the true values (74 and 108). The posterior for $p_{1}$ concentrates near one, while the posterior for $p_{2}$ is centered near zero but retains a right tail, reflecting residual uncertainty in the non-diseased stratum.

Compared with the preceding analyses, the joint model combines the information in $\mathrm{TP}$ and $\mathrm{FP}$ under the fixed-$N$ constraint and therefore avoids the instability of fitting the two strata separately. In this example, it provides the most balanced reconstruction, with realistic stratum sizes and appropriately quantified uncertainty.

Conclusions

We have developed hierarchical Bayesian models for reconstructing incomplete $2\times 2$ diagnostic tables in settings where only partial cell counts are reported. The proposed framework covers both the single-row setting, in which the denominators of the diseased and non-diseased strata are unobserved, and the constrained setting in which the total sample size $N$ is known. By combining binomial likelihoods with flexible priors on the latent denominators and diagnostic probabilities, the models provide a coherent way to infer missing cells and derived accuracy measures from incomplete published information.

The empirical application illustrates both the potential and the limitations of this approach. When only the test-positive row is observed, posterior inference can recover plausible values for the missing denominators and yield a reasonable reconstruction of the full diagnostic table, although uncertainty may remain substantial, especially for the non-diseased stratum. When the total sample size is known, the additional structural constraint can sharpen inference, and in the joint fixed-$N$ formulation the reconstructed stratum sizes are close to the true values in the benchmark example. At the same time, the results show that reconstruction accuracy depends on the amount of information available and on the prior specification, particularly in weakly identified single-row settings.

The main contribution of the paper is therefore methodological and practical rather than purely deterministic. The proposed models do not claim to identify a unique missing table from sparse summaries alone. Rather, they provide a principled Bayesian framework for posterior inference on plausible completions of incompletely reported diagnostic tables, together with uncertainty quantification for the reconstructed cells and the resulting operating characteristics.

A further point concerns likelihood specification. One might consider a multinomial likelihood for the full $2\times 2$ table as an alternative starting point. Within the settings studied here, however, this does not appear to yield a substantive advantage over the binomial formulations already used. Once the unobserved cells are treated as latent and the structural constraints are imposed, the multinomial representation does not contribute additional identifiability beyond that already supplied by the observed counts, the prior structure, and, when available, the known total sample size.

More broadly, the work highlights the continuing importance of complete and transparent reporting in diagnostic accuracy studies. When denominators or entire rows of the diagnostic table are omitted, clinically relevant measures may become unavailable without additional modeling assumptions. Bayesian reconstruction cannot replace good reporting practice, but it can provide a useful inferential tool when incomplete reporting prevents direct recovery of the full table.

Data and code availability.

Reproducible code supporting the analyses in this manuscript, including the WinBUGS and R scripts used in the empirical analyses, is available at https://github.com/saraantonijevic/bayesian_diagnostic_table-reconstruction. A standalone appendix containing the full Bayesian analyses of the Svirsky et al. (2002) and Wismueller et al. (2020) motivating examples, including posterior summaries and reconstructed $2\times 2$ tables, is also provided at the same repository.

Acknowledgment. B. Vidakovic and S. Antonijevic acknowledge support from the National Science Foundation under Grant No. 2515246 at Texas A&M University.

References

  • F. Bayoud (2011) Bayesian and empirical Bayes estimation of binomial $n$ under truncated Poisson priors. Journal of Statistical Computation and Simulation 81, pp. 121–135.
  • P. M. Bossuyt, J. B. Reitsma, D. E. Bruns, C. A. Gatsonis, P. P. Glasziou, L. Irwig, et al. (2015) STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 351, pp. h5527.
  • M. Buzoianu and J. B. Kadane (2008) Adjusting for verification bias in diagnostic test evaluation: a Bayesian approach. Statistics in Medicine 27 (13), pp. 2453–2473.
  • R. J. Carroll and F. Lombard (1985) A Bayesian approach to the binomial $n$ problem using the integrated likelihood. Biometrika 72, pp. 583–590.
  • J. F. Cohen, D. A. Korevaar, D. G. Altman, D. E. Bruns, C. A. Gatsonis, L. Hooft, et al. (2016) STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open 6 (11), pp. e012799.
  • A. M. Cronin and A. J. Vickers (2008) Statistical methods to correct for verification bias in diagnostic studies. Statistics in Medicine 27 (23), pp. 4670–4685.
  • A. DasGupta and D. B. Rubin (2005) Improved moment and maximum likelihood estimators for the binomial $n$ problem. Statistica Sinica 15, pp. 709–722.
  • J. A. H. de Groot, P. M. M. Bossuyt, J. B. Reitsma, A. W. S. Rutjes, N. Dendukuri, K. J. M. Janssen, and K. G. M. Moons (2011) Verification problems in diagnostic accuracy studies: consequences and solutions. BMJ 343, pp. d4770.
  • N. R. Draper and I. Guttman (1978) Bayesian estimation of binomial $n$ with beta prior for $p$. Technometrics 20, pp. 217–222.
  • EQUATOR Network (2015) STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Checklist and resources available at https://www.equator-network.org/reporting-guidelines/stard/. Accessed 2025-10-21.
  • P. Eusebi (2013) Diagnostic accuracy measures. Cerebrovascular Diseases 36 (4), pp. 267–272.
  • M. Georgieva and B. Vidakovic (2025) Revisiting estimation of number of trials in the Binomial $(n,p)$ problem with a single observation. International Statistical Review 93 (2), pp. 246–266.
  • F. Günel and R. Chilko (2000) Continuous approximations for Bayesian estimation of binomial $n$. Computational Statistics 15 (3), pp. 345–361.
  • J. B. S. Haldane (1945) On a method of estimating $n$ and $p$ in the binomial distribution. Biometrika 33, pp. 264–274.
  • M. A. Kohn (2022) Partial verification bias and test result-based sampling. Journal of Clinical Epidemiology 145, pp. 179–182.
  • C. P. Langlotz (2003) Fundamental measures of diagnostic examination performance: usefulness for clinical decision making and research. Radiology 228, pp. 3–9.
  • P. Macaskill, C. Gatsonis, J. J. Deeks, R. M. Harbord, and Y. Takwoingi (2010) Cochrane handbook for systematic reviews of diagnostic test accuracy, version 1.0, chapter 10. The Cochrane Collaboration.
  • I. Olkin and A. J. Petkau (1993) Stabilized estimators for the binomial $n$ problem. Journal of Statistical Planning and Inference 37, pp. 89–105.
  • A. E. Raftery (1988) A Bayesian approach to the binomial $n$ problem. Journal of the American Statistical Association 83, pp. 703–709.
  • H. Rubin (1978) Empirical Bayes estimation of the Binomial $n$ problem. Journal of the American Statistical Association 73 (363), pp. 173–178.
  • N. Smidt, A. W. S. Rutjes, D. A. W. M. van der Windt, R. W. J. G. Ostelo, J. B. Reitsma, P. M. Bossuyt, et al. (2005) Quality of reporting of diagnostic accuracy studies. Radiology 235 (2), pp. 347–353.
  • V. Sounderajah, A. Guni, X. Liu, G. S. Collins, A. Karthikesalingam, S. R. Markar, R. M. Golub, A. K. Denniston, S. Shetty, D. Moher, P. M. Bossuyt, A. Darzi, H. Ashrafian, and S. S. Committee (2025) The STARD-AI reporting guideline for diagnostic accuracy studies using artificial intelligence. Nature Medicine 31 (10), pp. 3283–3289.
  • J. A. Svirsky, H. L. Burns, S. S. Carpenter, et al. (2002) Comparison of computer-assisted brush biopsy results with follow-up scalpel biopsy and histology. General Dentistry 50 (5), pp. 500–503.
  • U.S. Food and Drug Administration (2007) Statistical guidance on reporting results from studies evaluating diagnostic tests. Guidance for Industry and FDA Staff, March 13, 2007.
  • C. M. Umemneku Chikere, K. Wilson, S. Graziadio, L. Vale, and A. J. Allen (2019) Diagnostic test evaluation methodology: a systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard – an update. PLOS ONE 14 (10), pp. e0223832.
  • B. Vidakovic (2017) Engineering Biostatistics: An Introduction using MATLAB and WinBUGS. 1st edition, Wiley, Hoboken, NJ. ISBN 978-1119168966.
  • S. J. White, M. Chau, E. Arruzza, M. Ong, H. John, R. Theiss, K. L. Yaxley, and M. To (2025) Assessment of Standards for Reporting of Diagnostic Accuracy (STARD) 2015 guideline adherence in medical imaging diagnostic accuracy studies published in 2023. Journal of Clinical Epidemiology 179, pp. 111654.
  • N. L. Wilczynski, R. B. Haynes, and H. Team (2008) Quality of reporting of diagnostic accuracy studies: no change since STARD statement publication, before and after study. Radiology 248 (3), pp. 817–823.
  • A. Wismueller, A. M. McKinney, M. A. Riedl, E. J. Rummeny, and R. Wismueller (2020) A prospective randomized clinical trial for measuring radiology study reporting time on artificial intelligence-based detection of intracranial hemorrhage in emergent care head CT. arXiv preprint arXiv:2002.12515. Accessed 2025-10-20.