Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials.

Adrian Barnett
Author Information
  1. Adrian Barnett: Australian Centre for Health Services Innovation & Centre for Healthcare Transformation, Queensland University of Technology, Kelvin Grove, Queensland, 4059, Australia.

Abstract

Background: Papers describing the results of a randomised trial should include a baseline table that compares the characteristics of the randomised groups. Researchers who fraudulently generate trials often unwittingly create baseline tables that are implausibly similar (under-dispersed) or have large differences between groups (over-dispersed). I aimed to create an automated algorithm to screen for under- and over-dispersion in the baseline tables of randomised trials.

Methods: Using a cross-sectional study, I examined 2,245 randomised controlled trials published in health and medical journals on . I estimated the probability that a trial's baseline summary statistics were under- or over-dispersed using a Bayesian model that examined the distribution of t-statistics for the between-group differences and compared it with the distribution expected without dispersion. I used a simulation study to test the ability of the model to find under- or over-dispersion, and compared its performance with an existing test of dispersion based on a uniform test of p-values. My model combined categorical and continuous summary statistics, whereas the uniform test used only continuous statistics.

Results: The algorithm had relatively good accuracy for extracting the data from baseline tables, matching well on the size of the tables and the sample size. Using t-statistics in the Bayesian model out-performed the uniform test of p-values, which had many false positives for skewed, categorical and rounded data that were not under- or over-dispersed. For trials published on , some tables appeared under- or over-dispersed because they had an atypical presentation or had reporting errors. Some trials flagged as under-dispersed had groups with strikingly similar summary statistics.

Conclusions: Automated screening for fraud of all submitted trials is challenging because the presentation of baseline tables varies widely. The Bayesian model could be useful in targeted checks of suspected trials or authors.
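The two quantities the abstract compares, t-statistics for between-group differences and a uniform test of p-values, can be illustrated with a small sketch. This is not the paper's Bayesian model. It is a simplified stand-in that assumes continuous variables reported as mean, SD and n, uses a Welch-style t-statistic, and approximates p-values with a normal distribution. The function names and the baseline rows are hypothetical.

```python
import math
from statistics import NormalDist


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Welch-style t-statistic for the difference between two group means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def two_sided_p(t):
    """Two-sided p-value from a normal approximation (assumes large groups)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def ks_distance_from_uniform(p_values):
    """Kolmogorov-Smirnov distance between the p-values and Uniform(0, 1)."""
    ps = sorted(p_values)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


# Hypothetical baseline rows: (mean, SD, n) for group 1 and for group 2.
rows = [
    ((54.2, 10.1, 100), (54.1, 10.3, 100)),
    ((27.3, 4.2, 100), (27.3, 4.1, 100)),
    ((120.5, 15.0, 100), (120.4, 14.8, 100)),
]

t_stats = [t_statistic(*g1, *g2) for g1, g2 in rows]
p_values = [two_sided_p(t) for t in t_stats]

# With proper randomisation and large groups, the mean of t squared is about 1.
# Values far below 1 hint at under-dispersion, values far above 1 at over-dispersion.
mean_squared_t = sum(t * t for t in t_stats) / len(t_stats)
ks_distance = ks_distance_from_uniform(p_values)

print(f"mean squared t: {mean_squared_t:.4f}")
print(f"KS distance from uniform: {ks_distance:.4f}")
```

In this sketch the invented groups are nearly identical, so the mean squared t-statistic falls well below 1. A real screen would also need to handle categorical rows, rounding and small samples, which the abstract reports as sources of false positives for the uniform test.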

MeSH Terms

Humans
Cross-Sectional Studies
Bayes Theorem
Sample Size
Randomized Controlled Trials as Topic
