What Are the Implications of Alternative Alpha Thresholds for Hypothesis Testing in Orthopaedics?

David C Landy, Thomas J Utset-Ward, Michael J Lee
Author Information
  1. D. C. Landy, T. J. Utset-Ward, M. J. Lee: University of Chicago, Chicago, IL, USA.

Abstract

BACKGROUND: Clinical research in orthopaedics typically reports an association after rejecting a null hypothesis of no association, using an alpha threshold of 0.05 against which the calculated p value is evaluated. This arbitrary threshold is one factor contributing to current difficulties in reproducing research findings. A proposal to lower the alpha threshold to 0.005 is gaining attention. However, it is currently unknown how alpha thresholds are used in orthopaedics or how reported p values are distributed.
QUESTIONS/PURPOSES: We sought to describe the use of alpha thresholds in two orthopaedic journals by asking (1) How frequently are alpha threshold values reported? (2) How frequently are power calculations reported? (3) How frequently are p values between 0.005 and 0.05 reported for the main hypothesis? (4) Are p values less than 0.005 associated with study characteristics such as design and reporting power calculations?
METHODS: The 100 most recent original clinical research articles from two leading orthopaedic journals at the time of this proposal were reviewed. For studies without a specified primary hypothesis, a main hypothesis was selected that was most consistent with the title and abstract. The p value for the main hypothesis and lowest p value for each study were recorded. Study characteristics including details of alpha thresholds, beta, and p values were recorded. Associations between study characteristics and p values were described. Of the 200 articles (100 from each journal), 23 were randomized controlled trials, 141 were cohort studies or case series (defined as a study in which authors had access to original data collected for the study purpose), 31 were database studies, and five were classified as other.
RESULTS: An alpha threshold was reported in 166 articles (83%), with all but two reporting a value of 0.05. Forty-two articles (21%) reported performing a power calculation. The p value for the main hypothesis was less than 0.005 for 88 articles (44%), between 0.005 and 0.05 for 67 (34%), and greater than 0.05 for 29 (15%). The smallest p value was less than 0.005 for 143 articles (72%), between 0.005 and 0.05 for 39 (20%), and either not provided or greater than 0.05 for 18 (9%). Although 50% (65 of 130) of cohort and database papers had a main hypothesis p value less than 0.005, only 26% (6 of 23) of randomized controlled trials did. Only 36% (15 of 42) of articles reporting a power calculation had a main hypothesis p value less than 0.005, compared with 51% (73 of 142) of those that did not report one.
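The percentages in the RESULTS paragraph can be rechecked from the reported counts. The following is a minimal sketch, not part of the original study: the function and variable names are illustrative, and rounding halves up to whole percentages is an assumption, since the abstract does not state its rounding rule.

```python
# Counts copied from the RESULTS paragraph; all names are illustrative only.
TOTAL_ARTICLES = 200


def percent_half_up(count, total):
    """Whole-number percentage, rounding halves up (an assumed rounding rule)."""
    # Integer arithmetic avoids Python's round-half-to-even behaviour.
    return (200 * count + total) // (2 * total)


main_hypothesis = {"p < 0.005": 88, "0.005 to 0.05": 67, "p > 0.05": 29}
smallest_p = {"p < 0.005": 143, "0.005 to 0.05": 39, "not provided or > 0.05": 18}

# Articles not covered by the three main-hypothesis categories.
unaccounted_main = TOTAL_ARTICLES - sum(main_hypothesis.values())

for label, count in main_hypothesis.items():
    print(f"main hypothesis, {label}: {percent_half_up(count, TOTAL_ARTICLES)}%")
for label, count in smallest_p.items():
    print(f"smallest p value, {label}: {percent_half_up(count, TOTAL_ARTICLES)}%")

# Subgroup comparisons use their own denominators, as reported.
print(f"cohort/database, p < 0.005: {percent_half_up(65, 130)}%")
print(f"randomized trials, p < 0.005: {percent_half_up(6, 23)}%")
print(f"with power calculation, p < 0.005: {percent_half_up(15, 42)}%")
print(f"without power calculation, p < 0.005: {percent_half_up(73, 142)}%")
print(f"articles outside the three main-hypothesis categories: {unaccounted_main}")
```

The three main-hypothesis categories sum to 184 of the 200 articles, which matches the combined denominators of the power-calculation comparison (42 + 142).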
CONCLUSIONS: Although a lower alpha threshold may theoretically increase the reproducibility of research findings across orthopaedics, this would preferentially select findings from lower-quality studies or increase the burden on higher-quality ones. A more nuanced approach could be to consider alpha thresholds specific to study characteristics. For example, randomized controlled trials with a prespecified primary hypothesis may still be best evaluated at 0.05, whereas database studies with an abundance of statistical tests may be best evaluated at a threshold even below 0.005.
CLINICAL RELEVANCE: Surgeons and scientists in orthopaedics should understand that the default alpha threshold of 0.05 represents an arbitrary value that could be lowered to help reduce type-I errors; however, it must also be appreciated that such a change could increase type-II errors, increase resource utilization, and preferentially select findings from lower-quality studies.
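The resource cost of a lower alpha threshold can be illustrated with a standard sample-size approximation. This is a hedged sketch rather than an analysis from the study: it assumes a two-sided comparison of two group means under a normal approximation, and the standardized effect size of 0.5 and the power of 80% are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(alpha, power=0.80, effect_size=0.5):
    """Approximate participants per group for a two-sided, two-sample
    comparison of means (normal approximation).

    power and effect_size defaults are illustrative assumptions.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # type-I error quantile
    z_beta = standard_normal.inv_cdf(power)           # power = 1 - type-II error
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_conventional = n_per_group(0.05)   # conventional alpha threshold
n_proposed = n_per_group(0.005)      # proposed lower alpha threshold

print(f"alpha = 0.05:  {n_conventional} per group")
print(f"alpha = 0.005: {n_proposed} per group")
print(f"ratio: {n_proposed / n_conventional:.2f}")
```

Under these assumptions, holding power fixed at the lower threshold requires a substantially larger sample in each group; holding the sample size fixed instead would reduce power, which is the increase in type-II errors described above.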

MeSH Term

Biomedical Research
Mathematical Concepts
Orthopedic Procedures
Research Design
