The prevalence of statistical reporting errors in psychology (1985-2013).

Michèle B Nuijten, Chris H J Hartgerink, Marcel A L M van Assen, Sacha Epskamp, Jelte M Wicherts
Author Information
  1. Michèle B Nuijten: Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands. m.b.nuijten@uvt.nl.
  2. Chris H J Hartgerink: Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands.
  3. Marcel A L M van Assen: Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands.
  4. Sacha Epskamp: Psychological Methods, University of Amsterdam, Amsterdam, Netherlands.
  5. Jelte M Wicherts: Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands.

Abstract

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package "statcheck." statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called "co-pilot model," and to use statcheck to flag possible inconsistencies in one's own manuscript or during the review process.
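
As an illustration of the kind of consistency check described in the abstract, the following is a minimal sketch and not the statcheck implementation. It recomputes a two-sided p-value from a reported test statistic and compares it with the reported p-value. Several things here are assumptions made for illustration: the function names `two_sided_p_from_z` and `check_reported_p` are hypothetical, the statistic is a z value (so no degrees of freedom are needed and only the standard library is used), the significance level is .05, and the reported p-value is rounded to two decimals. A "gross" inconsistency is taken to mean that the recomputed and reported p-values fall on opposite sides of the significance level.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal (z) test statistic."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_p(z: float, reported_p: float,
                     alpha: float = 0.05, decimals: int = 2):
    """Compare a reported p-value with the one recomputed from z.

    Returns (computed_p, inconsistent, gross). All thresholds are
    illustrative assumptions, not statcheck's actual rules.
    """
    computed_p = two_sided_p_from_z(z)
    # Allow for rounding of the reported p-value to `decimals` places.
    tolerance = 0.5 * 10 ** (-decimals)
    inconsistent = abs(computed_p - reported_p) > tolerance
    # Gross: the mismatch changes which side of alpha the p-value is on.
    gross = inconsistent and ((computed_p < alpha) != (reported_p < alpha))
    return computed_p, inconsistent, gross


if __name__ == "__main__":
    for z, p in [(1.96, 0.05), (2.50, 0.03), (1.50, 0.04)]:
        computed, inconsistent, gross = check_reported_p(z, p)
        print(f"z={z:.2f} reported p={p:.2f} computed p={computed:.4f} "
              f"inconsistent={inconsistent} gross={gross}")
```

A fuller check would also allow for rounding of the reported test statistic and would handle t, F, and chi-square statistics with their degrees of freedom; this sketch leaves those out to stay dependency-free.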

MeSH Terms

Behavioral Research
Bias
Humans
Prevalence
