Machine learning misclassification networks reveal a citation advantage of interdisciplinary publications only in high-impact journals.

Alexey Lyutov, Yilmaz Uygun, Marc-Thorsten Hütt
Author Information
  1. Alexey Lyutov: School of Business, Social and Decision Science, Constructor University, 28759, Bremen, Germany.
  2. Yilmaz Uygun: School of Business, Social and Decision Science, Constructor University, 28759, Bremen, Germany.
  3. Marc-Thorsten Hütt: School of Science, Constructor University, 28759, Bremen, Germany. mhuett@constructor.university.

Abstract

Given a large enough volume of data and precise, meaningful categories, training a statistical model to solve a classification problem is straightforward and has become a standard application of machine learning (ML). If the categories are not precise but fuzzy, as is the case for scientific disciplines, the systematic failures of ML classification can reveal properties of the underlying categories. Here we classify a large volume of academic publications using only their abstracts. From the publications that are classified differently by journal categories and ML categories (i.e., misclassified publications, when the journal assignment is taken as ground truth), we construct a network among disciplines. Analysis of these misclassifications provides insight into two topics at the core of the science of science: (1) Mapping out the interplay of disciplines. We show that this misclassification network is informative about the interplay of academic disciplines and that it is similar to, but distinct from, a citation-based map of science, in which nodes are scientific disciplines and an edge indicates a high co-citation count between publications in these disciplines. (2) Analyzing the success of interdisciplinarity. By evaluating the citation patterns of publications, we show that misclassification can be linked to interdisciplinarity and, furthermore, that misclassified articles are cited at different rates than correctly classified articles: in the top 10 percent of journals in each discipline, misclassified articles are on average cited more frequently, while in the remaining journals they are cited less frequently.
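
The two analyses summarized above can be illustrated with a minimal sketch. It assumes each publication is available as a record holding its journal's discipline (the ground truth), the discipline predicted by an abstract-based classifier, its citation count, and a precomputed flag for whether its journal is in the top 10 percent of its discipline. The `Publication` record and both function names are hypothetical. Counting misclassifications as directed, weighted edges (journal discipline to predicted discipline) is also an assumption; the authors' exact edge definition and any normalization may differ.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Publication:
    # Hypothetical record layout, for illustration only.
    journal_discipline: str    # discipline of the publishing journal (ground truth)
    predicted_discipline: str  # discipline assigned by the abstract-based classifier
    citations: int
    top_decile_journal: bool   # assumed precomputed: journal in top 10% of its discipline


def misclassification_network(pubs):
    """Weighted directed edges (journal discipline -> predicted discipline),
    where each weight counts the misclassified publications along that edge."""
    edges = Counter()
    for p in pubs:
        if p.predicted_discipline != p.journal_discipline:
            edges[(p.journal_discipline, p.predicted_discipline)] += 1
    return dict(edges)


def mean_citations(pubs, top_decile):
    """Return (mean citations of misclassified, mean citations of correctly
    classified) publications within one journal tier; None if a group is empty."""
    tier = [p for p in pubs if p.top_decile_journal == top_decile]
    mis = [p.citations for p in tier if p.predicted_discipline != p.journal_discipline]
    ok = [p.citations for p in tier if p.predicted_discipline == p.journal_discipline]
    return (mean(mis) if mis else None, mean(ok) if ok else None)


# Toy records, invented solely to exercise the functions (not data from the study).
pubs = [
    Publication("Physics", "Physics", 10, True),
    Publication("Physics", "Biology", 25, True),
    Publication("Biology", "Chemistry", 3, False),
    Publication("Biology", "Biology", 7, False),
    Publication("Physics", "Biology", 1, False),
]
print(misclassification_network(pubs))
# {('Physics', 'Biology'): 2, ('Biology', 'Chemistry'): 1}
print(mean_citations(pubs, top_decile=True))   # (25, 10)
print(mean_citations(pubs, top_decile=False))  # (2, 7)
```

Keeping the edges directed preserves which discipline "leaks" into which. An undirected map comparable to a co-citation network could be obtained by summing the weights of (A, B) and (B, A).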
