Data to knowledge in action: A longitudinal analysis of GenBank metadata.

Jeff Hemsley, Jian Qin, Sarah E Bratt
Author Information
  1. Jeff Hemsley: Syracuse University, Syracuse, New York.
  2. Jian Qin: Syracuse University, Syracuse, New York.
  3. Sarah E Bratt: Syracuse University, Syracuse, New York.


Studies of the relationship between collaboration networks and knowledge diffusion typically rely on publication-based authorship data. However, research collaboration often starts long before publication, with data production efforts. In this project we ask how collaboration in data production networks contributes to knowledge diffusion, as measured by patents. We drew our data from the metadata associated with genetic sequence records stored in the National Institutes of Health's GenBank database. After constructing networks for each year and aggregating summary statistics, we used regressions to test several hypotheses. Key among our findings is that data production team size is positively related to the number of patents each year. In addition, when actors have more links on average, we tend to see more patents. Our study contributes to the science of science by highlighting the important role of data production in the diffusion of knowledge as measured by patents.
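The workflow summarized above (yearly networks, aggregated summary statistics, then regression) can be illustrated with a minimal sketch. This is not the authors' code: the records, the patent counts, and the names `records`, `patents`, `yearly_stats`, and `ols_slope` are all hypothetical, and a single-predictor least-squares slope stands in for whatever regression models the study actually used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: (year, submitters listed on one sequence record).
# Real GenBank metadata is far larger and needs name disambiguation.
records = [
    (2000, ["A", "B"]),
    (2000, ["B", "C"]),
    (2001, ["A", "B", "C"]),
    (2001, ["C", "D"]),
    (2002, ["A", "B", "C", "D"]),
    (2002, ["D", "E", "F"]),
]

# Hypothetical patent counts per year, invented purely for illustration.
patents = {2000: 1, 2001: 3, 2002: 5}


def yearly_stats(records):
    """Build one co-submission network per year and summarize it."""
    team_sizes = defaultdict(list)
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for year, submitters in records:
        members = sorted(set(submitters))
        team_sizes[year].append(len(members))
        nodes[year].update(members)
        # Link every pair of submitters who appear on the same record.
        for a, b in combinations(members, 2):
            edges[year].add((a, b))
    stats = {}
    for year in sorted(team_sizes):
        stats[year] = {
            "mean_team_size": sum(team_sizes[year]) / len(team_sizes[year]),
            # Each undirected edge contributes to the degree of two nodes.
            "mean_degree": 2 * len(edges[year]) / len(nodes[year]),
        }
    return stats


def ols_slope(xs, ys):
    """Slope of a one-predictor ordinary least squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


stats = yearly_stats(records)
years = sorted(stats)
team_slope = ols_slope(
    [stats[y]["mean_team_size"] for y in years],
    [patents[y] for y in years],
)
degree_slope = ols_slope(
    [stats[y]["mean_degree"] for y in years],
    [patents[y] for y in years],
)
print(stats)
print("slope of patents on mean team size:", team_slope)
print("slope of patents on mean degree:", degree_slope)
```

A positive slope in this toy setting only mirrors the direction of the reported relationships; it says nothing about their size or significance in the actual GenBank data.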





  1. R01 GM137409/NIGMS NIH HHS

