Selection for energy efficiency drives strand-biased gene distribution in prokaryotes.

Na Gao, Guanting Lu, Martin J Lercher, Wei-Hua Chen
Author Information
  1. Na Gao: Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), 430074, Wuhan, Hubei, China.
  2. Guanting Lu: Department of Blood Transfusion, Tangdu Hospital, the Fourth Military Medical University, No 1, Xinsi Road, Chanba District, 710000, Xi'an, China.
  3. Martin J Lercher: Institute for Computer Science and Cluster of Excellence on Plant Sciences CEPLAS, Heinrich Heine University, 40225, Düsseldorf, Germany.
  4. Wei-Hua Chen: Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology (HUST), 430074, Wuhan, Hubei, China. weihuachen@hust.edu.cn.

Abstract

Lagging-strand genes accumulate more deleterious mutations. Genes are thus preferentially located on the leading strand, an observation known as strand-biased gene distribution (SGD). Despite this mechanistic understanding, a satisfactory quantitative model is still lacking. Replication-transcription collisions induce stalling of the replication machinery, expose DNA to various attacks, and are followed by error-prone repair. We found that mutational biases in non-transcribed regions can explain ~71% of the variation in SGDs across 1,552 genomes, supporting the mutagenesis origin of SGD. Mutational biases introduce energetically cheaper nucleotides on the lagging strand and result in more expensive protein products; consistently, the cost difference between the two strands explains ~50% of the variance in SGDs. Protein costs decrease with increasing gene expression. At similar expression levels, protein products of leading-strand genes are generally cheaper than those of lagging-strand genes; however, highly expressed lagging-strand genes are still cheaper than lowly expressed leading-strand genes. Selection for energy efficiency thus drives some genes to the leading strand, especially those that are highly expressed and essential, but certainly not all genes. Stronger mutational biases are often associated with low-GC genomes; because low-GC genes encode expensive proteins, low-GC genomes tend to have stronger SGDs to alleviate the stronger pressure on efficient energy usage.
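
As an illustration of the quantity discussed above, the following is a minimal sketch of how an SGD value could be computed for one circular genome. It assumes SGD is measured as the fraction of genes on the leading strand, which the abstract does not state explicitly. The names `Gene`, `is_leading`, and `sgd`, and the idea of passing known origin (`ori`) and terminus (`ter`) coordinates, are hypothetical choices for this example and not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int   # 0-based coordinate of the gene on the chromosome
    strand: str  # '+' or '-' relative to the reference sequence


def on_right_replichore(pos: int, ori: int, ter: int, genome_len: int) -> bool:
    """True if pos lies on the arc walked from ori to ter in increasing
    coordinates (with wrap-around on a circular chromosome)."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def is_leading(gene: Gene, ori: int, ter: int, genome_len: int) -> bool:
    """A gene counts as leading-strand when it is transcribed in the same
    direction as the replication fork: '+' on the right replichore,
    '-' on the left replichore."""
    right = on_right_replichore(gene.start, ori, ter, genome_len)
    return (gene.strand == '+') == right


def sgd(genes: list[Gene], ori: int, ter: int, genome_len: int) -> float:
    """Fraction of genes on the leading strand (assumed SGD definition)."""
    leading = sum(is_leading(g, ori, ter, genome_len) for g in genes)
    return leading / len(genes)


# Toy genome of length 1000 with ori at 0 and ter at 500 (made-up values).
toy_genes = [
    Gene(100, '+'),  # right replichore, co-directional: leading
    Gene(200, '-'),  # right replichore, head-on: lagging
    Gene(600, '-'),  # left replichore, co-directional: leading
    Gene(700, '+'),  # left replichore, head-on: lagging
    Gene(900, '-'),  # left replichore, co-directional: leading
]
toy_sgd = sgd(toy_genes, ori=0, ter=500, genome_len=1000)
print(toy_sgd)
```

A value near 0.5 would indicate no strand preference, while values approaching 1 would indicate a strong bias toward the leading strand. Real analyses would also need reliable origin and terminus predictions, which this sketch takes as given.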

MeSH Terms

Bacillus subtilis
Energy Metabolism
Escherichia coli
GC Rich Sequence
Gene Expression Regulation, Bacterial
Genome, Bacterial
Mutation Rate
Mycoplasma pneumoniae
Selection, Genetic
