Formation of human long intergenic non-coding RNA genes, pseudogenes, and protein genes: Ancestral sequences are key players.

Nicholas Delihas
Author Information
  1. Nicholas Delihas: Department of Molecular Genetics and Microbiology, Renaissance School of Medicine, Stony Brook University, Stony Brook, N.Y., United States of America.

Abstract

Pathways leading to the formation of non-coding RNA and protein genes are varied and complex. We report a conserved repeat sequence, present in the human and chimpanzee genomes, that appears to have originated from a common primate ancestor. This sequence is repeatedly copied in the low copy repeats of human chromosome 22 (chr22), termed LCR22 or segmental duplications, and forms twenty-one different genes. These include the human long intergenic non-coding RNA (lincRNA) family FAM230, a newly discovered lincRNA gene family termed conserved long intergenic non-coding RNAs (clincRNA), pseudogene families, the gamma-glutamyltransferase (GGT) protein gene family, and the RNA pseudogenes that originate from GGT sequences. Of particular interest are the GGT5 and USP18 protein genes, which appear to have formed from a homologous repeat sequence that also forms the clincRNA gene family. The data point to ancestral DNA sequences, conserved through evolution and duplicated in humans by chromosomal repeat sequences, that may serve as functional genomic elements in the development of diverse genes.

MeSH Terms

Animals
Carrier Proteins
Chromosome Mapping
Conserved Sequence
DNA Transposable Elements
Evolution, Molecular
Humans
Pan troglodytes
Proteins
Pseudogenes
RNA, Long Noncoding
gamma-Glutamyltransferase

Chemicals

Carrier Proteins
DNA Transposable Elements
Proteins
RNA, Long Noncoding
gamma-Glutamyltransferase
