Building a Science Gateway For Processing and Modeling Sequencing Data Via Apache Airavata.

Zhong Wang, Marcus A Christie, Eroma Abeysinghe, Tinyi Chu, Suresh Marru, Marlon Pierce, Charles G Danko
Author Information
  1. Zhong Wang: Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, zw355@cornell.edu.
  2. Marcus A Christie: Science Gateways Research Center, Pervasive Technology Institute, Indiana University, Bloomington, IN, machrist@iu.edu.
  3. Eroma Abeysinghe: Science Gateways Research Center, Pervasive Technology Institute, Indiana University, Bloomington, IN, eabeysin@iu.edu.
  4. Tinyi Chu: Graduate field of Computational Biology, Cornell University, Ithaca, NY, tc532@cornell.edu.
  5. Suresh Marru: Science Gateways Research Center, Pervasive Technology Institute, Indiana University, Bloomington, IN, smarru@iu.edu.
  6. Marlon Pierce: Science Gateways Research Center, Pervasive Technology Institute, Indiana University, Bloomington, IN, marpierc@iu.edu.
  7. Charles G Danko: Baker Institute for Animal Health and Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY, cgd24@cornell.edu.

Abstract

The amount of DNA sequencing data has grown exponentially over the past decade due to advances in sequencing technology. Processing and modeling large amounts of sequencing data can be computationally intractable on desktop computing platforms. High-performance computing (HPC) resources offer far more computing power and can be a general solution to these problems. Using HPC systems directly, however, requires users who are skilled in working with them, and acquiring such skills takes time. Science gateways act as a middle layer between users and HPC systems, providing users with the resources to accomplish compute-intensive tasks without requiring specialized expertise. We developed a web-based computing platform for genome biologists by customizing the PHP Gateway for Airavata (PGA) framework, which accesses publicly accessible HPC resources via Apache Airavata. This web computing platform takes advantage of the Extreme Science and Engineering Discovery Environment (XSEDE), which provides the resources for gateway development, including access to CPU, GPU, and storage resources. We used this platform to develop a gateway for the dREG algorithm, providing an online computing tool for finding functional regions in mammalian genomes using nascent RNA sequencing data. The dREG gateway gives its users a free, powerful, and user-friendly GPU computing resource based on XSEDE, so biologists do not need specialized knowledge of installation, configuration, and execution on an HPC system. The dREG gateway is available at: https://dREG.dnasequence.org/.

Grants

  1. R01 HG009309/NHGRI NIH HHS
