Shian Su, Lucinda Xiao, James Lancaster, Tamara Cameron, Kelsey Breslin, Peter F Hickey, Marnie E Blewitt, Quentin Gouil, Matthew E Ritchie
Long-read sequencing technologies have transformed the field of epigenetics by enabling direct, single-base resolution detection of DNA modifications, such as methylation. This creates new opportunities for studying the role of DNA methylation in gene regulation, imprinting, and disease. However, the unique characteristics of long-read data, including the modBAM format and extended read lengths, necessitate the development of specialised software tools for effective analysis. The NanoMethViz package provides a suite of tools for loading long-read methylation data and visualising it at various resolutions. It can also convert the data for use with other Bioconductor software such as bsseq, DSS, dmrseq and edgeR to discover differentially methylated regions (DMRs). In this workflow article, we demonstrate the process of converting modBAM files into formats suitable for comprehensive downstream analysis. We use NanoMethViz to conduct an exploratory analysis, visually summarising differences between samples, examining aggregate methylation profiles across genes and CpG islands, and investigating methylation patterns within specific regions at the single-read level. Additionally, we illustrate the use of dmrseq for identifying DMRs and show how to integrate these findings into gene-level visualisation plots. Our analysis is applied to a triplicate dataset of haplotyped long-read methylation data from mouse neural stem cells, allowing us to visualise and compare the characteristics of the parental alleles on chromosome 7. Through DMR analysis, we recover DMRs associated with known imprinted genes and visualise their methylation patterns at single-read resolution.
This streamlined workflow is adaptable to common experimental designs and offers flexibility in the choice of upstream data sources and downstream statistical analysis tools.