Qingwen Li, Dongxu Li, Chen Sun, Guangtao Song, Yihua Huang, Jizhong Lou. 2026: Quantitative Full-length transcriptome analysis by nanopore sequencing with Error-Aware UMI mapping. Biophysics Reports. DOI: 10.52601/bpr.2026.260010

Quantitative Full-length transcriptome analysis by nanopore sequencing with Error-Aware UMI mapping

  • Comprehensive transcriptome profiling is essential for understanding RNA diversity and regulation, yet accurate identification and quantification of full-length transcript isoforms remain challenging with short-read sequencing technologies. Nanopore sequencing enables direct sequencing of long cDNA molecules and thus offers a powerful solution for full-length transcriptome analysis, but its application to quantitative transcriptomics is limited by PCR amplification bias and the difficulty of unique molecular identifier (UMI) recognition under high sequencing error rates.
    Here, we developed UMImap, a dedicated pipeline for robust UMI identification, error correction, and deduplication in nanopore data. The core principle of UMImap is to leverage transcript-level mapping specificity to separate high-confidence UMIs from error-prone candidates, and then to apply alignment-based error correction and similarity-based clustering, thereby substantially improving UMI accuracy under high-error conditions. Compared with existing tools, UMImap achieved the highest proportion of reads mapping uniquely to a single transcript isoform (67.3% and 66.2%). It also achieved the highest fraction of UMIs supported by ≥2 reads (32.2% and 30.9%), indicating more effective consolidation of reads from the same original molecule. Transcript quantification using UMImap was highly reproducible between libraries prepared with different numbers of PCR cycles, demonstrating effective mitigation of PCR-induced duplication bias. Using this framework, we identified 75,030 full-length transcript isoforms from GM12878 cells, of which 30.32% were previously unannotated. Many of these novel in-catalog isoforms are significantly longer than Ensembl annotations and are enriched in RNA processing, splicing, and translation pathways.
    Our results establish UMImap as an effective solution for UMI-based quantification in nanopore full-length transcriptome sequencing and highlight the potential of long-read sequencing to simultaneously achieve accurate isoform discovery and expression analysis in complex transcriptomes.
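
To make the idea of stratifying UMIs and then consolidating error-prone candidates more concrete, the following Python sketch illustrates one possible reading of it. It is not the UMImap implementation: the function names `edit_distance` and `dedup_umis`, the `(transcript_id, umi, is_unique_mapping)` input format, the use of unique transcript mapping as the confidence criterion, and the `max_dist=2` threshold are all assumptions made for illustration. Within each transcript, UMIs from uniquely mapped reads are treated as high-confidence seeds, and each remaining UMI is merged into its nearest seed when the Levenshtein distance is small enough.

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def dedup_umis(reads, max_dist=2):
    """Group reads by transcript and consolidate UMIs (illustrative only).

    reads: iterable of (transcript_id, umi, is_unique_mapping) tuples.
    Returns {transcript_id: {umi: read_count}} after merging.
    """
    by_transcript = defaultdict(list)
    for transcript, umi, unique in reads:
        by_transcript[transcript].append((umi, unique))

    result = {}
    for transcript, items in by_transcript.items():
        # Assumption: UMIs on uniquely mapped reads are "high-confidence" seeds.
        seeds = Counter(umi for umi, unique in items if unique)
        counts = Counter(seeds)
        for umi, unique in items:
            if unique:
                continue
            # Nearest seed; ties go to the better-supported seed, then by name.
            best = min(
                seeds,
                key=lambda s: (edit_distance(umi, s), -seeds[s], s),
                default=None,
            )
            if best is not None and edit_distance(umi, best) <= max_dist:
                counts[best] += 1   # treat as a sequencing-error variant
            else:
                counts[umi] += 1    # keep as its own molecule
        result[transcript] = dict(counts)
    return result
```

In this sketch the number of keys per transcript serves as the deduplicated molecule count, and the values show how many reads support each UMI. A production pipeline would also need to extract UMIs from raw reads and handle seeds that are themselves near-duplicates, which this example does not attempt.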