TY - JOUR
T1 - GapSense
T2 - Similarity Estimation-Based Gap Filler with TGS-Reads for Genome Assemblies
AU - Kan, Yejin
AU - Kim, Dongyeon
AU - Yang, Jinkyung
AU - Yi, Gangman
N1 - Publisher Copyright: © International Association of Scientists in the Interdisciplinary Areas 2025.
PY - 2025
Y1 - 2025
N2 - Advances in next-generation sequencing have led to an explosion in sequencing data, accelerating genome assembly research. However, draft genomes generated after scaffolding still contain unresolved gaps, often caused by repetitive regions and sequencing errors. These gaps may contain biologically meaningful sequences and thus require accurate resolution. However, existing gap-filling tools often exhibit limited reliability, especially when applied to large and complex eukaryotic genomes, due to their insufficient capacity to resolve repetitive regions or their heavy dependence on error-prone long reads. To address this challenge, we present GapSense, a robust gap-filling method that leverages similarity estimation using third-generation sequencing (TGS) reads. By quantifying pairwise similarity among candidate sequences, GapSense prioritizes informative regions and reconstructs gap sequences with higher accuracy. The proposed method introduces a novel similarity scoring mechanism that evaluates the geometric overlap of adjacent subregions to capture local structural variations and reduces noise from low-coverage and error-prone long reads. Experimental results on six representative species and three popular assemblers show that GapSense consistently outperforms existing tools in terms of gap-filling accuracy and contiguity, while maintaining low performance variability across different datasets. These findings demonstrate the effectiveness and generalizability of GapSense for accurate and scalable gap-filling.
KW - De novo assembly
KW - Gap-filling
KW - Genome assembly
KW - Next-generation sequencing
KW - Third-generation sequencing
UR - https://www.scopus.com/pages/publications/105021117463
U2 - 10.1007/s12539-025-00770-y
DO - 10.1007/s12539-025-00770-y
M3 - Article
AN - SCOPUS:105021117463
SN - 1913-2751
JO - Interdisciplinary Sciences - Computational Life Sciences
JF - Interdisciplinary Sciences - Computational Life Sciences
ER -