SigAlign: an alignment algorithm guided by explicit similarity criteria
Nucleic Acids Res. 2024 Aug 27;52(15):8717-8733. doi: 10.1093/nar/gkae607.
ABSTRACT
In biological sequence alignment, prevailing heuristic aligners achieve high throughput through several approximation techniques, but at the cost of unclear output criteria and complex parameter spaces. To address these problems, we introduce 'SigAlign', a novel alignment algorithm that applies two explicit cutoffs to its results, minimum length and maximum penalty per length, alongside three affine gap penalties. Comparative analyses of SigAlign against leading database search tools (BLASTn, MMseqs2) and read mappers (BWA-MEM, bowtie2, HISAT2, minimap2) highlight its performance in read mapping and database searches. Our research demonstrates that SigAlign not only provides high sensitivity with a non-heuristic approach, but also surpasses the throughput of existing heuristic aligners, particularly for high-accuracy reads or genomes with few repetitive regions. As an open-source library, SigAlign is poised to become a foundational component that provides a transparent and customizable alignment process for new analytical algorithms, tools and pipelines in bioinformatics.
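
The two cutoffs named in the abstract can be illustrated with a minimal Python sketch. This is not SigAlign's API or its algorithm. The names `Penalties`, `alignment_penalty` and `passes_cutoffs` are hypothetical, and the sketch rests on three assumptions: the three penalties are mismatch, gap open and gap extend; a gap of k columns costs gap_open + k * gap_extend; and alignment length is counted as the number of alignment columns. It only checks whether an already computed alignment would satisfy the stated criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Penalties:
    # Hypothetical container; field names are assumptions, not SigAlign's.
    mismatch: int
    gap_open: int
    gap_extend: int


def alignment_penalty(ops: str, p: Penalties) -> int:
    """Total penalty of an alignment given as a string of column operations:
    'M' match, 'X' mismatch, 'I' insertion, 'D' deletion.
    Assumed convention: a gap of k columns costs gap_open + k * gap_extend."""
    total = 0
    prev = None
    for op in ops:
        if op == "X":
            total += p.mismatch
        elif op in ("I", "D"):
            if op != prev:  # a new gap run starts here
                total += p.gap_open
            total += p.gap_extend
        elif op != "M":
            raise ValueError(f"unknown operation: {op!r}")
        prev = op
    return total


def passes_cutoffs(ops: str, p: Penalties, min_length: int,
                   max_penalty_per_length: float) -> bool:
    """True if the alignment meets both explicit cutoffs:
    length >= min_length and penalty / length <= max_penalty_per_length."""
    length = len(ops)  # assumption: length counts alignment columns
    if length == 0 or length < min_length:
        return False
    return alignment_penalty(ops, p) <= max_penalty_per_length * length


# Illustrative values only; they are not taken from the paper.
penalties = Penalties(mismatch=4, gap_open=6, gap_extend=2)
example = "MMMMXMMMIIMMMM"  # one mismatch and one insertion of two columns
print(alignment_penalty(example, penalties))
print(passes_cutoffs(example, penalties, min_length=10,
                     max_penalty_per_length=1.0))
```

Because both cutoffs are stated directly in terms of the output, a result can be checked against them after the fact, independently of how the aligner found it.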
PMID:39011889 | PMC:PMC11347165 | DOI:10.1093/nar/gkae607