GSearch: ultra-fast and scalable genome search by combining K-mer hashing with hierarchical navigable small world graphs
(database[TitleAbstract]) AND (Nucleic acids research[Journal]) 2024-11-12
Summary:
Genome search and/or classification typically involves finding the best-match database (reference) genomes. It has become increasingly challenging because the number of available database genomes keeps growing and traditional methods scale poorly to large databases. By combining k-mer hashing-based probabilistic data structures (i.e. ProbMinHash, SuperMinHash, Densified MinHash and SetSketch) to estimate genomic distance with a graph-based nearest neighbor search algorithm...
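
To make the idea of estimating genomic distance from hashed k-mers concrete, here is a minimal Python sketch of classic MinHash. It is an illustration only: it is not GSearch's code, it uses plain MinHash rather than the ProbMinHash, SuperMinHash, Densified MinHash or SetSketch variants named in the summary, and it does not cover the graph-based nearest neighbor step. The function names (`kmers`, `minhash_sketch`, `jaccard_estimate`), the k-mer length and the sketch size are all assumptions chosen for readability.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=5, num_hashes=64):
    """Classic MinHash: for each seeded hash function, keep the smallest
    hash value seen over all k-mers of the sequence.

    Assumes len(seq) >= k; otherwise there are no k-mers and min() raises.
    """
    kmer_set = kmers(seq, k)
    sketch = []
    for seed in range(num_hashes):
        # A different salt per seed gives an independent-looking hash function.
        salt = seed.to_bytes(8, "little")
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for km in kmer_set
        ))
    return sketch


def jaccard_estimate(sketch_a, sketch_b):
    """Fraction of sketch positions that agree; this estimates the Jaccard
    similarity of the two underlying k-mer sets."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    a = "ACGTACGTTGCAAGCTTAGCGATCGATCGGATCTAGCTAGGCTA"
    b = "ACGTACGTTGCAAGCTTAGCGATCGATCGGATCTAGCTAGGCTT"  # differs in the last base
    similarity = jaccard_estimate(minhash_sketch(a), minhash_sketch(b))
    print("estimated Jaccard similarity:", similarity)
    print("estimated Jaccard distance:", 1.0 - similarity)
```

Comparing two fixed-size sketches costs time proportional to the sketch length rather than to genome size, which is what makes a distance of this kind cheap enough to serve as the metric inside a nearest neighbor index.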