RT Journal Article
SR Electronic
T1 New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads
JF Life Science Alliance
JO Life Sci. Alliance
FD Life Science Alliance LLC
SP e202201719
DO 10.26508/lsa.202201719
VO 6
IS 5
A1 Gonzalez-Garcia, Laura
A1 Guevara-Barrientos, David
A1 Lozano-Arce, Daniela
A1 Gil, Juanita
A1 Díaz-Riaño, Jorge
A1 Duarte, Erick
A1 Andrade, Germán
A1 Bojacá, Juan Camilo
A1 Hoyos-Sanchez, Maria Camila
A1 Chavarro, Christian
A1 Guayazan, Natalia
A1 Chica, Luis Alberto
A1 Buitrago Acosta, Maria Camila
A1 Bautista, Edwin
A1 Trujillo, Miller
A1 Duitama, Jorge
YR 2023
UL https://www.life-science-alliance.org/content/6/5/e202201719.abstract
AB Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
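The abstract mentions minimizers selected by a hash function derived from the k-mer distribution. As a rough illustration of that general idea only, the Python sketch below picks window minimizers using a ranking based on observed k-mer counts. The function names (`kmer_counts`, `rarity_rank`, `minimizers`), the parameters `k` and `w`, and the choice to prefer rarer k-mers are all assumptions made for this example; the abstract does not specify the actual hash function, and this is not the authors' implementation.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads (the 'k-mer distribution')."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rarity_rank(counts):
    """Hypothetical ranking: rarer k-mers sort first, ties broken by sequence.

    Assumption for illustration; the paper's hash function may differ.
    """
    return lambda kmer: (counts[kmer], kmer)


def minimizers(read, k, w, rank):
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Each window of w consecutive k-mers contributes its lowest-ranked
    k-mer; the leftmost position wins when ranks are equal.
    """
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (rank(kmers[i]), i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


if __name__ == "__main__":
    reads = ["ACGTACGTGATTACA", "GTACGTGATTACAGG"]
    rank = rarity_rank(kmer_counts(reads, 3))
    for read in reads:
        print(read, minimizers(read, 3, 4, rank))
```

Reads that overlap tend to share minimizers, so comparing these selected k-mers is a cheap way to propose candidate overlaps, which could then become edges in a read graph of the kind the abstract describes.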