Efficient hybrid de novo assembly of human genomes with WENGAN

Open archive

Genova, Alex, Di | Buena-Atienza, Elena | Ossowski, Stephan | Sagot, Marie-France

Published by CCSD; Nature Publishing Group

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides the highest quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb). This is a post-peer-review, pre-copyedit version of an article published in Nature Biotechnology.
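The abstract reports contiguity as contig NG50 and consensus accuracy as QV. The sketch below shows how these two metrics are conventionally computed. It is an illustration of the standard definitions, not code from WENGAN or from any specific evaluation tool.

The function names `ng50` and `phred_qv` are hypothetical. The genome size is assumed to be supplied by the caller, for example a reference or estimated genome length. NG50 differs from plain N50 in one respect: it is measured against that genome size rather than against the total assembly length. NGA50 applies the same calculation to reference-aligned blocks instead of raw contigs, and is not shown here.

```python
import math
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly, or None if the contigs span < 50% of genome_size.

    NG50 is the length L such that contigs of length >= L together cover at least
    half of the (reference or estimated) genome size.
    """
    target = genome_size / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # the assembly is too small to reach half the genome


def phred_qv(error_rate: float) -> float:
    """Convert a per-base consensus error rate into a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # Toy assembly: five contigs against a hypothetical 100-unit genome.
    contigs = [40, 25, 15, 10, 5]
    print(ng50(contigs, genome_size=100))  # 25, because 40 + 25 = 65 >= 50
    print(round(phred_qv(1e-3), 2))        # 30.0, i.e. one error per 1,000 bases
```

This assumes the standard Phred definition, QV = -10 log10(error rate). Under that definition, the reported QV range of 27.84-42.88 corresponds to roughly one consensus error per 600 bases at the low end and one per 19,400 bases at the high end.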

Suggestions

By the same author

Fast-SG: an alignment-free algorithm for hybrid assembly

Open archive | Genova, Alex, Di | CCSD

Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly...

Mycoplasma hyopneumoniae J elicits an antioxidant response and decreases the expression of ciliary genes in infected swine epithelial cells

Open archive | Mucha, Scheila, Gabriele | CCSD

Mycoplasma hyopneumoniae is the most costly pathogen for swine production. Although several studies have focused on the host-bacterium association, little is known about the changes in gene e...

Comparing genomic signatures of domestication in two Atlantic salmon (Salmo salar L.) populations with different geographical origins

Open archive | López, Maria, E | CCSD

Selective breeding and genetic improvement have left detectable signatures on the genomes of domestic species. The elucidation of such signatures is fundamental for detecting genomic regions ...
