Fast-SG: an alignment-free algorithm for hybrid assembly

Open archive

Di Genova, Alex | Ruz, Gonzalo A. | Sagot, Marie-France | Maass, Alejandro

Published by CCSD; Oxford Univ Press

International audience. Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intensive and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using lightweight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Conclusions: Fast-SG opens the door to accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.

Suggestions

By the same author

Efficient hybrid de novo assembly of human genomes with WENGAN

Open archive | Di Genova, Alex | CCSD

International audience. Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurat...

Mycoplasma hyopneumoniae J elicits an antioxidant response and decreases the expression of ciliary genes in infected swine epithelial cells

Open archive | Mucha, Scheila Gabriele | CCSD

International audience. Mycoplasma hyopneumoniae is the most costly pathogen for swine production. Although several studies have focused on the host-bacterium association, little is known about the changes in gene e...

Comparing genomic signatures of domestication in two Atlantic salmon (Salmo salar L.) populations with different geographical origins

Open archive | López, Maria E. | CCSD

International audience. Selective breeding and genetic improvement have left detectable signatures on the genomes of domestic species. The elucidation of such signatures is fundamental for detecting genomic regions ...
