Broadly sampled orthologous groups of eukaryotic proteins for the phylogenetic study of plastid-bearing lineages

Open archive

van Vlierberghe, Mick | Philippe, Hervé | Baurain, Denis

Published by CCSD; BioMed Central

International audience.

Objectives: Identifying orthology relationships among sequences is essential to understanding evolution, the diversity of life, and ancestry among organisms. To build alignments of orthologous sequences, phylogenomic pipelines often start with all-vs-all similarity searches, followed by a clustering step. For the resulting protein clusters (orthogroups) to be as accurate as possible, high-quality proteomes are needed. Here, our objective is to assemble a data set especially suited to the phylogenomic study of algae and formerly photosynthetic eukaryotes. This implies the proper integration of organellar data, so that several copies of one gene (paralogs) can be distinguished, taking into account their cellular compartment if necessary.

Data description: We submitted 73 top-quality and taxonomically diverse proteomes to OrthoFinder. We obtained 47,266 orthogroups and identified 11,775 orthogroups with at least two algae. Whenever possible, sequences were functionally annotated with eggNOG and tagged according to their genomic and target compartment(s). We then aligned the orthogroups and computed phylogenetic trees with IQ-TREE. Finally, these trees were further processed by identifying and pruning the subtrees exclusively composed of plastid-bearing organisms, yielding a set of 31,784 clans suitable for studying the genome evolution of photosynthetic organisms.
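The final pruning step can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-tuple tree representation, the function names `leaves` and `plastid_only_subtrees`, the `min_leaves` parameter, and the toy taxon labels are all assumptions made for illustration, and the tree is treated as rooted for simplicity. The idea is to walk a tree from the top and keep each maximal subtree whose leaves are all plastid-bearing organisms.

```python
from typing import List, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf labels of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    labels: List[str] = []
    for child in tree:
        labels.extend(leaves(child))
    return labels


def plastid_only_subtrees(tree: Tree, plastid_taxa: Set[str], min_leaves: int = 1) -> List[Tree]:
    """Return the maximal subtrees whose leaves all belong to plastid_taxa.

    min_leaves is an illustrative threshold (assumption): subtrees with
    fewer leaves are discarded.
    """
    tips = leaves(tree)
    if all(tip in plastid_taxa for tip in tips):
        # The whole subtree qualifies, so there is no need to descend further.
        return [tree] if len(tips) >= min_leaves else []
    if isinstance(tree, str):
        # A single leaf that is not plastid-bearing.
        return []
    found: List[Tree] = []
    for child in tree:
        found.extend(plastid_only_subtrees(child, plastid_taxa, min_leaves))
    return found


# Toy example with made-up membership; real trees would be parsed from Newick files.
toy_tree: Tree = (
    (("Arabidopsis", "Chlamydomonas"), "Homo"),
    ("Phaeodactylum", "Saccharomyces"),
)
plastid_taxa = {"Arabidopsis", "Chlamydomonas", "Phaeodactylum"}

clans = plastid_only_subtrees(toy_tree, plastid_taxa)
print(clans)
```

Setting `min_leaves=2` in this sketch would drop single-leaf results and keep only the `("Arabidopsis", "Chlamydomonas")` subtree.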
