MilkOligoThesaurus, a dataset of mammalian milk oligosaccharide synonyms

Open archive

Rumeau, Mathilde | Fenaille, François | Girard, Agnès | Loux, Valentin | Ba, Mouhamadou | Nédellec, Claire | Deleger, Louise | Bossy, Robert | Aubin, Sophie | Knudsen, Christelle | Combes, Sylvie

Published by CCSD; Elsevier

Interest in milk oligosaccharides (MOs) is growing because of their numerous benefits for newborn and long-term health. A large number of MO structures have been identified in mammalian milk. Although mostly described in human milk, this oligosaccharide richness has also been reported, albeit to a lesser degree, for a wide range of other mammalian species. The structure of MOs is particularly difficult to report because it results from the combination of 5 monosaccharides linked by various glycosidic bonds, forming structurally diverse and complex matrices of linear and branched oligosaccharides. Exploring the literature and extracting relevant information on MO diversity within or across species appears promising for elucidating the structure-function relationships of MOs. Currently, given the complexity of these molecules, the main obstacle to such literature mining is the heterogeneity in the way authors refer to them. Herein, we provide a thesaurus (MilkOligoThesaurus) of MO names and synonyms collected from selected key articles on mammalian milk analyses. MilkOligoThesaurus gathers the names of the MOs with a complete description of their monosaccharide composition and structures. When available, each unique MO molecule is linked to its ID in the NCBI PubChem and ChEBI databases. MilkOligoThesaurus is provided in a tabular format. It gathers 245 unique oligosaccharide structures described by 22 features (columns), including the name of the molecule, its abbreviation, the chemical database IDs if available, the monosaccharide composition, chemical information (molecular formula, monoisotopic mass), synonyms, its formula in condensed and abbreviated condensed forms, the abbreviated systematic name, the systematic name, the isomer group, and the scientific articles used as sources. MilkOligoThesaurus is also provided in the SKOS (Simple Knowledge Organization System) format.
This thesaurus is a valuable resource gathering MO naming variations that are not found elsewhere. It is intended for (i) text and data mining, to enable automatic annotation and rapid extraction of milk oligosaccharide data from scientific papers; and (ii) biology researchers aiming to search for or decipher the structure of milk oligosaccharides based on any of their names, abbreviations, or monosaccharide compositions and linkages.
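
As an illustration of use case (i), the sketch below shows how a tabular thesaurus of this kind could drive a simple dictionary-based annotation of text. It is only a minimal sketch under stated assumptions: the column names (`name`, `abbreviation`, `synonyms`), the `|` separator between synonyms, the three-column layout, and the two sample rows are hypothetical and chosen for illustration; they are not taken from the actual MilkOligoThesaurus file, which has 22 columns.

```python
import csv
import io
import re

# Illustrative rows only: the column names and the "|" separator are
# assumptions, not the actual MilkOligoThesaurus layout.
SAMPLE_TSV = (
    "name\tabbreviation\tsynonyms\n"
    "2'-fucosyllactose\t2'-FL\t2-fucosyllactose|2'FL\n"
    "3'-sialyllactose\t3'-SL\t3-sialyllactose|3'SL\n"
)


def build_synonym_index(tsv_text):
    """Map every lower-cased name, abbreviation and synonym to the preferred name."""
    index = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        labels = [row["name"], row["abbreviation"]] + row["synonyms"].split("|")
        for label in labels:
            label = label.strip()
            if label:
                index[label.lower()] = row["name"]
    return index


def annotate(text, index):
    """Return (matched_text, preferred_name) pairs in order of appearance."""
    # Longer labels come first so that a full name wins over a shorter
    # label starting at the same position.
    labels = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w'-])(" + "|".join(re.escape(l) for l in labels) + r")(?![\w'-])",
        re.IGNORECASE,
    )
    return [(m.group(1), index[m.group(1).lower()]) for m in pattern.finditer(text)]


index = build_synonym_index(SAMPLE_TSV)
hits = annotate("Bovine milk contained 3'SL and low levels of 2'-FL.", index)
```

Here `hits` pairs each mention found in the sentence with the preferred name it resolves to. Exact string matching is the simplest possible strategy; a real text-mining pipeline would probably also need to handle typographic variants (for example curly apostrophes or Greek letters in linkage notation).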
