Assessing reproducibility of matrix factorization methods in independent transcriptomes

Open archive

Cantini, Laura | Kairov, Ulykbek | de Reyniès, Aurélien | Barillot, Emmanuel | Radvanyi, François | Zinovyev, Andrei

Published by CCSD; Oxford University Press (OUP)

International audience. MOTIVATION: Matrix factorization (MF) methods are widely used to reduce the dimensionality of transcriptomic datasets by expressing them as the action of a few hidden factors (metagenes). MF algorithms have never been compared on the reproducibility of their outputs across similar, independent datasets. Without this knowledge, predictions made in one study may fail to generalize to others.

RESULTS: We systematically test widely used MF methods on several transcriptomic datasets collected from the same cancer type (14 colorectal, 8 breast and 4 ovarian cancer transcriptomic datasets). Inspired by concepts from evolutionary bioinformatics, we design a novel framework based on Reciprocally Best Hit (RBH) graphs to benchmark MF methods for their ability to produce generalizable components. We show that a particular protocol for applying Independent Component Analysis (ICA), combined with a stabilization procedure, leads to a significant increase in between-dataset reproducibility. Moreover, we show that the signals detected with this method are systematically more interpretable than those of other standard methods. We developed a user-friendly tool, BIODICA, for performing Stabilized ICA-based RBH meta-analysis. We apply this methodology to colorectal cancer (CRC), for which 14 independent transcriptomic datasets are available. The resulting RBH graph maps the landscape of interconnected factors associated with biological processes or with technological artifacts. These factors can be used as clinical biomarkers or as robust, tumor-type-specific transcriptomic signatures of tumor cells or of the tumor microenvironment. Their intensities in different samples shed light on the mechanistic basis of CRC molecular subtyping.
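As a rough illustration of the reciprocal-best-hit idea (not the BIODICA implementation), the following Python sketch assumes that each dataset has already been decomposed into a genes × components metagene matrix over a shared, identically ordered gene set. Two components from different datasets are linked when each is the other's best match by absolute Pearson correlation; the absolute value is used because the sign of an ICA component is arbitrary. The function names, the choice of Pearson correlation, and the `min_abs_corr` threshold are hypothetical choices made only for this example.

```python
from itertools import combinations

import numpy as np


def reciprocal_best_hits(S1: np.ndarray, S2: np.ndarray, min_abs_corr: float = 0.0):
    """Return (i, j, r) triples where column i of S1 and column j of S2 are
    each other's best match by absolute Pearson correlation r.

    S1, S2: genes x components metagene matrices with identical gene order
    (an assumption of this sketch).
    """
    if S1.shape[0] != S2.shape[0]:
        raise ValueError("metagene matrices must share the same genes")
    k1 = S1.shape[1]
    # Correlations between every S1 component and every S2 component (k1 x k2).
    C = np.corrcoef(S1.T, S2.T)[:k1, k1:]
    A = np.abs(C)  # ignore arbitrary component signs
    best_for_1 = A.argmax(axis=1)  # best S2 match for each S1 component
    best_for_2 = A.argmax(axis=0)  # best S1 match for each S2 component
    hits = []
    for i, j in enumerate(best_for_1):
        # Keep the pair only if the match is reciprocal and strong enough.
        if best_for_2[j] == i and A[i, j] >= min_abs_corr:
            hits.append((i, int(j), float(C[i, j])))
    return hits


def rbh_graph_edges(metagenes: list, min_abs_corr: float = 0.3):
    """Edges of a hypothetical RBH graph: nodes are (dataset, component)."""
    edges = []
    for (a, Sa), (b, Sb) in combinations(enumerate(metagenes), 2):
        for i, j, r in reciprocal_best_hits(Sa, Sb, min_abs_corr):
            edges.append(((a, i), (b, j), r))
    return edges


# Toy usage: dataset 2's metagenes are a column-permuted, partly sign-flipped,
# noisy copy of dataset 1's.
rng = np.random.default_rng(0)
S1 = rng.standard_normal((500, 3))
S2 = S1[:, [2, 0, 1]] * np.array([1.0, -1.0, 1.0]) + 0.1 * rng.standard_normal((500, 3))
hits = reciprocal_best_hits(S1, S2)
print([(i, j) for i, j, _ in hits])  # [(0, 1), (1, 2), (2, 0)]
```

Collecting such edges over all pairs of datasets yields a graph in which groups of mutually connected components are candidates for reproducible factors. In the toy example, each permuted (and, for one column, sign-flipped) component is matched back to its source, with a negative `r` flagging the sign flip.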

Suggestions

By the same author

Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets

Open archive | Sompairac, Nicolas | CCSD

International audience. Independent component analysis (ICA) is a matrix factorization approach where the signals captured by each individual matrix factor are optimized to become as mutually independent as possibl...

Determining the optimal number of independent components for reproducible transcriptomic data analysis

Open archive | Kairov, Ulykbek | CCSD

International audience. BACKGROUND: Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a...

Independent Component Analysis Uncovers the Landscape of the Bladder Tumor Transcriptome and Reveals Insights into Luminal and Basal Subtypes

Open archive | Biton, Anne | CCSD

International audience. Extracting relevant information from large-scale data offers unprecedented opportunities in cancerology. We applied independent component analysis (ICA) to bladder cancer transcriptome data s...
