Minimum Spanning vs. Principal Trees for Structured Approximations of Multi-Dimensional Datasets

Authors:	Alexander Chervov, Jonathan Bac, Andrei Zinovyev

Institution:	1. Institut Curie, PSL Research University, F-75005 Paris, France; 2. Institut national de la santé et de la recherche médicale, U900, F-75005 Paris, France; 3. CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France; 4. Centre de Recherches Interdisciplinaires, Université de Paris, F-75000 Paris, France; 5. Lobachevsky University, 603000 Nizhny Novgorod, Russia

Abstract:	Graph-based approximations of multi-dimensional data point clouds are widely used in a variety of areas. Notable applications include cellular trajectory inference in single-cell data analysis, analysis of clinical trajectories from synchronic datasets, and skeletonization of images. Several methods have been proposed to construct such approximating graphs: some are based on computing minimum spanning trees, and others on principal graphs, which generalize principal curves. In this article, we propose a methodology to compare and benchmark these two graph-based data approximation approaches, as well as to define their hyperparameters. The main idea is to avoid comparing graphs directly: first, a clustering of the data point cloud is induced from the graph approximation; second, well-established methods are used to compare and score the data cloud partitionings induced by the graphs. In particular, mutual information-based approaches prove useful in this context. The induced clustering is obtained by decomposing a graph into non-branching segments and then assigning each data point to its nearest segment. This allows efficient comparison of graph-based data approximations of arbitrary topology and complexity. The method is implemented in Python using the standard scikit-learn library, which provides high speed and efficiency. As a demonstration of the methodology, we analyse and compare graph-based data approximation methods using synthetic as well as real-life single-cell datasets.
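The comparison scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`non_branching_segments`, `dist_to_edge`, `induced_labels`), the Y-shaped toy data, and the two hand-built trees are all hypothetical, and `adjusted_mutual_info_score` from scikit-learn is used only as one possible mutual information-based score. The sketch assumes each graph is a tree given as an edge list plus an array of node coordinates.

```python
import numpy as np
from collections import defaultdict
from sklearn.metrics import adjusted_mutual_info_score


def non_branching_segments(edges):
    """Split a tree (list of (u, v) edges) into paths between nodes of degree != 2."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Leaves and branching points delimit the segments.
    terminals = {n for n, nb in adj.items() if len(nb) != 2}
    segments, seen = [], set()
    for start in sorted(terminals):
        for nxt in adj[start]:
            if (start, nxt) in seen:
                continue
            path, prev, cur = [start, nxt], start, nxt
            # Walk through degree-2 nodes until the next terminal node.
            while cur not in terminals:
                step = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
                prev, cur = cur, step
                path.append(cur)
            seen.add((start, nxt))
            seen.add((path[-1], path[-2]))  # do not re-walk from the other end
            segments.append(path)
    return segments


def dist_to_edge(X, a, b):
    """Euclidean distance from each row of X to the line segment [a, b]."""
    ab = b - a
    t = np.clip((X - a) @ ab / (ab @ ab), 0.0, 1.0)
    return np.linalg.norm(X - (a + t[:, None] * ab), axis=1)


def induced_labels(X, nodes, segments):
    """Label each point with the index of its nearest non-branching segment."""
    d = np.stack([
        np.min([dist_to_edge(X, nodes[u], nodes[v]) for u, v in zip(s, s[1:])], axis=0)
        for s in segments
    ], axis=1)
    return d.argmin(axis=1)


# Toy data (assumption): noisy points along three arms meeting at the origin.
rng = np.random.default_rng(0)
leaves = np.array([[-2.0, 0.0], [2.0, 2.0], [2.0, -2.0]])
X = np.vstack([rng.uniform(0, 1, (100, 1)) * leaf for leaf in leaves])
X += rng.normal(scale=0.05, size=X.shape)

# Tree A: seven nodes, branching point at the origin.
nodes_a = np.array([[-2, 0], [-1, 0], [0, 0], [1, 1], [2, 2], [1, -1], [2, -2]], float)
edges_a = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)]

# Tree B: four nodes, branching point slightly shifted.
nodes_b = np.array([[-2, 0], [0.3, 0], [2, 2], [2, -2]], float)
edges_b = [(0, 1), (1, 2), (1, 3)]

segs_a = non_branching_segments(edges_a)
segs_b = non_branching_segments(edges_b)
labels_a = induced_labels(X, nodes_a, segs_a)
labels_b = induced_labels(X, nodes_b, segs_b)

# Score the agreement of the two induced partitions instead of comparing graphs.
ami = adjusted_mutual_info_score(labels_a, labels_b)
print(f"segments: {len(segs_a)} vs {len(segs_b)}, AMI = {ami:.3f}")
```

Because the score is computed on the induced partitions of the same data points, the two trees do not need to share node counts or node positions, which is what makes graphs of different topology and complexity comparable in this scheme.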

Keywords:
