
1.

Rank-based classifiers for extremely high-dimensional gene expression data

Ludwig Lausser Florian Schmid Lyn-Rouven Schirra Adalbert F. X. Wilhelm Hans A. Kestler 《Advances in Data Analysis and Classification》2018,12(4):917-936

Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable whether they can be analyzed on an interval scale. As with many biological signals, their influence on, e.g., protein levels is usually non-linear and can therefore be misinterpreted. In this article we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles with their ranks. This type of rank transformation can be used to construct invariant classifiers that are not affected by noise induced by data transformations that can occur in the measurement setup. Our $10 \times 10$-fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. Random forests and support vector machines in particular achieve improved classification results on a significant majority of data sets.
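The invariance argument can be illustrated with a minimal sketch. It assumes that each profile is ranked across its own genes (rank 1 = lowest expression) and that ties receive averaged ranks; both choices are assumptions for illustration, not details taken from the abstract. Any strictly increasing transformation applied to the whole profile (here a hypothetical `log2(x + 1)` preprocessing step) leaves the ranks unchanged.

```python
import math
from typing import List


def rank_transform(profile: List[float]) -> List[float]:
    """Replace each value of one expression profile by its rank (1 = smallest).

    Tied values receive the average of the ranks they span (an assumed convention).
    """
    order = sorted(range(len(profile)), key=lambda i: profile[i])
    ranks = [0.0] * len(profile)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with profile[order[i]].
        while j + 1 < len(order) and profile[order[j + 1]] == profile[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


if __name__ == "__main__":
    profile = [5.2, 0.3, 12.8, 7.1]
    # A strictly increasing transformation, as a preprocessing step might introduce.
    transformed = [math.log2(x + 1) for x in profile]
    print(rank_transform(profile))      # [2.0, 1.0, 4.0, 3.0]
    print(rank_transform(transformed))  # same ranks as above
```

A classifier trained on such rank vectors therefore sees identical inputs whether or not the monotone transformation was applied upstream.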

2.

On the fusion of threshold classifiers for categorization and dimensionality reduction

Hans A. Kestler Ludwig Lausser Wolfgang Lindner Günther Palm 《Computational Statistics》2011,26(2):321-340

We study ensembles of simple threshold classifiers for the categorization of high-dimensional data of low cardinality and give a compression bound on their prediction risk. Two approaches are utilized to produce such classifiers. One is based on univariate feature selection employing the area under the ROC curve as ranking criterion. The other approach uses a greedy selection strategy. The methods are applied to artificial data, published microarray expression profiles, and highly imbalanced data.
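The univariate approach can be sketched as follows, under stated assumptions. Each feature's AUC is computed with the Mann-Whitney pair-counting formula, features are ranked by `max(auc, 1 - auc)` so that features lower in the positive class also count as informative, and the ensemble is fused by a simple majority vote. The scoring tweak, the fixed example thresholds, and the voting rule are illustrative choices, not the fusion scheme or threshold-fitting procedure of the paper.

```python
from typing import Callable, List, Sequence, Tuple

Stump = Callable[[Sequence[float]], int]


def feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature: share of (positive, negative) pairs where the positive
    sample has the larger value, counting ties as 1/2 (Mann-Whitney statistic)."""
    pos = [v for v, lbl in zip(values, labels) if lbl == 1]
    neg = [v for v, lbl in zip(values, labels) if lbl == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(X: List[List[float]], labels: Sequence[int]) -> List[Tuple[int, float]]:
    """Rank features by single-feature separability, best first."""
    scores = []
    for j in range(len(X[0])):
        auc = feature_auc([row[j] for row in X], labels)
        scores.append((j, max(auc, 1.0 - auc)))  # direction-independent score (assumption)
    return sorted(scores, key=lambda t: t[1], reverse=True)


def threshold_classifier(feature: int, threshold: float, positive_above: bool) -> Stump:
    """A single threshold classifier (decision stump) on one feature."""
    def predict(x: Sequence[float]) -> int:
        return int((x[feature] > threshold) == positive_above)
    return predict


def majority_vote(stumps: List[Stump], x: Sequence[float]) -> int:
    """Hypothetical fusion rule: predict 1 if more than half of the stumps vote 1."""
    votes = sum(s(x) for s in stumps)
    return int(2 * votes > len(stumps))


if __name__ == "__main__":
    X = [[0.1, 5.0, 2.0], [0.2, 4.0, 2.1], [0.9, 1.0, 2.0], [0.8, 2.0, 1.9]]
    labels = [0, 0, 1, 1]
    print(rank_features(X, labels))  # [(0, 1.0), (1, 1.0), (2, 0.875)]
    # Thresholds below are fixed by hand for the example, not fitted.
    stumps = [threshold_classifier(0, 0.5, True), threshold_classifier(1, 3.0, False)]
    print([majority_vote(stumps, x) for x in X])  # [0, 0, 1, 1]
```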

3.

Measuring and visualizing the stability of biomarker selection techniques

Ludwig Lausser Christoph Müssel Markus Maucher Hans A. Kestler 《Computational Statistics》2013,28(1):51-65

Feature selection is an essential step when dealing with high-dimensional data. In a diagnostic setting, marker genes have to be selected for specialized low-dimensional gene expression assays. A meaningful biomarker selection is expected to produce stable results in different resampling settings. We define an index to quantify stability and introduce a statistical testing procedure for stability. We also present new methods of visualizing stability and associating it with the accuracy of a subsequent classification process.
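The abstract does not spell out the authors' stability index, so the following is only a generic stand-in that shows what "stability across resampling settings" means in practice. It uses the mean pairwise Jaccard similarity of the marker sets selected in different resampling runs, plus per-feature selection frequencies, a quantity that is easy to plot. Both measures are common choices assumed here for illustration. They are not the index or the testing procedure defined in the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two selected feature sets: |A & B| / |A | B| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selections: List[Set[int]]) -> float:
    """Mean pairwise Jaccard similarity of the feature sets chosen in different
    resampling runs; 1.0 means every run selected exactly the same markers."""
    if len(selections) < 2:
        raise ValueError("need at least two resampling runs")
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequencies(selections: List[Set[int]]) -> Dict[int, float]:
    """Fraction of runs in which each feature was selected, most frequent first."""
    counts = Counter(f for s in selections for f in s)
    return {f: c / len(selections) for f, c in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical selected gene indices from three resampling runs.
    runs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}]
    print(round(selection_stability(runs), 3))  # 0.667
    print(selection_frequencies(runs))
```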

4.

Fabrication of EuF3‐Mesocrystals in a Gel Matrix

Christine Lausser Michael U. Kumke Markus Antonietti Helmut Cölfen 《Zeitschrift für anorganische und allgemeine Chemie》2010,636(11):1925-1930

Europium(III) fluoride mesocrystals were synthesised in an organic matrix. This matrix is a gel formed by Eu³⁺ ions and a polycarboxylate/sulfonate copolymer, ACUSOL 588G. In the gel phase, the local amount of europium ions is very high because Eu³⁺ acts as a crosslinker, and crystallisation occurs upon addition of F⁻. Seed crystals nucleated in the gel phase grow by further ion attachment and form mesocrystals through mutual orientation of the EuF₃ particles in the gel. We propose a dipole field as the reason for this alignment and suggest that the dipolar character of the particles originates from adsorption of the polyelectrolyte on charged crystal faces.

5.

A perceptually optimised bivariate visualisation scheme for high-dimensional fold-change data

André Müller Ludwig Lausser Adalbert Wilhelm Timo Ropinski Matthias Platzer Heiko Neumann Hans A. Kestler 《Advances in Data Analysis and Classification》2021,15(2):463-480

Visualising data as diagrams using visual attributes such as colour, shape, size, and orientation is challenging. In particular, large data sets demand graphical display as an essential step in the analysis. To achieve comprehension, different attributes often need to be displayed simultaneously. In this work, a comprehensible, bivariate, perceptually optimised visualisation scheme for high-dimensional data is proposed and evaluated. It can be used to show fold changes together with confidence values within a single diagram. The visualisation scheme consists of two parts: a uniform, symmetric, two-sided colour scale and a patch grid representation. The uniformity and symmetry of the two-sided colour scale were evaluated by twenty-five observers in comparison with a standard RGB scale. Furthermore, the readability of the generated map was validated and compared with a bivariate heat map scheme.

6.

Chained correlations for feature selection

Ludwig Lausser Robin Szekely Hans A. Kestler 《Advances in Data Analysis and Classification》2020,14(4):871-884

Data-driven algorithms stand and fall with the availability and quality of existing data sources. Both can be limited in high-dimensional settings (...

7.

Identifying predictive hubs to condense the training set of k-nearest neighbour classifiers

Ludwig Lausser Christoph Müssel Alexander Melkozerov Hans A. Kestler 《Computational Statistics》2014,29(1-2):81-95

The $k$-nearest neighbour classifier is widely used and popular due to its inherent simplicity and the avoidance of model assumptions. Although the approach has been shown to yield near-optimal classification performance for an infinite number of samples, selecting the most decisive data points can improve classification accuracy considerably in real settings with a limited number of samples. At the same time, selecting a subset of representative training samples reduces the required storage and computational resources. We devised a new approach that selects a representative training subset by means of an evolutionary optimization procedure. This method chooses those training samples that have a strong influence on the correct prediction of other training samples, in particular those with uncertain labels. The performance of the algorithm is evaluated on different data sets. Additionally, we provide graphical examples of the selection procedure.
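The general idea of condensing a $k$-NN training set by evolutionary search can be sketched as below. This is a generic (1+1) evolutionary algorithm over inclusion masks. It is not the authors' procedure and does not reproduce their hub-based criterion for samples with uncertain labels. The hypothetical fitness scores a subset by leave-one-out $k$-NN accuracy on the full training set, using only the selected samples as references, minus a size penalty (`size_penalty` is an assumed parameter).

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(x: Point, refs: List[Point], ref_labels: List[int], k: int) -> int:
    """Majority label among the k reference points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, r)), lbl) for r, lbl in zip(refs, ref_labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]


def fitness(mask: List[bool], X: List[Point], y: List[int], k: int, size_penalty: float) -> float:
    """Hypothetical fitness: leave-one-out accuracy on all training samples using only
    the selected samples as references, minus a penalty proportional to subset size."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if len(idx) < k + 1:  # each query needs k references after leaving itself out
        return float("-inf")
    correct = 0
    for i in range(len(X)):
        refs = [j for j in idx if j != i]
        pred = knn_predict(X[i], [X[j] for j in refs], [y[j] for j in refs], k)
        correct += int(pred == y[i])
    return correct / len(X) - size_penalty * len(idx) / len(X)


def select_prototypes(
    X: List[Point], y: List[int], k: int = 1, generations: int = 200,
    size_penalty: float = 0.05, seed: int = 0,
) -> Tuple[List[bool], float]:
    """(1+1) evolutionary search over inclusion masks, starting from the full set."""
    rng = random.Random(seed)
    n = len(X)
    mask = [True] * n
    best = fitness(mask, X, y, k, size_penalty)
    for _ in range(generations):
        # Mutation: flip each inclusion bit independently with probability 1/n.
        child = [(not keep) if rng.random() < 1.0 / n else keep for keep in mask]
        score = fitness(child, X, y, k, size_penalty)
        if score >= best:  # accept ties so the search can drift across plateaus
            mask, best = child, score
    return mask, best


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    mask, score = select_prototypes(X, y)
    print([i for i, keep in enumerate(mask) if keep], round(score, 3))
```

Because the size penalty rewards smaller subsets only while accuracy holds, the search tends to drop redundant interior points and keep those needed to classify the rest correctly.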