Similar literature
1.
Multispectral images such as multispectral chemical images or multispectral satellite images provide detailed data with information in both the spatial and spectral domains. Many segmentation methods for multispectral images are based on a per-pixel classification, which uses only spectral information and ignores spatial information. A clustering algorithm based on both spectral and spatial information would produce better results.

In this work, spatial refinement clustering (SpaRef), a new clustering algorithm for multispectral images, is presented. Spatial information is integrated with partitional and agglomeration clustering processes. The number of clusters is automatically identified. SpaRef is compared with a set of well-known clustering methods on compact airborne spectrographic imager (CASI) data over an area in the Klompenwaard, The Netherlands. The clusters obtained show improved results. Applying SpaRef to multispectral chemical images would be a straightforward step.


2.
The fuzzy C‐means (FCM) algorithm does not fully utilize spatial information for image segmentation and is sensitive to noise and intensity inhomogeneity in magnetic resonance imaging (MRI) images. The underlying reason is that, using a single fuzzy membership function, the FCM algorithm cannot properly represent pattern associations to all clusters. In this paper, we present a modified FCM (mFCM) algorithm that incorporates scale‐controlled spatial information for the segmentation of MRI images in the presence of high levels of noise and intensity inhomogeneity. The algorithm utilizes scale‐controlled spatial information from the neighbourhood of each pixel under consideration in the form of a probability function. Using this probability function, a local membership function is introduced for each pixel. Finally, new clustering centre and weighted joint membership functions are introduced based on the local and global membership functions. The resulting mFCM algorithm is robust to the noise and intensity inhomogeneity in MRI image data and thereby improves the segmentation results. Experimental results on a synthetic image, four volumes of simulated MRI brain images and one volume of real‐patient MRI brain images show that the mFCM algorithm outperforms k‐means, FCM and some other recently proposed FCM‐based algorithms for image segmentation, both qualitatively and quantitatively, in terms of cluster validity functions, segmentation accuracy and tissue segmentation accuracy. Copyright © 2015 John Wiley & Sons, Ltd.

3.
Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). The same idea has now been applied to a regression problem, and the results have been compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into clusters by a clustering algorithm. The spectral data of each cluster are then subjected to PLS regression. Several real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) have been used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses a combination of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful variables from the redundant ones, so that a stable model can be built on the informative clusters.

4.
Representative subset selection   (total citations: 1; self-citations: 0; citations by others: 1)
The fast development of analytical techniques makes it possible to acquire huge amounts of data. Large data sets are difficult to handle, and there is therefore great interest in designing a subset of the original data set that preserves its information and facilitates the computations. There are many subset selection methods, and the choice among them depends on the problem at hand. The two most popular groups of subset selection methods are uniform designs and cluster-based designs. The methods considered in this paper include uniform designs, such as those proposed by Kennard and Stone and OptiSim, and cluster-based designs applying the K-means technique and density-based spatial clustering of applications with noise (DBSCAN). Additionally, a new concept of subset selection with K-means is introduced.
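As an illustration of the uniform designs mentioned above, the following is a minimal sketch of the classic Kennard-Stone selection rule: seed the subset with the two most distant samples, then repeatedly add the sample that is farthest from its nearest already-selected neighbour. The function name `kennard_stone`, the Euclidean metric and the list-of-tuples input are assumptions made for this example, not details taken from the paper.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    points   -- sequence of equal-length numeric tuples (one per sample)
    n_select -- size of the representative subset (assumed >= 2)
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full Euclidean distance matrix (fine for a sketch; O(n^2) memory).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the two mutually most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]

    # min_dist[k]: distance from sample k to its nearest selected sample.
    min_dist = [min(dist[k][i0], dist[k][j0]) for k in range(n)]

    while len(selected) < n_select:
        candidates = [k for k in range(n) if k not in selected]
        # Maximin step: take the candidate farthest from the current subset.
        nxt = max(candidates, key=lambda k: min_dist[k])
        selected.append(nxt)
        for k in range(n):
            min_dist[k] = min(min_dist[k], dist[k][nxt])
    return selected
```

The returned indices are in selection order, so truncating the list gives the smaller Kennard-Stone subsets as well. A cluster-based design would instead pick one or more samples per cluster.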

5.
A new image analysis strategy is introduced to determine the composition and the structural characteristics of plant cell walls by combining Raman microspectroscopy and unsupervised data mining methods. The proposed method consists of three main steps: spectral preprocessing, spatial clustering of the image, and finally estimation of the spectral profiles of pure components and their weights. Point spectra of Raman maps of cell walls were preprocessed to remove noise and fluorescence contributions and compressed with PCA. Processed spectra were then subjected to k-means clustering to identify spatial segregations in the images. Cell wall images were reconstructed with cluster identities, and each cluster was represented by the average spectrum of all the pixels in the cluster. Pure component spectra were estimated by spectral entropy minimization criteria with simulated annealing optimization. Two pure spectral estimates that represent lignin and carbohydrates were recovered and their spatial distributions were calculated. Our approach partitioned the cell walls into many sublayers based on their composition, thus enabling composition analysis at subcellular levels. It also overcame the well-known problem that native lignin spectra in lignocellulosics have high spectral overlap with contributions from cellulose and hemicelluloses, thus opening up new avenues for microanalyses of the monolignol composition of native lignin and carbohydrates without chemical or mechanical extraction of the cell wall materials.
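The spatial clustering step described above (k-means on per-pixel spectra, then an image of cluster identities and one average spectrum per cluster) can be sketched as follows. This is an illustrative outline only: the function names `kmeans` and `to_image`, the choice of the first k spectra as starting centroids, and the plain-list data layout are assumptions for this example, and the preprocessing and PCA compression steps are left out.

```python
import math


def kmeans(spectra, k, n_iter=100):
    """Lloyd's k-means on a list of equal-length spectra (lists of floats).

    Returns (labels, centroids); each centroid is the average spectrum
    of the pixels assigned to that cluster.
    """
    # Assumption: the first k spectra serve as deterministic start centroids.
    centroids = [list(s) for s in spectra[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c]))
            for s in spectra
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean spectrum of its members.
        for c in range(k):
            members = [s for s, lab in zip(spectra, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def to_image(labels, width):
    """Refold a flat list of per-pixel labels into rows of `width` pixels."""
    return [labels[i:i + width] for i in range(0, len(labels), width)]
```

Calling `to_image(labels, width)` on the result gives the image reconstructed with cluster identities, and `centroids` holds the representative average spectrum of each cluster.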

6.
7.
DNA arrays have become the immediate choice in the analysis of large-scale expression measurements. Understanding the expression patterns of genes by computational approaches provides functional information on newly identified genes. The gene expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice to differentiate groups. However, most methods based on cluster analysis suffer from the usual problems of (i) dead units and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem in pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home in on patterns that are closely similar.

8.
9.
10.
Mass spectrometry imaging (MSI) is widely used for the label-free molecular mapping of biological samples. The identification of co-localized molecules in MSI data is crucial to the understanding of biochemical pathways. One of the key challenges in molecular colocalization is that complex MSI data are too large for manual annotation but too small for training deep neural networks. Herein, we introduce a self-supervised clustering approach based on contrastive learning, which shows excellent performance in the clustering of MSI data. We train a deep convolutional neural network (CNN) using MSI data from a single experiment without manual annotations to effectively learn high-level spatial features from ion images and classify them based on molecular colocalizations. We demonstrate that contrastive learning generates ion image representations that form well-resolved clusters. Subsequent self-labeling is used to fine-tune both the CNN encoder and linear classifier based on confidently classified ion images. This new approach enables autonomous and high-throughput identification of co-localized species in MSI data, which will dramatically expand the application of spatial lipidomics, metabolomics, and proteomics in biological research.

Contrastive learning is used to train a deep convolutional neural network to identify high-level features in mass spectrometry imaging data. These features enable self-supervised clustering of ion images without manual annotation.

11.
We propose a method for calculating absolute free energies from Monte Carlo or molecular-dynamics data. The method is based on the identity that expresses the partition function Q as a Boltzmann average: $1/Q = \langle w(p,x)\exp[\beta H(p,x)]\rangle$, where w(p,x) is an arbitrary weight function such that its integral over the phase space is equal to 1. In practice, to minimize statistical errors the weight function is chosen such that the regions of the phase space where sampling statistics are poor are excluded from the average. The "ideal" weight function would be the equilibrium phase-space density $\exp[-\beta H(p,x)]/Q$ itself. We consider two methods for constructing the weight function based on different estimates of the equilibrium phase-space density from simulation data. In the first method, it is chosen to be a Gaussian function, whose parameters are obtained from the covariance matrix of the atomic coordinates. In the second, a clustering algorithm is used to attempt partitioning the data into clusters corresponding to different basins of attraction visited by the system. The weight function is then constructed as a superposition of Gaussians calculated for each cluster separately. We show that these strategies can be used to improve upon previous methods of estimating absolute entropies from covariance matrices.
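The identity quoted in this abstract follows in one line from the definition of a Boltzmann average. The short derivation below is a sketch in standard notation; it assumes the classical partition function is written as a plain phase-space integral, with any constant normalization of the phase-space measure absorbed into Q.

```latex
\begin{align}
Q &= \int e^{-\beta H(p,x)}\,dp\,dx, \\
\left\langle w\,e^{\beta H} \right\rangle
  &= \frac{1}{Q}\int w(p,x)\,e^{\beta H(p,x)}\,e^{-\beta H(p,x)}\,dp\,dx
   = \frac{1}{Q}\int w(p,x)\,dp\,dx
   = \frac{1}{Q}, \\
F &= -k_{B}T \ln Q = k_{B}T \ln \left\langle w\,e^{\beta H} \right\rangle .
\end{align}
```

With the "ideal" choice, in which w equals the equilibrium phase-space density, the averaged quantity $w\,e^{\beta H}$ equals the constant 1/Q at every sampled point, so the estimator would have zero variance. This is why the weight function is built from an estimate of that density.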

12.
Targeting cells specifically based on receptor expression levels remains an area of active research to date. Selective binding of receptors cannot be achieved by increasing the individual binding strength, as this does not account for differing distributions of receptor density across healthy and diseased cells. Engaging receptors above a threshold concentration would be desirable in devising selective diagnostics. Integrins are prime target candidates as they are readily available on the cell surface and have been reported to be overexpressed in diseases. Insights into their spatial organization would therefore be advantageous for designing selective targeting agents. Here, we investigated the effect of the activation method on integrin α5β1 clustering by immunofluorescence and modeled the global neighbor distances with input from an immuno-staining assay and image processing of microscopy images. These data were used to engineer spatially controlled DNA-scaffolded bivalent ligands, which we used to compare trends in spatially selective binding observed across HUVEC, CHO and HeLa cells in resting versus activated conditions in confocal microscopy images. For HUVEC and CHO, the data demonstrated improved selectivity and localisation of binding for the smaller spacings of ~7 nm and ~24 nm, in good agreement with the model. A deviation from the model predictions was observed for HeLa, indicative of a clustered, instead of homogeneous, integrin organization. Our findings demonstrate how low-technology imaging methods can guide the design of spatially controlled ligands to selectively differentiate between cell type and integrin activation state.

13.
As several structural proteomic projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein functions from protein structures are urgently needed. In this paper, we present a method that explores the clustering patterns of amino acids in 3-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented using a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.

14.
Feature selection is a key step in Quantitative Structure Activity Relationship (QSAR) analysis. Chance correlations and multicollinearity are two major problems often encountered when attempting to find generalized QSAR models for use in drug design. Optimal QSAR models require an objective variable relevance analysis step for producing robust classifiers with low complexity and good predictive accuracy. Genetic algorithms coupled with information theoretic approaches such as mutual information have been used to find near-optimal solutions to such multicriteria optimization problems. In this paper, we describe a novel approach for analyzing QSAR data based on these methods. Our experiments with the Thrombin dataset, previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001, demonstrate the feasibility of this approach. We found that it is important to take into account the data distribution, the rule "interestingness", and the need to look at more invariant and monotonic measures of feature selection.

15.
Nowadays, the techniques employed in data acquisition provide huge amounts of data. Some parts of the information are related to others, which makes dimensionality reduction desirable, with as little loss of information as possible, in order to decrease computation time and complexity when applying any subsequent data mining technique. Genetic algorithms offer the possibility of selecting the variables that contain the most relevant information to represent all the original ones. The traditional genetic operators seem to be too general, leading to results that could be improved by means of designed genetic operators that employ available problem‐specific information. In particular, when dealing with calibration by means of near‐infrared spectral data, which usually contain thousands of variables, it is known that wavebands rather than isolated wavelengths allow a more robust model design. This aspect should be taken into account when crossing individuals. We propose three crossover operators specifically designed for calibration with near‐infrared spectral data, based on a pseudo‐random two‐point crossover, where the first point is chosen randomly and the selection of the second point is guided by problem‐specific information. We compare their performance with that of state‐of‐the‐art operators. We combine these new genetic algorithm‐based variable selection designs with partial least squares regression and fuzzy systems based calibration. Our benchmark consists of two real‐world high‐dimensional data sets, corresponding to polyetheracrylat, where hydroxyl number, viscosity, and acidity are monitored on‐line, and melamine resin production, where the chilling point (CP) is considered in order to regulate the condensation. We show that the designed operators promote waveband selection, achieve better‐quality solutions, and converge faster and more smoothly than state‐of‐the‐art operators. Copyright © 2014 John Wiley & Sons, Ltd.
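To make the idea of a pseudo-random two-point crossover concrete, here is a minimal sketch in which the first cut point is drawn at random and the second is placed a fixed number of channels later, so that the swapped segment is one contiguous waveband. The fixed `band_width` rule, the function name `waveband_crossover` and the binary-mask encoding are assumptions chosen for illustration; the abstract does not specify how the three proposed operators use problem-specific information, and this is not a reconstruction of them.

```python
import random


def waveband_crossover(parent_a, parent_b, band_width, rng=random):
    """Exchange one contiguous segment between two variable-selection masks.

    parent_a, parent_b -- equal-length lists of 0/1 flags, one per wavelength
    band_width         -- hypothetical waveband length in channels (>= 1)
    rng                -- source of randomness (module or random.Random)
    """
    n = len(parent_a)
    if len(parent_b) != n:
        raise ValueError("parents must have the same length")
    if band_width < 1:
        raise ValueError("band_width must be at least 1")

    start = rng.randrange(n)           # first cut point: uniformly random
    end = min(start + band_width, n)   # second cut point: set by band width

    # Swap the segment [start, end) between the two parents.
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b
```

Because the exchanged genes are always adjacent, a selected waveband in one parent tends to be passed on as a block rather than being broken into isolated wavelengths.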

16.
Clustering of gene expression data collected across time is receiving growing attention in the biological literature, since time-course experiments allow one to understand dynamic biological processes and identify genes governed by the same processes. It is believed that genes demonstrating similar expression profiles over time might give informative insight into how the underlying biological mechanisms work. In this paper, we propose a method based on functional data analysis (FNDA) to cluster time-dependent gene expression profiles. Considering clustering problems in the FNDA setting provides ways to take time dependency into account by using a basis function expansion to describe the partially observed curves. We also discuss how to choose the number of bases in the basis function expansion in FNDA. A synthetic cycle data set and a real data set are used to demonstrate the proposed method, and the proposed and existing approaches are compared using adjusted Rand indices.

17.
Inherent structure (IS) and geometry‐based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations into local minima on the potential/effective energy surface. The conformations that are minimized into the same energy basin belong to one cluster. We investigate how applying these two methods of trajectory decomposition influences our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the microcluster level, the IS approach and the root‐mean‐square deviation (RMSD)‐based clustering method give totally different results. Depending on the local features of the energy landscape, conformations with close RMSDs can be minimized into different minima, while conformations with large RMSDs can be minimized into the same basin. However, the relaxation timescales calculated from the transition matrices built from the microclusters are similar. The discrepancy at the microcluster level leads to different macroclusters. Although the dynamic models established through both clustering methods are validated to be approximately Markovian, the IS approach seems to give a meaningful state space discretization at the macrocluster level in terms of conformational features and kinetics. © 2016 Wiley Periodicals, Inc.

18.
This paper describes the first application of fuzzy c-means clustering for the selection of representatives from assemblies of conformations or alignments. In the case of alignments, their quality is taken into account using a weighted c-means scheme developed in this work. The performance of fuzzy cluster validity measures, such as compactness, partition function, and entropy, is studied on several examples, but the visual 3D representation of data points is shown to be most beneficial in determining the optimum number of clusters. Fuzzy clustering is expected to perform better than crisp clustering methods in cases where there is a significant number of "outliers", such as in molecular dynamics simulations and molecular alignments.

19.
Hyperspectral imaging (HSI) is a method for exploring spatial and spectral information associated with the distribution of the different compounds in a chemical or biological sample. Amongst the multivariate image analysis tools utilized to decompose the raw data into a bilinear model, multivariate curve resolution alternating least squares (MCR‐ALS) can be applied to obtain the distribution maps and pure spectra of the components of the sample image. However, a requirement is to have the data in a two‐way matrix. Thus, a preliminary step consists of unfolding the raw HSI data into a single‐pixel direction. Consequently, through this data manipulation, the information regarding pixel neighboring is lost, and spatial information cannot directly be constrained on the component profiles in the current MCR‐ALS algorithm. In this short communication, we propose an adaptation of the MCR‐ALS framework, enabling the potential implementation of any variation of spatial constraint. This can be achieved by adding, at each least‐squares step, refolding/unfolding of the distribution maps for the components. The implementation of segmentation, shape smoothness, and image modeling as spatial constraints is proposed as a proof of concept. Copyright © 2015 John Wiley & Sons, Ltd.

20.
A new numerical method for solving ordinary differential equations by using High Dimensional Model Representation (HDMR) has been developed in this work. Higher order ordinary differential equations can be reduced to a set of first order ODEs. Although HDMR is generally used for multivariate functions, univariate functions are considered throughout the work because of the nature of ODEs. It is not the numerical solution itself but its image under an appropriately chosen linear ordinary differential operator that is expressed as a linear combination of the positive powers of the deviation of the independent variable from its initial value. These image functions are expected to form the basis set under consideration. The unknown constants in the linear combination are found by maximizing the constancy measurer formed in terms of the HDMR components after they are evaluated. Results are compared with well-known step size based numerical methods. A semi-qualitative error analysis of the proposed method is also established.
