Similar articles
 20 similar articles found (search time: 31 ms)
1.
Outlier detection is an important research direction in data mining. The Isolation Forest algorithm divides the features of a data set at random, which makes its detection results unstable and its efficiency low. To address this problem, an algorithm that combines clustering with Isolation Forest, CIIF (Cluster-based Improved Isolation Forest), is proposed. CIIF first clusters the data set with the k-means method, selects a specific cluster based on the clustering results to construct a selection matrix, and implements the algorithm's selection mechanism through that matrix; it then builds multiple isolation trees. Finally, outlier scores are calculated from the average search length of each sample across the isolation trees, and the Top-n objects with the highest outlier scores are regarded as outliers. Comparative experiments with six algorithms on eleven real data sets show that the CIIF algorithm performs better. Compared with the Isolation Forest algorithm, the average AUC (area under the ROC curve) of the proposed CIIF algorithm is improved by 7%.
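
The scoring step described above can be sketched with the standard Isolation Forest anomaly score, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of sample x over the trees and c(n) is the expected path length for n samples. This is a minimal sketch, assuming CIIF keeps the standard score; the abstract does not give its exact formula, and the function names and example depths are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # used in the harmonic-number approximation H(i) ~ ln(i) + gamma


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Standard Isolation Forest score for one sample (requires n > 1).

    path_lengths holds the depth at which the sample was isolated in each tree.
    Scores near 1 suggest an outlier; scores well below 0.5 suggest an inlier.
    """
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / c_factor(n))


def top_n_outliers(depths_by_sample, n_samples, top_n):
    """Return the top_n sample keys with the highest anomaly scores."""
    scores = {key: anomaly_score(depths, n_samples)
              for key, depths in depths_by_sample.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical per-tree depths: sample "a" is isolated after very few splits.
depths = {"a": [2, 3], "b": [8, 9], "c": [7, 8]}
print(top_n_outliers(depths, n_samples=256, top_n=1))
```

A sample whose average depth equals c(n) receives a score of 0.5, so the ranking depends only on how much earlier than average a sample is isolated.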

2.
Automatic noise recognition was performed in two stages: (1) feature extraction based on the pitch range, found by analyzing the autocorrelation function, and (2) classification using a classifier trained on the extracted features. Since most environmental noise types change their acoustical characteristics over time, we focused on the “pitch range” of the sounds in order to extract features. Two different classifiers, Support Vector Machines (SVM) and k-means clustering, were applied and compared using the proposed features. The SVM and k-means clustering classifiers achieve recognition rates up to 95.4% and 92.8%, respectively. Although both classifiers provided high accuracy, the SVM-based classifier outperformed the k-means clustering classifier by approximately 7.4%.
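
The first stage relies on the autocorrelation function to locate pitch. The following is a minimal sketch of plain autocorrelation pitch estimation, not the paper's feature set: the function names, the search band, and the synthetic 100 Hz test tone are all assumptions made for illustration.

```python
import math


def autocorrelation(signal, lag):
    """Unnormalized autocorrelation of the signal at a given lag (in samples)."""
    return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))


def estimate_pitch(signal, sample_rate, f_min, f_max):
    """Return the frequency whose period maximizes the autocorrelation.

    Only lags that correspond to frequencies between f_min and f_max are searched.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: autocorrelation(signal, lag))
    return sample_rate / best_lag


# Synthetic test tone: 0.1 s of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * i / sample_rate) for i in range(800)]
print(estimate_pitch(tone, sample_rate, f_min=50, f_max=400))
```

Applying such an estimate frame by frame would give a sequence of pitch values whose spread could serve as a "pitch range" style feature.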

3.
Jian Liu, Tiejun Li 《Physica A》2011,390(20):3579-3591
The validity index has been used to evaluate the fitness of partitions produced by clustering algorithms for points in Euclidean space. In this paper, we propose a new validity index for network partitions, which can provide a measure of goodness for the community structure of networks. It is defined as a product of two factors, and involves the compactness and separation for each partition. The simulated annealing strategy is used to minimize such a validity index function in coordination with our previous k-means algorithm based on the optimal reduction of a random walker Markovian dynamics on the network. It is demonstrated that the algorithm can efficiently find the community structure during the cooling process. The number of communities can be automatically determined without any prior knowledge of the community structure. Moreover, the algorithm is successfully applied to three real-world networks.

4.
Text mining was used to extract technical intelligence from the open source global nanotechnology and nanoscience research literature. An extensive nanotechnology/nanoscience-focused query was applied to the Science Citation Index/Social Science Citation Index (SCI/SSCI) databases. The nanotechnology/nanoscience research literature technical structure (taxonomy) was obtained using computational linguistics/document clustering and factor analysis. The infrastructure (prolific authors, key journals/institutions/countries, most cited authors/journals/documents) for each of the clusters generated by the document clustering algorithm was obtained using bibliometrics. Another novel addition was the use of phrase auto-correlation maps to show technical thrust areas based on phrase co-occurrence in Abstracts, and the use of phrase–phrase cross-correlation maps to show technical thrust areas based on phrase relations due to the sharing of common co-occurring phrases. The ∼400 most cited nanotechnology papers since 1991 were grouped, and their characteristics generated. Whereas the main analysis provided technical thrusts of all nanotechnology papers retrieved, analysis of the most cited papers allowed their characteristics to be displayed. Finally, most cited papers from selected time periods were extracted, along with all publications from those time periods, and the institutions and countries were compared based on their representation in the most cited documents list relative to their representation in the most publications list.

5.
Privacy-preserving machine learning has become an important research topic because of privacy policies. However, an efficiency gap still exists between a plain-text algorithm and its privacy-preserving version. In this paper, we focus on designing a novel secret-sharing-based K-means clustering algorithm. In particular, we present an efficient privacy-preserving K-means clustering algorithm based on replicated secret sharing with an honest majority in the semi-honest model. More concretely, the clustering task is outsourced to three semi-honest computing servers. Theoretically, the proposed privacy-preserving scheme can be proven to provide full data privacy. Furthermore, the experimental results demonstrate that our privacy-preserving version reaches the same accuracy as the plain-text one. Compared to the existing privacy-preserving scheme, our proposed protocol achieves about 16.5×–25.2× faster computation and 63.8×–68.0× lower communication. Consequently, the proposed privacy-preserving scheme is suitable for secret-sharing-based secure outsourced computation.
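
The three-server setting rests on replicated secret sharing, which can be sketched as follows: a value is split into three additive shares, and server i holds the pair (x_i, x_{i+1}). This is a minimal sketch of the generic textbook construction, not the paper's protocol; the ring Z_{2^64} and all function names are assumptions, since the abstract specifies neither.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring Z_{2^64}; the abstract does not state the ring


def share(value):
    """Split value into x0 + x1 + x2 (mod MODULUS); server i receives (x_i, x_{i+1})."""
    x0 = secrets.randbelow(MODULUS)
    x1 = secrets.randbelow(MODULUS)
    x2 = (value - x0 - x1) % MODULUS
    parts = [x0, x1, x2]
    return [(parts[i], parts[(i + 1) % 3]) for i in range(3)]


def add_shares(a, b):
    """Add two shared values; each server only touches its own pair (no communication)."""
    return [((a[i][0] + b[i][0]) % MODULUS, (a[i][1] + b[i][1]) % MODULUS)
            for i in range(3)]


def reconstruct(shares):
    """Recover the secret from the first component held by each of the three servers."""
    return sum(pair[0] for pair in shares) % MODULUS


def reconstruct_from_two(pair_i, pair_next):
    """Any two consecutive servers jointly hold all three additive shares."""
    return (pair_i[0] + pair_i[1] + pair_next[1]) % MODULUS


a = share(5)
b = share(7)
print(reconstruct(add_shares(a, b)))
```

Addition is local, which is why sums such as the per-cluster coordinate totals in K-means are cheap in this setting; multiplications and comparisons need interaction between servers and are not shown here.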

6.
《Physica A》1996,231(4):369-374
We simulated lattice animal configurations on the square lattice using the method of Clarke-Vvedensky in their epitaxial growth model. We keep a list of occupied sites according to their number of occupied nearest-neighbor sites. New animal configurations are generated by picking sites from this list with probability proportional to exp(−B/kBT), where B is the number of occupied nearest neighbors of that site, T is the temperature, and kB is Boltzmann's constant, and moving the particle at the picked site to a randomly chosen perimeter site of the cluster, without regard to the initial and final configurations of the cluster, except that it must remain connected, and continually updating the list. By comparing with exact enumeration results, we find that this method does not generate the correct animal configurations. We then show that the correct animal configurations can be generated by modifying the method to take into account the initial and final configurations of the clusters. Our results show that the Clarke-Vvedensky model of epitaxial growth does not satisfy detailed balance in the equilibrium limit.
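
The site-picking rule can be sketched as weighted sampling with weights exp(−B/kBT). This is a minimal sketch under stated assumptions: kBT is measured in units of the bond energy, so the temperature argument is dimensionless, and the function names and site labels are hypothetical. The connectivity check and the perimeter move are not shown.

```python
import math
import random


def pick_probabilities(neighbor_counts, temperature):
    """Normalized probabilities proportional to exp(-B / T) for each site.

    neighbor_counts[i] is B, the number of occupied nearest neighbors of site i.
    temperature is k_B * T in units of the bond energy (an assumption).
    """
    weights = [math.exp(-b / temperature) for b in neighbor_counts]
    total = sum(weights)
    return [w / total for w in weights]


def pick_site(sites, neighbor_counts, temperature, rng=random):
    """Draw one site according to the Boltzmann-like weights."""
    probabilities = pick_probabilities(neighbor_counts, temperature)
    return rng.choices(sites, weights=probabilities, k=1)[0]


# Three hypothetical occupied sites with 1, 2, and 3 occupied neighbors.
probabilities = pick_probabilities([1, 2, 3], temperature=1.0)
print(probabilities)
print(pick_site(["s1", "s2", "s3"], [1, 2, 3], 1.0, random.Random(0)))
```

Weakly bound sites (small B) are picked most often, and each additional occupied neighbor lowers the pick probability by a factor of exp(1/T).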

7.
Xu J  Zhang Q  Shih CK 《Molecular diversity》2006,10(3):463-478
Summary: Clustering molecules based on numeric data, such as gene-expression data, physicochemical properties, or theoretical data, is very important in drug discovery and other life sciences. Most approaches use hierarchical clustering algorithms, non-hierarchical algorithms (for example, K-means and K-nearest neighbor), and other similar methods (for example, the Self-Organizing Map (SOM) and the Support Vector Machine (SVM)). These approaches are non-robust (results are not consistent) and computationally expensive. This paper reports a new, non-hierarchical algorithm called the V-Cluster (V stands for vector) Algorithm. This algorithm produces rational, robust results while reducing computing complexity. Similarity measurement and data normalization rules are also discussed along with case studies. When molecules are represented as a set of numeric vectors, the V-Cluster Algorithm clusters the molecules in three steps: (1) ranking the vectors based upon their overall intensity levels, (2) computing cluster centers based upon neighboring density, and (3) assigning molecules to their nearest cluster center. The program is written in C/C++ and runs on Windows 95/NT and UNIX platforms. With the V-Cluster program, the user can quickly complete the clustering process and easily examine the results by use of thumbnail graphs, superimposed intensity curves of vectors, and spreadsheets. Multi-functional query tools have also been implemented.

8.
A scheme is proposed to generate multi-atom one-dimensional cluster states via one microwave cavity with an additional classical driving field. According to the scheme, a one-dimensional cluster state of 2k atoms can be prepared in one step via one cavity, and a one-dimensional cluster state of (2k−1) atoms can be generated by measuring the 2k-th atom of a 2k-atom cluster state in a certain basis. The scheme avoids cavity-field-induced decay and may achieve one-dimensional cluster states with ideal success probability.

9.
《Physica A》2006,363(2):226-236
Several studies have investigated the scaling behavior in naturally occurring biological and physical processes using techniques such as detrended fluctuation analysis (DFA). Data acquisition is an inherent part of these studies and maps the continuous process into digital data. The resulting digital data are discretized in amplitude and time, and are referred to as a coarse-grained realization in the present study. Since coarse-graining precedes scaling exponent analysis, it is important to understand its effects on scaling exponent estimators such as DFA. In this brief communication, k-means clustering is used to generate coarse-grained realizations of data sets with different correlation properties, namely anti-correlated noise, long-range correlated noise, and uncorrelated noise. It is shown that coarse-graining can significantly affect the scaling exponent estimates. It is also shown that the scaling exponent can be reliably estimated even at low levels of coarse-graining, and that the number of clusters required varies across the data sets with different correlation properties.
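
Coarse-graining a series with k-means amounts to replacing every amplitude by the nearest of k cluster centers. The following is a minimal sketch of one-dimensional Lloyd iterations; the deterministic initialization at evenly spaced order statistics, the function names, and the toy series are assumptions, since the abstract does not describe these details.

```python
def kmeans_1d(values, k, iterations=50):
    """One-dimensional k-means (Lloyd's algorithm) with a deterministic start."""
    ordered = sorted(values)
    # Start from evenly spaced order statistics (an assumed initialization).
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def coarse_grain(values, k):
    """Replace every amplitude by its nearest cluster center (k amplitude levels)."""
    centers = kmeans_1d(values, k)
    return [min(centers, key=lambda c: abs(v - c)) for v in values]


series = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(coarse_grain(series, k=2))
```

Smaller k means coarser amplitude resolution; a scaling-exponent estimator such as DFA would then be run on the output series and compared with the estimate from the original data.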

10.
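Cai Junwei (蔡俊伟), Hu Shousong (胡寿松), Tao Hongfeng (陶洪峰) 《物理学报》(Acta Physica Sinica) 2007,56(12):6820-6827
A clustering-based selective support vector machine (SVM) ensemble prediction model is proposed. To improve the generalization ability of the SVM ensemble, a combined clustering algorithm that couples the self-organizing map with K-means clustering is used, and the most accurate sub-SVM in each cluster is selected for the ensemble. This ensures that the sub-SVMs have high accuracy and increases the diversity among them. The method significantly improves the generalization ability of the SVM ensemble at low cost. Prediction experiments on the Mackey-Glass chaotic time series and on chaotic time series generated by the Lorenz system show that chaotic time series can be predicted accurately, which confirms the effectiveness of the method. Keywords: support vector machine; ensemble; chaotic time series; clustering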

11.
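Gao Zhongke (高忠科), Jin Ningde (金宁德) 《物理学报》(Acta Physica Sinica) 2008,57(11):6909-6920
A flow-pattern complex network is constructed from conductance fluctuation signals of gas-liquid two-phase flow. The community structure of the network is analyzed with a community-detection algorithm based on K-means clustering. The network is found to contain three communities, corresponding to bubble flow, slug flow, and churn flow, and the nodes that are tightly connected between two communities correspond to the respective transitional flow patterns. The community structure and statistical properties of the two-phase flow pattern complex network are examined from a new perspective based on complex network theory, and satisfactory flow-pattern identification is obtained. Further analysis of the network's properties reveals complex-network statistics that are sensitive to changes in the two-phase flow parameters, which provides a reference for better understanding the dynamics of two-phase flow patterns. Keywords: two-phase flow pattern; complex network; community-detection algorithm; network statistical properties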

12.
We provide two necessary and sufficient conditions to characterize any n-bit partial Boolean function with exact quantum query complexity 1. Using the first characterization, we present all n-bit partial Boolean functions that depend on n bits and can be computed exactly by a 1-query quantum algorithm. Using the second characterization, we construct a function F that maps any n-bit partial Boolean function to some integer, such that if an n-bit partial Boolean function f depends on k bits and can be computed exactly by a 1-query quantum algorithm, then F(f) is non-positive. In addition, we show that the number of all n-bit partial Boolean functions that depend on k bits and can be computed exactly by a 1-query quantum algorithm is not bigger than an upper bound depending on n and k. Most importantly, the upper bound is far less than the number of all n-bit partial Boolean functions for all sufficiently large n.

13.
We designed a semiautomatic segmentation method to easily measure the volume of a bone cyst (simple or aneurysmal) from magnetic resonance imaging (MRI). This method only considers the fluid part of the cyst, even when there are several fluid intensities (fluid-fluid levels) or the cyst is multi-loculated. The nonhomogeneity phenomenon inherent in MRI was handled by a k-means clustering algorithm that classified all of the voxels corresponding to the cyst fluid as the same voxel intensity. Level-set segmentation was expanded into the whole cyst volume and the resulting segmented volume provided the measured cyst volume. The semiautomatic method was compared with the usual manual method (manual contour tracing) in terms of its ability to measure a known volume of water (gold standard) as well as the volume of 29 bone cysts. Both methods were equivalent with regard to the gold standard, but the semiautomatic method was more accurate. In terms of the experimental measurements, the semiautomatic method was more repeatable and reproducible, and less time-consuming and tedious than the manual method. Our semiautomatic method uses only freeware and can be used routinely whenever measurement of a bone cyst volume is needed.

14.
We introduce a novel noniterative algorithm for the fast and accurate reconstruction of nonuniformly sampled MRI data. The proposed scheme derives the reconstructed image as the nonuniform inverse Fourier transform of a compensated dataset. We derive each sample in the compensated dataset as a weighted linear combination of a few measured k-space samples. The specific k-space samples and the weights involved in the linear combination are derived such that the reconstruction error is minimized. The computational complexity of the proposed scheme is comparable to that of gridding. At the same time, it provides significantly improved accuracy and is considerably more robust to noise and undersampling. The advantages of the proposed scheme make it ideally suited for the fast reconstruction of large multidimensional datasets, which routinely arise in applications such as f-MRI and MR spectroscopy. The comparisons with state-of-the-art algorithms on numerical phantoms and MRI data clearly demonstrate the performance improvement.

15.
This paper proposes a robust method to detect and extract silhouettes of foreground objects from a video sequence of a static camera, based on an improved background subtraction technique. The proposed method statistically analyses the pixel history as time series observations and detects motion using kernel density estimation. Two consecutive stages of the k-means clustering algorithm are used to identify the most reliable background regions and to reduce false positives. A pixel- and object-based updating mechanism for the background model is presented to cope with challenges such as gradual and sudden illumination changes, ghost appearance, non-stationary background objects, and moving objects that remain stable for more than half of the training period. Experimental results show the efficiency and robustness of the proposed method in detecting and extracting the silhouettes of moving objects in outdoor and indoor environments compared with conventional methods.
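
Motion detection by kernel density estimation can be sketched per pixel: the recent intensity history defines a density, and a new value with low density under it is flagged as foreground. This is a minimal sketch of the generic technique with a Gaussian kernel; the bandwidth, the threshold, the function names, and the sample history are assumptions, and the paper's k-means stages and model updating are not shown.

```python
import math


def kde_density(value, history, bandwidth):
    """Gaussian kernel density estimate of a pixel value given its intensity history."""
    norm = 1.0 / (len(history) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((value - h) / bandwidth) ** 2)
                      for h in history)


def is_foreground(value, history, bandwidth=5.0, threshold=1e-3):
    """Flag the pixel as foreground when its value is unlikely under the background model.

    bandwidth and threshold are illustrative defaults, not values from the paper.
    """
    return kde_density(value, history, bandwidth) < threshold


# Hypothetical grey-level history of one background pixel.
history = [100, 101, 99, 100, 102]
print(is_foreground(100, history))  # value consistent with the history
print(is_foreground(200, history))  # value far from the history
```

Because the density is built from samples rather than a single Gaussian, the same test also handles pixels whose background alternates between a few typical values.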

16.
Mu Chen, Peng Xu, Jun Chen 《Physica A》2007,385(2):707-717
We introduce a new simple pseudo tree-like network model, the deterministic complex network (DCN). The proposed DCN model can appropriately simulate the hierarchical structure of real networks and has the unique property of ‘skipping the levels’, which is ubiquitous in social networks. Our results indicate that the DCN model has a rather small average path length and a large clustering coefficient, leading to the small-world effect. Strikingly, our DCN model obeys a discrete power-law degree distribution P(k) ∝ k^(−γ), with exponent γ approaching 1.0. We also discover that the relationship between the clustering coefficient and degree follows the scaling law C(k) ∼ k^(−1), which quantitatively determines the DCN's hierarchical structure.

17.
We study the dynamical heterogeneities along the attractive glass line in a model for charged colloids, where short-range attraction competes with long-range repulsion. We focus on the crossover from the gel-like to the glassy regime, where the system displays features of both chemical gels and glassy systems. The former are due to long-lived clusters, the latter to crowding effects, and both slow down the relaxation process. We show how to separate the two effects by looking at the fluctuation of the self intermediate scattering function, χ4(k,t), for different values of the wave vector k. For small values of k, χ4(k,t) detects the dynamical behavior at large length scales, where both cluster effects and crowding effects are present. On the other hand, as k increases, χ4(k,t) detects the dynamical behavior at small length scales, where the cluster effects are suppressed and only the crowding effects dominate the dynamics.

18.
The most common machine-learning methods solve supervised and unsupervised problems based on datasets where the problem’s features belong to a numerical space. However, many problems include data in which numerical and categorical features coexist, which makes them challenging to handle. To transform categorical data into numeric form, preprocessing is required. Methods such as one-hot and feature-hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This effect introduces additional challenges in dealing with the overabundance of variables and/or noisy data. In this paper, we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon’s theory to model the amount of information contained in the original data. We evaluated our proposal with ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, the proposal was applied to prepare these datasets for classification, regression, and clustering tasks. We demonstrate that our encoding proposal is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency. Our proposal can preserve the information conveyed by the original data.
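
One simple way to map a categorical column into an "information space" is to replace each category by its Shannon self-information, −log2 p, where p is the category's relative frequency. This is a minimal sketch of that generic idea only: the abstract does not describe the paper's actual mapping, so the function name and the encoding rule here are assumptions.

```python
import math
from collections import Counter


def self_information_encode(column):
    """Encode each category as -log2(relative frequency), in bits.

    Returns the encoded column and the category-to-bits lookup table.
    The column keeps a single numeric dimension, unlike one-hot encoding.
    """
    counts = Counter(column)
    total = len(column)
    bits = {category: -math.log2(n / total) for category, n in counts.items()}
    return [bits[value] for value in column], bits


encoded, table = self_information_encode(["a", "a", "b", "c"])
print(encoded)  # [1.0, 1.0, 2.0, 2.0]
print(table)
```

The example also shows a limitation of this naive rule: categories with equal frequency ("b" and "c") receive the same code, so a practical encoder needs a further step to keep them distinguishable.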

19.
We retrieve the radius R and the real part n and imaginary part k of the refractive index of homogeneous spherical particles from the angular distribution of the light-scattering intensity. To solve the inverse light-scattering problem we use a high-order neural-network technique. The effect of network parameters on optimization is examined. The technique is evaluated for noise-corrupted input data at 0.6 μm<R<10.6 μm, 1.02<n<1.38, and 0<k<0.03. The retrieval errors for nonabsorbing particles do not exceed 0.05 μm for the radius and 0.015 for the refractive index. Experimental verification uses data measured with a scanning flow cytometer. The light-scattering profiles of polystyrene beads and spherized red blood cells are processed with the high-order neural networks and with a non-linear regression based on Mie theory. The parameters retrieved by the high-order neural networks correlate well with the parameters retrieved by the least-squares method.

20.
Random walk simulations based on a molecular trajectory algorithm are performed on critical percolation clusters, and corrections to scaling are analyzed. The fractal dimension of the random walk on the incipient infinite cluster is found to be dw = 2.873 ± 0.008 in two dimensions and 3.78 ± 0.02 in three dimensions. If instead the diffusion is averaged over all clusters at the threshold, without the restriction to the infinite cluster, the corresponding critical exponent k is found to be k = 0.3307 ± 0.0014 in two dimensions and 0.199 ± 0.002 in three dimensions. Moreover, in our simulations the asymptotic behaviors of the local critical exponents are reached much earlier than in other numerical methods.
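
The walk dimension is defined through the scaling of the mean-square displacement, ⟨r²(t)⟩ ∼ t^(2/dw), so it can be read off the slope of a log-log fit. This is a minimal sketch of that fit only; the function name and the synthetic data are assumptions, and the paper's simulation algorithm and its corrections-to-scaling analysis are not reproduced.

```python
import math


def walk_dimension(times, msd):
    """Estimate d_w from a least-squares fit of log<r^2> against log t.

    Uses the scaling <r^2(t)> ~ t^(2 / d_w), so d_w = 2 / slope.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(r2) for r2 in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return 2.0 / slope


# Synthetic, noise-free data generated with the two-dimensional value quoted above.
times = [10, 100, 1000, 10000]
msd = [t ** (2.0 / 2.873) for t in times]
print(walk_dimension(times, msd))
```

With real simulation data the fit window matters, because the local slope drifts at early times; that drift is what a corrections-to-scaling analysis accounts for.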
