Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Combining multiple classifiers, known as ensemble methods, can substantially improve the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of a subset of kNN classifiers, ESkNN, for classification tasks, built in two steps. First, we choose classifiers based on their individual performance, measured by out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We evaluate our method on benchmark data sets with their original features and with some added non-informative features. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forest, and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forest and support vector machines.
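The two-step selection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each candidate kNN model has already produced class-1 probabilities on a validation set, it uses the Brier score as the collective performance measure, and it keeps a model only if the score improves. The last two choices are assumptions, since the abstract does not state the combination criterion. The names `select_sequentially`, `val_probs`, and `oos_acc` are hypothetical.

```python
def brier(probs, y):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(probs, y)) / len(y)


def average(prob_lists):
    """Average the class-1 probabilities of several models, point by point."""
    return [sum(col) / len(col) for col in zip(*prob_lists)]


def select_sequentially(val_probs, y_val, oos_acc):
    """Greedy ensemble selection (assumed rule: keep a model only if it helps).

    val_probs: one list of validation probabilities per candidate model
    y_val:     0/1 validation labels
    oos_acc:   out-of-sample accuracy of each candidate model (step 1 ranking)
    """
    # Step 1: rank candidates from best to worst individual accuracy.
    order = sorted(range(len(val_probs)), key=lambda i: oos_acc[i], reverse=True)

    # Step 2: start from the best model and add the others one at a time.
    chosen = [order[0]]
    best = brier(val_probs[order[0]], y_val)
    for i in order[1:]:
        trial = chosen + [i]
        score = brier(average([val_probs[j] for j in trial]), y_val)
        if score < best:  # lower Brier score means better collective performance
            chosen, best = trial, score
    return chosen, best


y_val = [1, 0, 1, 0]
val_probs = [
    [0.8, 0.4, 0.6, 0.2],  # strong model
    [0.6, 0.2, 0.8, 0.4],  # complementary model
    [0.2, 0.8, 0.4, 0.6],  # poor model, e.g. built on non-informative features
]
chosen, best = select_sequentially(val_probs, y_val, oos_acc=[0.9, 0.8, 0.5])
print(chosen, round(best, 4))
```

In this toy input the second model is kept because averaging it with the first lowers the validation Brier score, while the third model is rejected because it raises it.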

2.
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NMF. Given an input matrix where the columns represent data items, we build a binary tree structure of the data items using a recently-proposed efficient algorithm for computing rank-2 NMF, and then gather information from the tree to initialize the rank-k NMF, which needs only a few iterations to reach a desired solution. We also investigate various criteria for selecting the node to split when growing the tree. We demonstrate the scalability of our algorithm for computing general rank-k NMF as well as its effectiveness in clustering and topic modeling for large-scale text data sets, by comparing it to other frequently utilized state-of-the-art algorithms. The value of the proposed approach lies in the highly efficient and accurate method for initializing rank-k NMF and the scalability achieved from the divide-and-conquer approach of the algorithm and properties of rank-2 NMF. In summary, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains along with an open-source software library called SmallK.

3.
4.
In this paper we consider n-poised planar node sets, as well as more special ones, called \(GC_n\) sets. For the latter sets each n-fundamental polynomial is a product of n linear factors, as always holds in the univariate case. A line \(\ell\) is called a k-node line for a node set \(\mathcal X\) if it passes through exactly k nodes. An (n + 1)-node line is called a maximal line. In 1982 M. Gasca and J. I. Maeztu conjectured that every \(GC_n\) set necessarily possesses a maximal line. So far the conjecture has been confirmed for n ≤ 5. It is well known that any maximal line M of \(\mathcal X\) is used by each node in \(\mathcal X\setminus M\), meaning that it is a factor of the fundamental polynomial. In this paper we prove, in particular, that if the Gasca-Maeztu conjecture is true then any n-node line of a \(GC_n\) set \(\mathcal {X}\) is used either by exactly \(\binom {n}{2}\) nodes or by exactly \(\binom {n-1}{2}\) nodes. We also prove similar statements concerning n-node or (n − 1)-node lines in more general n-poised sets. This is a new phenomenon in n-poised and \(GC_n\) sets. At the end we present a conjecture concerning any k-node line.

5.
In this paper, we consider the parametric matrix equation A(p)X = B(p), whose elements are linear functions of uncertain parameters varying within intervals. Here A(p) and B(p) are known m-by-m and m-by-n matrices, respectively, and X is the unknown m-by-n matrix. We discuss the so-called AE-solution sets for such systems and give some analytical characterizations of them, together with a sufficient condition under which these solution sets are bounded. We then propose a modification of the Krawczyk operator for parametric systems that considerably reduces the computational complexity of obtaining an outer estimation of the parametric united solution set. Next, we generalize the Bauer-Skeel and Hansen-Bliek-Rohn bounds for enclosing the parametric united solution set, which also significantly reduces the computational complexity. We also give some numerical approaches, based on Gaussian elimination and Gauss-Seidel methods, for finding outer estimations of the parametric united solution set. Finally, numerical experiments illustrate the performance of the proposed methods.

6.
For a linear extension P of a partially ordered set S, we consider a generating multivariate polynomial of certain reverse partitions on S, called P-pedestals. We establish a remarkable property of this polynomial: it does not depend on the choice of P. For S a Young diagram, we show that this polynomial generalizes the hook polynomial.

7.
Sufficient dimension reduction methodologies in regressions of Y on a p-variate X aim at obtaining a reduction \(R(X) \in {\mathbb R}^{d}, d \le p\), that retains all the regression information of Y in X. When the predictors fall naturally into a number of known groups or domains, it has been established that exploiting the grouping information often leads to more effective sufficient dimension reduction of the predictors. In this article, we consider group-wise sufficient dimension reduction based on principal fitted components, when the grouping information is unknown. Principal fitted components methodology is coupled with an agglomerative clustering procedure to identify a suitable grouping structure. Simulations and real data analysis demonstrate that the group-wise principal fitted components sufficient dimension reduction is superior to the standard principal fitted components and to general sufficient dimension reduction methods.

8.
Local reconstruction from samples is one of the most desirable properties for many applications in signal processing, but it has not been given much attention. In this paper, we consider the local reconstruction problem for signals in a shift-invariant space. In particular, we consider finding sampling sets X such that signals in a shift-invariant space can be locally reconstructed from their samples on X. For a locally finite-dimensional shift-invariant space V we show that signals in V can be locally reconstructed from their samples on any sampling set with sufficiently large density. For a shift-invariant space \(V(\phi_1, \ldots, \phi_N)\) generated by finitely many compactly supported functions \(\phi_1, \ldots, \phi_N\), we characterize all periodic nonuniform sampling sets X such that signals in \(V(\phi_1, \ldots, \phi_N)\) can be locally reconstructed from the samples taken from X. For a refinable shift-invariant space \(V(\phi)\) generated by a compactly supported refinable function \(\phi\), we prove that for almost all \((x_0, x_1)\in [0,1]^2\), any signal in \(V(\phi)\) can be locally reconstructed from its samples on \(\{x_0, x_1\}+{\mathbb Z}\) with oversampling rate 2. The proofs of our results on local sampling and reconstruction in the refinable shift-invariant space \(V(\phi)\) depend heavily on the linearly independent shifts of a refinable function on measurable sets with positive Lebesgue measure and on the almost ripplet property for a refinable function, which are new and interesting in themselves.

9.
10.
In the present paper we estimate the variation in the relative Chebyshev radius \(R_W(M)\), where M and W are nonempty bounded sets of a metric space, as the sets M and W change. We find the closure and the interior of the set of all N-nets each of which contains its unique relative Chebyshev center, in the set of all N-nets of a special geodesic space endowed with the Hausdorff metric. We consider various properties of relative Chebyshev centers of a finite set which lie in this set.
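As a small illustration of the quantity in this abstract, the sketch below computes a relative Chebyshev radius for finite point sets. It assumes the standard definition \(R_W(M) = \inf_{w \in W} \sup_{x \in M} \rho(x, w)\), which the abstract itself does not spell out, and it uses the Euclidean plane as the metric space. The function name `relative_chebyshev_radius` is hypothetical.

```python
import math


def relative_chebyshev_radius(M, W):
    """Return (radius, center) for finite sets M and W of points.

    For each candidate center w in W, take the largest distance to a point
    of M; the relative Chebyshev radius is the smallest such value.
    """
    best_center, best_radius = None, math.inf
    for w in W:
        radius = max(math.dist(w, x) for x in M)  # sup over M
        if radius < best_radius:                  # inf over W
            best_center, best_radius = w, radius
    return best_radius, best_center


M = [(0, 0), (2, 0)]
W = [(1, 0), (5, 0)]
radius, center = relative_chebyshev_radius(M, W)
print(radius, center)
```

Here the point (1, 0) is the relative Chebyshev center of M in W, since it is the candidate whose farthest point of M is closest.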

11.
Graph coloring is an important tool in optimization, computer science, and network design, with applications such as file transfer in a computer network, pattern matching, and the computation of Hessian matrices. In this paper, we consider one important coloring, the vertex coloring of a total graph, which is also called total coloring. We consider a planar graph G with maximum degree Δ(G) ≥ 8 and prove that if G contains no adjacent i,j-cycles with two chords for some i, j ∈ {5, 6, 7}, then G is total-(Δ+1)-colorable.
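To make the notion of total coloring concrete, the following sketch checks whether a given assignment of colors to vertices and edges is a proper total coloring, meaning that adjacent vertices, edges sharing an endpoint, and each edge and its endpoints all receive different colors. It only verifies a coloring; it does not implement the paper's proof. The function names and the dictionary-based graph representation are assumptions made for illustration.

```python
def max_degree(edges):
    """Maximum vertex degree of a simple graph given as a list of (u, v) edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values())


def is_total_coloring(vertex_color, edge_color):
    """Check the three conditions of a proper total coloring.

    vertex_color: dict mapping vertex -> color
    edge_color:   dict mapping (u, v) edge tuple -> color
    """
    edges = list(edge_color)
    for u, v in edges:
        # Adjacent vertices must differ.
        if vertex_color[u] == vertex_color[v]:
            return False
        # An edge must differ from both of its endpoints.
        if edge_color[(u, v)] in (vertex_color[u], vertex_color[v]):
            return False
    for i, e in enumerate(edges):
        for f in edges[i + 1:]:
            # Edges sharing an endpoint must differ.
            if set(e) & set(f) and edge_color[e] == edge_color[f]:
                return False
    return True


# Triangle K3: maximum degree 2, so Delta + 1 = 3 colors are allowed.
vertex_color = {"a": 0, "b": 1, "c": 2}
edge_color = {("a", "b"): 2, ("b", "c"): 0, ("c", "a"): 1}
colors_used = set(vertex_color.values()) | set(edge_color.values())
print(is_total_coloring(vertex_color, edge_color), len(colors_used))
```

The triangle is only a toy case; the theorem in the abstract concerns planar graphs with Δ(G) ≥ 8.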

12.
We construct a class of special homogeneous Moran sets, called \(\{m_k\}\)-quasi homogeneous Cantor sets, and discuss their Hausdorff dimensions. By adjusting the value of \(\{m_k\}_{k\ge 1}\), we constructively prove the intermediate value theorem for the homogeneous Moran set. Moreover, we obtain a sufficient condition for the Hausdorff dimension of homogeneous Moran sets to assume the minimum value, which extends earlier work.

13.
We introduce the notion of A-numbering, which generalizes the classical notion of numbering. All main attributes of classical numberings carry over to the objects considered here. We investigate the existence of positive and decidable computable A-numberings for the natural families of sets e-reducible to a fixed set. We prove that, for every computable A-family containing an inclusion-greatest set, there also exists a positive computable A-numbering. Furthermore, for certain families we construct a decidable (and even single-valued) computable total A-numbering when A is a low set; we also consider a relativization containing all cases of total sets (this in fact corresponds to computability with a usual oracle).

14.
In several real-life and research situations, data are collected in the form of intervals, the so-called interval-valued data. In this paper a fuzzy clustering method for analysing interval-valued data is presented. In particular, we address the problem of interval-valued data corrupted by outliers and noise. To cope with the presence of outliers, we propose to employ a robust metric based on the exponential distance in the framework of the Fuzzy C-medoids clustering model, which yields the Fuzzy C-medoids clustering model for interval-valued data with exponential distance. The exponential distance assigns small weights to outliers and larger weights to those points that are more compact in the data set, thus neutralizing the effect of anomalous interval-valued data. Simulation results on the behaviour of the proposed approach, as well as two empirical applications, are provided to illustrate the practical usefulness of the proposed method.
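The robustness idea in this abstract can be illustrated with a short sketch. It is not the authors' model: it assumes intervals are compared through their centers and radii, that the exponential distance has the common form 1 − exp(−β d²), and that memberships follow the usual fuzzy C-medoids formula with fuzziness parameter m. The names `exp_dist` and `memberships` and the default values of `beta` and `m` are hypothetical.

```python
import math


def interval_sq_dist(a, b):
    """Squared distance between intervals (lower, upper) via centers and radii.

    The center/radius representation is an assumption of this sketch.
    """
    ca, ra = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    cb, rb = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return (ca - cb) ** 2 + (ra - rb) ** 2


def exp_dist(a, b, beta=1.0):
    """Exponential distance: bounded by 1, so far-away outliers cannot dominate."""
    return 1.0 - math.exp(-beta * interval_sq_dist(a, b))


def memberships(x, medoids, beta=1.0, m=2.0):
    """Fuzzy membership degrees of interval x with respect to each medoid."""
    d = [exp_dist(x, c, beta) for c in medoids]
    if any(v == 0.0 for v in d):
        # x coincides with one or more medoids: share full membership among them.
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        return [h / sum(hits) for h in hits]
    inv = [v ** (-1.0 / (m - 1.0)) for v in d]
    total = sum(inv)
    return [v / total for v in inv]


medoids = [(0, 2), (10, 12)]
u = memberships((0.5, 2.5), medoids)
print([round(v, 3) for v in u])
print(exp_dist((0, 2), (100, 102)))  # far-away interval: distance saturates near 1
```

Because the distance saturates, an interval far from every medoid contributes a bounded amount to the clustering objective, which is the sense in which outliers are downweighted.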

15.
We study the relationship between the sizes of two sets \(B, S \subset \mathbb{R}^2\), when B contains either the whole boundary or the four vertices of a square with axes-parallel sides and center in every point of S. Size refers to cardinality, Hausdorff dimension, packing dimension, or upper or lower box dimension. Perhaps surprisingly, the results vary depending on the notion of size under consideration. For example, we construct a compact set B of Hausdorff dimension 1 which contains the boundary of an axes-parallel square with center in every point of \([0, 1]^2\), prove that such a B must have packing and lower box dimension at least 7/4, and show by example that this is sharp. For more general sets of centers, the answers for packing and box counting dimensions also differ. These problems are inspired by the analogous problems for circles that were investigated by Bourgain, Marstrand and Wolff, among others.

16.
We study the cardinalities of A/A and AA for thin subsets A of the set of the first n positive integers. In particular, we consider the typical size of these quantities for random sets A of zero density and compare them with the sizes of A/A and AA for subsets of the shifted primes and the set of sums of two integral squares.
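For a concrete view of the two quantities in this abstract, the sketch below builds the product set AA = {ab : a, b ∈ A} and the ratio set A/A = {a/b : a, b ∈ A} for a small set of positive integers. These are the standard definitions, assumed here because the abstract does not restate them. The function names are hypothetical.

```python
from fractions import Fraction


def product_set(A):
    """AA: all products a * b with a and b in A."""
    return {a * b for a in A for b in A}


def ratio_set(A):
    """A/A: all quotients a / b with a and b in A, as exact fractions."""
    return {Fraction(a, b) for a in A for b in A}


A = {1, 2, 3}
print(len(product_set(A)), len(ratio_set(A)))
```

Using `Fraction` keeps the quotients exact, so equal ratios such as 2/2 and 3/3 are counted once.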

17.
An r-coloring of a subset A of a finite abelian group G is called sum-free if it does not induce a monochromatic Schur triple, i.e., a triple of elements \(a, b, c \in A\) with a + b = c. We investigate \(\kappa_{r,G}\), the maximum number of sum-free r-colorings admitted by subsets of G, and our results show a close relationship between \(\kappa_{r,G}\) and largest sum-free sets of G. Let G be a sufficiently large abelian group of type I, i.e., |G| has a prime divisor q with q ≡ 2 (mod 3). For r = 2, 3 we show that a subset \(A \subseteq G\) achieves \(\kappa_{r,G}\) if and only if A is a largest sum-free set of G. For G of even order the result extends to r = 4, 5, where the phenomenon persists only if G has a unique largest sum-free set. On the contrary, if the largest sum-free set in G is not unique, then A attains \(\kappa_{r,G}\) if and only if it is the union of two largest sum-free sets (in case r = 4) and the union of three (“independent”) largest sum-free sets (in case r = 5). Our approach relies on the so-called container method and can be extended to larger r in case G is of even order and contains sufficiently many largest sum-free sets.
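The definitions in this abstract can be checked by brute force on a tiny example. The sketch below counts sum-free r-colorings of subsets of the cyclic group \(\mathbb{Z}_n\) and takes the maximum over all subsets. It is only an illustration on a small group, whereas the theorem concerns sufficiently large groups, and the function names `is_sum_free`, `count_sum_free_colorings`, and `kappa` are hypothetical.

```python
from itertools import combinations, product


def is_sum_free(s, n):
    """True if no a, b in s (a = b allowed) have (a + b) mod n in s."""
    return all((a + b) % n not in s for a in s for b in s)


def count_sum_free_colorings(A, n, r):
    """Number of r-colorings of A in Z_n whose color classes are all sum-free."""
    A = sorted(A)
    count = 0
    for colors in product(range(r), repeat=len(A)):
        classes = [set() for _ in range(r)]
        for x, c in zip(A, colors):
            classes[c].add(x)
        if all(is_sum_free(cls, n) for cls in classes):
            count += 1
    return count


def kappa(n, r):
    """Maximum number of sum-free r-colorings over all subsets of Z_n."""
    return max(
        count_sum_free_colorings(A, n, r)
        for k in range(n + 1)
        for A in combinations(range(n), k)
    )


# Z_5 has order 5, and 5 ≡ 2 (mod 3), so it is of type I in the abstract's sense.
print(is_sum_free({1, 4}, 5), count_sum_free_colorings({1, 4}, 5, 2), kappa(5, 2))
```

In this small case the maximum is attained by the sum-free set {1, 4}, every 2-coloring of which is sum-free, in line with the r = 2 statement above.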

18.
The kernel-based regression (KBR) method, such as support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigations with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single outlying observation in X-space may greatly affect the estimator. To resolve this issue, we propose a new RKBR method that gives reliable results even if the training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in the kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to deal with the case where the kernel matrix has full rank, in which the kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, namely the kernel hat diagonal element and the regularized one, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method. The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.

19.
Rauni Lillemets, Positivity, 2017, 21(3): 1049-1066
The paper is based on the following construction due to Stephani from 1980. Given a collection of sequences in a Banach space X, we consider all the subsets in which every sequence has a subsequence from the given collection. This construction produces a collection of subsets of X. Conversely, given a collection of subsets of X, we consider all the sequences contained in those sets; this procedure gives us a collection of sequences in X. Thus, we have maps from collections of sequences to collections of subsets, and vice versa. We study these maps and various structures that arise as byproducts of these maps. We also investigate order properties of these maps.

20.
Given a metric measure space X, we consider a scale of function spaces \(T^{p,q}_s(X)\), called the weighted tent space scale. This is an extension of the tent space scale of Coifman, Meyer, and Stein. Under various geometric assumptions on X we identify some associated interpolation spaces, in particular certain real interpolation spaces. These are identified with a new scale of function spaces, which we call Z-spaces, that have recently appeared in the work of Barton and Mayboroda on elliptic boundary value problems with boundary data in Besov spaces. We also prove Hardy–Littlewood–Sobolev-type embeddings between weighted tent spaces.
