Similar Documents
20 similar documents found (search time: 765 ms)
1.
In this paper we present a simple dynamization method that preserves the query and storage costs of a static data structure and ensures reasonable update costs. In this method, the majority of data elements are maintained in a single data structure, and the updates are handled using smaller auxiliary data structures. We analyze the query, storage, and amortized update costs for the dynamic version of a static data structure in terms of a function f, such that f(n) < n, that bounds the sizes of the auxiliary data structures (where n is the number of elements in the data structure). The conditions on f for minimal (with respect to asymptotic upper bounds) amortized update costs are then obtained. The proposed method is shown to be particularly suited for the cases where the merging of two data structures is more efficient than building the resultant data structure from scratch. Its effectiveness is illustrated by applying it to a class of data structures that have linear merging cost; this class consists of data structures such as Voronoi diagrams, K-d trees, quadtrees, multiple attribute trees, etc.
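The scheme can be sketched in a few lines. This is a minimal toy version, not the paper's construction: a sorted list stands in for the static structure (e.g. a K-d tree), √n is an illustrative choice of f(n), and all names are made up for illustration.

```python
import bisect
import heapq

class DynamizedSet:
    """Toy sketch of the dynamization idea: keep most elements in one
    'static' structure and buffer updates in a small auxiliary structure,
    merging once the buffer exceeds f(n)."""

    def __init__(self, f=lambda n: max(1, int(n ** 0.5))):
        self.f = f          # bound on the auxiliary structure's size
        self.main = []      # the large static structure (kept sorted)
        self.aux = []       # small auxiliary buffer for recent inserts

    def insert(self, x):
        bisect.insort(self.aux, x)
        # Rebuild when the auxiliary structure outgrows f(n); heapq.merge
        # gives the linear merge that the linear-merging-cost class exploits.
        if len(self.aux) > self.f(len(self.main)):
            self.main = list(heapq.merge(self.main, self.aux))
            self.aux = []

    def contains(self, x):
        # Query cost = static-structure query plus a search in the small buffer.
        for s in (self.main, self.aux):
            i = bisect.bisect_left(s, x)
            if i < len(s) and s[i] == x:
                return True
        return False
```

The point of the design is that queries pay only a small overhead (the auxiliary structure is bounded by f(n)), while the expensive rebuild is amortized over the insertions that filled the buffer.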

2.
The numerical differentiation of data divides naturally into two distinct problems:
  1. the differentiation of exact data, and
  2. the differentiation of non-exact (experimental) data.
In this paper, we examine the latter. Because methods developed for exact data are based on abstract formalisms which are independent of the structure within the data, they prove, except for the regularization procedure of Cullum, to be unsatisfactory for non-exact data. We therefore adopt the point of view that satisfactory methods for non-exact data must take the structure within the data into account in some natural way, and use the concepts of regression and spectrum analysis as a basis for the development of such methods. The regression procedure is used when either the structure within the non-exact data is known on independent grounds, or the assumptions which underlie the spectrum analysis procedure [viz., stationarity of the (detrended) data] do not apply. In this latter case, the data could be modelled using splines. The spectrum analysis procedure is used when the structure within the non-exact data (or a suitable transformation of it, where the transformation can be differentiated exactly) behaves as if it were generated by a stationary stochastic process. By proving that the regularization procedure of Cullum is equivalent to a certain spectrum analysis procedure, we derive a fast Fourier transform implementation for regularization in which an acceptable value of the regularization parameter is estimated directly from a time series formulation based on this equivalence. Compared with the regularization procedure, which involves O(n^3) operations (where n is the number of data points), the fast Fourier transform implementation involves only O(n log n).
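The FFT idea can be illustrated with a crude sketch: differentiate periodic samples in the Fourier domain and discard high frequencies, the cut-off playing the role of the regularization parameter. This is only illustrative of the mechanism, not the paper's actual parameter-selection procedure.

```python
import numpy as np

def fft_derivative(y, dx, keep=10):
    """Spectral derivative with a hard low-pass cut-off (a crude stand-in
    for choosing the regularization parameter). Assumes periodic data."""
    n = len(y)
    k = np.fft.fftfreq(n, d=dx) * 2 * np.pi   # angular wavenumbers
    Y = np.fft.fft(y)
    Y[np.abs(k) > keep] = 0.0                 # drop noisy high frequencies
    return np.fft.ifft(1j * k * Y).real       # multiply by ik, invert

# On a smooth periodic signal the result matches the exact derivative.
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
dydx = fft_derivative(np.sin(x), x[1] - x[0])
```

Each forward/inverse transform costs O(n log n), which is where the speed-up over the O(n^3) direct regularization comes from.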

3.
1. Introduction. In this paper we consider the Cauchy problem for a class of nonhomogeneous Navier-Stokes equations in the infinite cylinder, with … Given … satisfying div … in the distribution sense, we seek a solution vector … and a pressure function P(t, x) such that …, where … is a nonlinear vector-valued fun…

4.
This paper proposes a novel ant colony optimisation (ACO) algorithm tailored for the hierarchical multi-label classification problem of protein function prediction. This problem is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve the current biological knowledge. Since it is known that a protein can perform more than one function and many protein functional-definition schemes are organised in a hierarchical structure, the classification problem in this case is an instance of a hierarchical multi-label problem. In this type of problem, each example may belong to multiple class labels and class labels are organised in a hierarchical structure—either a tree or a directed acyclic graph structure. It presents a more complex problem than conventional flat classification, given that the classification algorithm has to take into account hierarchical relationships between class labels and be able to predict multiple class labels for the same example. The proposed ACO algorithm discovers an ordered list of hierarchical multi-label classification rules. It is evaluated on sixteen challenging bioinformatics data sets involving hundreds or thousands of class labels to be predicted and compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification.

5.
Nonlinear dimensionality reduction (NLDR) algorithms such as Isomap, LLE, and Laplacian Eigenmaps address the problem of representing high-dimensional nonlinear data in terms of low-dimensional coordinates which represent the intrinsic structure of the data. This paradigm incorporates the assumption that real-valued coordinates provide a rich enough class of functions to represent the data faithfully and efficiently. On the other hand, there are simple structures which challenge this assumption: the circle, for example, is one-dimensional, but its faithful representation requires two real coordinates. In this work, we present a strategy for constructing circle-valued functions on a statistical data set. We develop a machinery of persistent cohomology to identify candidates for significant circle-structures in the data, and we use harmonic smoothing and integration to obtain the circle-valued coordinate functions themselves. We suggest that this enriched class of coordinate functions permits a precise NLDR analysis of a broader range of realistic data sets.

6.
Multivariate data modelling problems consist of a number of nodes with associated function (class) values. The main purpose of these problems is to construct an analytical model to represent the characteristics of the problem under consideration. Because the devices, tools, and/or algorithms used to collect the data may be imperfect or of limited capability, the data set is likely to contain unavoidable errors. That is, each component of the data is reliable only within an interval which contains the data value. To this end, when an analytical structure is needed for the given data, a band structure should be determined instead of a unique structure. As the multivariance of the given data set increases, divide-and-conquer methods become important in multivariate modelling problems. HDMR-based methods allow us to partition the given multivariate data into less variate data sets to reduce the complexity of the given problem. This paper focuses on the Interval Factorized HDMR method, developed to determine an approximate band structure for a given multivariate data modelling problem having uncertainties in its nodes and function values.

7.
Exploring incomplete data using visualization techniques
Visualization of incomplete data makes it possible to explore the data and the structure of the missing values simultaneously. This is helpful for learning about the distribution of the incomplete information in the data and for identifying possible structures of the missing values and their relation to the available information. The main goal of this contribution is to stress the importance of exploring missing values using visualization methods and to present a collection of such visualization techniques for incomplete data, all of which are implemented in the R package VIM. By providing such functionality for this widely used statistical environment, visualization of missing values, imputation and data analysis can all be done from within R without the need for additional software.

8.
We consider a class of symmetric tridiagonal matrices which may be viewed as perturbations of Toeplitz matrices. The Toeplitz structure is destroyed since two elements on each off-diagonal are perturbed. Based on a careful analysis, we derive sharp bounds for the extremal eigenvalues of this class of matrices in terms of the original data of the given matrix. In this way, we also obtain a lower bound for the smallest singular value of certain matrices. Some numerical results indicate that our bounds are extremely good.
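As a concrete baseline for this setting: the unperturbed symmetric tridiagonal Toeplitz matrix with diagonal a and off-diagonals b has the well-known closed-form eigenvalues a + 2b cos(kπ/(n+1)), k = 1..n, which is what the paper's bounds perturb. The numbers below are illustrative.

```python
import numpy as np

# Symmetric tridiagonal Toeplitz matrix: diagonal a, off-diagonals b.
n, a, b = 8, 2.0, -1.0
T = (np.diag(np.full(n, a))
     + np.diag(np.full(n - 1, b), 1)
     + np.diag(np.full(n - 1, b), -1))

# Closed-form spectrum: a + 2b*cos(k*pi/(n+1)), k = 1..n.
computed = np.sort(np.linalg.eigvalsh(T))
analytic = np.sort(a + 2 * b * np.cos(np.arange(1, n + 1) * np.pi / (n + 1)))
```

Perturbing two entries on each off-diagonal destroys this formula; the paper's contribution is sharp bounds on how far the extremal eigenvalues can then move.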

9.
In practical data mining tasks, high-dimensional data has to be analyzed. In most cases it is very informative to map and visualize the hidden structure of a complex data set in a low-dimensional space. In this paper a new class of mapping algorithms is defined. These algorithms combine topology representing networks and different nonlinear mapping algorithms. While the former methods aim to quantize the data and disclose the real structure of the objects, the nonlinear mapping algorithms are able to visualize the quantized data in the low-dimensional vector space. In this paper, techniques based on these methods are gathered, and the results of a detailed analysis performed on them are shown. The primary aim of this analysis is to examine the preservation of distances and neighborhood relations of the objects. Preservation of neighborhood relations was analyzed in both local and global environments. To evaluate the main properties of the examined methods, we show the outcome of the analysis based on both synthetic and real benchmark examples.

10.
This paper establishes a general ABC inventory classification system as the foundation for a normative model of the maintenance cost structure and stock turnover characteristics of a large, multi-item inventory system with constant demand. For any specified number of inventory classes, the model allows expression of the overall system combined ordering and holding cost in terms of (i) the re-ordering frequencies for the items in each inventory class and (ii) the inventory class structure, that is, the proportion of the total system's items that are in each inventory class. The model yields a minimum total maintenance cost function, which reflects the effect of class structure on inventory maintenance costs and turnover. If the Pareto curve (a.k.a. Distribution-by-value function) for the inventory system can be expressed (or approximated) analytically, the model can also be used to determine an optimal class structure, as well as an appropriate number of inventory classes. A special case of the model produces a simply structured, class-based ordering policy for minimizing total inventory maintenance costs. Using real data, the cost characteristics of this policy are compared to those of a heuristic, commonly used by managers of multi-item inventory systems. This cost comparison, expressed graphically, underscores the need for normative modelling approaches to the problem of inventory cost management in large, multi-item systems.
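The classical ABC scheme the paper generalises can be sketched as a cut of the Pareto (distribution-by-value) curve. This is a hedged illustration: the 80% / 95% cut-offs are the conventional textbook choice, not thresholds taken from the paper, whose point is precisely to optimise the class structure rather than fix it.

```python
def abc_classify(values, a_cut=80.0, b_cut=95.0):
    """Assign 'A'/'B'/'C' classes by cumulative share of annual usage value.
    `values` are per-item annual usage values; cut-offs are in percent."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    total = sum(values)
    classes, cum = [None] * len(values), 0.0
    for i in order:  # walk down the Pareto curve, item by item
        cum += 100.0 * values[i] / total
        classes[i] = 'A' if cum <= a_cut else ('B' if cum <= b_cut else 'C')
    return classes
```

In the paper's model, both the number of classes and the proportions of items in each class become decision variables tied to the re-ordering frequencies, instead of fixed percentages like these.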

11.
Multi-dimensional classification aims at finding a function that assigns a vector of class values to a given vector of features. In this paper, this problem is tackled by a general family of models, called multi-dimensional Bayesian network classifiers (MBCs). This probabilistic graphical model organizes class and feature variables as three different subgraphs: class subgraph, feature subgraph, and bridge (from class to features) subgraph. Under the standard 0-1 loss function, the most probable explanation (MPE) must be computed, for which we provide theoretical results in both general MBCs and in MBCs decomposable into maximal connected components. Moreover, when computing the MPE, the vector of class values is covered by following a special ordering (gray code). Under other loss functions defined in accordance with a decomposable structure, we derive theoretical results on how to minimize the expected loss. Besides these inference issues, the paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches. The cardinality of the search space is also given. New performance evaluation metrics adapted from the single-class setting are introduced. Experimental results with three benchmark data sets are encouraging, and they outperform state-of-the-art algorithms for multi-label classification.

12.
Ariyawansa and Zhu have proposed a new class of optimization problems termed stochastic semidefinite programs to handle data uncertainty in applications leading to (deterministic) semidefinite programs. For stochastic semidefinite programs with finite event space, they have also derived a class of volumetric barrier decomposition algorithms, and proved polynomial complexity of certain members of the class. In this paper, we consider homogeneous self-dual algorithms for stochastic semidefinite programs with finite event space. We show how the structure in such problems may be exploited so that the algorithms developed in this paper have complexity similar to those of the decomposition algorithms mentioned above.

13.
In this article, we study analytic continuation and global structure of singularities of the solution of the Cauchy problem with entire functions data or singular data for certain differential operators with principal part of polynomial coefficients in the complex domain. This study gives a complete proof of the results of a preceding article [J. Fixed Point Theory Appl. 1 (2007), 209–220].

14.
Bayesian networks (BNs) are widely used graphical models for drawing statistical inference about directed acyclic graphs. We present here Graph_sampler, a fast, free, C-language software package for structural inference on BNs. Graph_sampler uses a fully Bayesian approach in which the marginal likelihood of the data and prior information about the network structure are considered. The software can handle both continuous and discrete data, and two different models are formulated depending on the data type. It also provides a wide variety of structure priors, which can describe either global or local properties of the graph structure. Depending on the type of structure prior selected, a wide range of prior values can be used, making the prior either informative or uninformative. We propose a new and much faster jumping-kernel strategy for the Metropolis-Hastings algorithm. The distributed C source code is very compact and fast, and uses little memory and disk storage. We performed several analyses on different simulated data sets and on synthetic as well as real networks to assess the performance of Graph_sampler.

15.
A sub-class of phase-type distributions is defined in terms of a Markov process with sequential transitions between transient states and transitions from these states to absorption. Such distributions form a very rich class; they can be fitted to data, and any structure revealed by the parameter estimates can be used to develop more parsimonious re-parametrizations. Several example data sets are used as illustrations.
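The sequential structure can be made concrete with a small numerical example. This is an illustrative sketch of the general idea, not the paper's parametrization: transient states visited in order, exponential sojourns, and absorption at the end; with all mass starting in state 1 and no early absorption this reduces to an Erlang distribution. The rates are made up.

```python
import numpy as np

# Sub-generator T of a 3-phase sequential process: sojourn rate 2 in each
# phase, movement only to the next phase, absorption after the last one.
rates = np.array([2.0, 2.0, 2.0])
T = np.diag(-rates) + np.diag(rates[:-1], 1)

# Phase-type mean time to absorption: alpha @ inv(-T) @ 1,
# where alpha is the initial distribution over transient states.
alpha = np.array([1.0, 0.0, 0.0])
mean_absorption = alpha @ np.linalg.inv(-T) @ np.ones(3)
```

Here the analytic mean is 3 × (1/2) = 1.5, the Erlang(3, 2) mean, which the matrix formula reproduces.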

16.
Nonparametric kernel regression methods have recently been applied to the analysis of longitudinal data (Lin and Carroll, 2000). A controversial question is whether the correlation among longitudinal observations needs to be taken into account in nonparametric kernel regression. Lin and Carroll (2000) showed that the kernel estimator based on working independence (i.e., ignoring the correlation) is (asymptotically) the most efficient within a class of kernel GEE estimators. Based on a mixed-effects model approach, the authors propose a different class of kernel estimators that naturally and effectively incorporates the correlation structure of longitudinal data. The proposed estimators achieve the same asymptotic efficiency as Lin and Carroll's estimator and perform better in finite samples. The method readily yields nonparametric curve estimates for both the population and individual subjects. The proposed estimators have good statistical properties and are easy to implement, making them attractive to practitioners.
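For reference, the working-independence baseline that the abstract contrasts against is essentially a plain Nadaraya-Watson kernel smoother, which weights observations by a kernel and ignores any within-subject correlation. This is a generic sketch with illustrative kernel and bandwidth choices, not the paper's mixed-effects estimator.

```python
import numpy as np

def nw_estimate(x0, x, y, h=0.3):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel.
    Treats all observations as independent (working independence)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # kernel weights around x0
    return np.sum(w * y) / np.sum(w)
```

Because the weights are normalized, a constant response is reproduced exactly at any query point, which is a handy sanity check for any kernel smoother.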

17.
Albanese et al. (2003) and Avellaneda and Zhu (2001) developed the framework of the credit barrier model. They provide special solutions to the model in the case of a simple stochastic structure. The technical aspect of the model remains wide open for general stochastic structures, which are crucial when the model must be calibrated against large amounts of empirical data. This paper provides a technical solution to this problem using radial basis functions (RBFs). It discusses the numerical implementation of the credit barrier model using the RBF method, and demonstrates that the RBF method is numerically tractable for this problem and allows the model a richer stochastic structure capable of capturing relevant market information.
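The core RBF ingredient can be sketched independently of the credit model: expand the unknown function in Gaussian radial basis functions centred at the data sites and solve the resulting linear system. This is a minimal 1-D interpolation sketch; the shape parameter and data are illustrative, and the paper applies the same machinery to the barrier-model PDE rather than to plain interpolation.

```python
import numpy as np

def rbf_fit(x, y, eps=5.0):
    """Solve the Gaussian-RBF collocation system A @ w = y,
    with A_ij = exp(-(eps*(x_i - x_j))^2)."""
    A = np.exp(-(eps * (x[:, None] - x[None, :])) ** 2)
    return np.linalg.solve(A, y)

def rbf_eval(xq, x, w, eps=5.0):
    """Evaluate the fitted RBF expansion at query points xq."""
    return np.exp(-(eps * (xq[:, None] - x[None, :])) ** 2) @ w

x = np.linspace(0.0, 1.0, 9)
y = np.sin(2 * np.pi * x)
w = rbf_fit(x, y)
```

The attraction of RBF methods in this setting is that they are meshfree: the centres can be placed wherever market data lives, at the cost of solving a dense linear system.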

18.
Many existing statistical and machine learning tools for social network analysis focus on a single level of analysis. Methods designed for clustering optimize a global partition of the graph, whereas projection-based approaches (e.g., the latent space model in the statistics literature) represent in rich detail the roles of individuals. Many pertinent questions in sociology and economics, however, span multiple scales of analysis. Further, many questions involve comparisons across disconnected graphs that will inevitably be of different sizes, either due to missing data or to the inherent heterogeneity of real-world networks. We propose a class of network models that represent network structure on multiple scales and facilitate comparison across graphs with different numbers of individuals. These models differentially invest modeling effort within subgraphs of high density, often termed communities, while maintaining a parsimonious structure between said subgraphs. We show that our model class is projective, highlighting an ongoing discussion in the social network modeling literature on the dependence of inference paradigms on the size of the observed graph. We illustrate the utility of our method using data on household relations from Karnataka, India. Supplementary material for this article is available online.

19.
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were designed for speeding up pattern matching, or for finding repetitions between two strings or repetitions occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, which is also presented in this paper. Experimental results show that the filter can be very efficient: by preprocessing with Nimbus a data set in which one wants to find functional elements using a multiple local alignment tool such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.
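The gapped-seed filtering idea can be sketched in miniature: extract seeds through a 0/1 mask ('1' = keep this position, '0' = gap), index them, and keep only seeds occurring at least twice as candidate repetition regions. This toy uses a plain dictionary; the real tool indexes seeds with its bi-factor array, and the mask and data here are purely illustrative.

```python
from collections import defaultdict

def gapped_seed_index(s, mask):
    """Map each gapped seed of `s` (read through `mask`) to the list of
    start positions where it occurs, keeping only repeated seeds."""
    keep = [i for i, c in enumerate(mask) if c == '1']
    index = defaultdict(list)
    for p in range(len(s) - len(mask) + 1):
        seed = ''.join(s[p + i] for i in keep)
        index[seed].append(p)
    # Seeds seen at least twice flag candidate repetitions for the
    # (expensive) alignment stage downstream.
    return {seed: pos for seed, pos in index.items() if len(pos) >= 2}
```

The filtering pay-off is that the alignment tool only needs to examine the regions flagged here, instead of the whole input.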

20.
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NMF. Given an input matrix where the columns represent data items, we build a binary tree structure of the data items using a recently-proposed efficient algorithm for computing rank-2 NMF, and then gather information from the tree to initialize the rank-k NMF, which needs only a few iterations to reach a desired solution. We also investigate various criteria for selecting the node to split when growing the tree. We demonstrate the scalability of our algorithm for computing general rank-k NMF as well as its effectiveness in clustering and topic modeling for large-scale text data sets, by comparing it to other frequently utilized state-of-the-art algorithms. The value of the proposed approach lies in the highly efficient and accurate method for initializing rank-k NMF and the scalability achieved from the divide-and-conquer approach of the algorithm and properties of rank-2 NMF. In summary, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains along with an open-source software library called SmallK.
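For orientation, the generic NMF building block can be sketched with the classic multiplicative-update iteration (Lee-Seung style). This is a hedged illustration of plain NMF only: DC-NMF's rank-2 solver, tree construction, and initialization scheme are more specialised, and the data and dimensions here are made up.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Factor a nonnegative matrix V (m x n) as W @ H with W (m x k),
    H (k x n) nonnegative, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1
    H = rng.random((k, V.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update W with H fixed
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 12)))
W, H = nmf(V, 2)
```

The updates preserve nonnegativity and monotonically decrease the Frobenius reconstruction error; DC-NMF's contribution is to make the rank-k problem converge in few iterations by seeding it from a tree of cheap rank-2 factorizations.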


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号