Similar Literature
20 similar documents found (search time: 46 ms)
1.
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
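The straightforward formulation this abstract starts from can be sketched as follows. Each shard plays the role of one processing node; the plain Python `sum` over the partial results stands in for the global reduction (e.g. an all-reduce) that the paper's non-uniform formulation removes. Data layout and function names are illustrative, not the paper's implementation.

```python
import numpy as np

def parallel_kmeans_step(shards, centroids):
    """One iteration of the straightforward parallel k-means.
    Each shard (one per processing node) computes partial per-cluster
    sums and counts; a global reduction then merges them."""
    k, d = centroids.shape
    partial_sums, partial_counts = [], []
    for X in shards:
        # local assignment step: nearest centroid for every point
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        sums = np.zeros((k, d))
        counts = np.zeros(k)
        for j in range(k):
            sums[j] = X[labels == j].sum(axis=0)
            counts[j] = (labels == j).sum()
        partial_sums.append(sums)
        partial_counts.append(counts)
    # global reduction -- the communication bottleneck discussed above
    total_sums = sum(partial_sums)
    total_counts = sum(partial_counts)
    new_centroids = centroids.copy()
    nonempty = total_counts > 0
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids
```

Note that the per-shard work is embarrassingly parallel; only the final merge requires communication among all nodes.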

2.
Abstract

Statistical environments such as S, R, XLisp-Stat, and others have had a dramatic effect on the way we, statistics practitioners, think about data and statistical methodology. However, the possibilities and challenges introduced by recent technological developments and the general ways we use computing conflict with the computational model of these systems. This article explores some of these challenges and identifies the need to support easy integration of functionality from other domains, and to export statistical methodology to other audiences and applications, both statically and dynamically. Existing systems can be improved in these domains with some already implemented extensions (see Section 5). However, the development of a new environment and computational model that exploits modern tools designed to handle many general aspects of these challenges appears more promising as a long-term approach. We present the architecture for such a new model named Omegahat. It lends itself to entirely new statistical computing paradigms. It is highly extensible at both the user and programmer level, and also encourages the development of new environments for different user groups. The Omegahat interactive language offers a continuity between the different programming tasks and levels via optional type checking and seamless access between the interpreted user language and the implementation language, Java. Parallel and distributed computing, network and database access, interactive graphics, and many other aspects of statistical computing are directly accessible to the user as a consequence of this seamless access. We describe the benefits of using Java as the implementation language for the environment and several innovative features of the user-level language which promise to assist development of software that can be used in many contexts. We also outline how this architecture can be integrated with existing environments such as R and S.

The ideas are drawn from work within the Omega Project for Statistical Computing. The project provides open-source software for researching and developing next generation statistical computing tools.

3.
Summary  Jasp is an experimental general-purpose Java-based statistical system that adopts several new computing technologies. It has a function-based, object-oriented language, an advanced user interface, flexible extensibility, and a server/client architecture with distributed computing abilities. DAVIS, on the other hand, is a stand-alone Java-based system designed to provide advanced data visualization functions with easy GUI-based operation. In this paper, we make the tools of DAVIS usable from within Jasp, so that the integrated system can handle not only data filtering and statistical analysis but also data visualization. We develop a mechanism that extends the server/client system of Jasp to realize efficient client-side collaboration with DAVIS, and show that the mechanism is straightforward and simple.

4.
The family of expectation–maximization (EM) algorithms provides a general approach to fitting flexible models for large and complex data. The expectation (E) step of EM-type algorithms is time-consuming in massive data applications because it requires multiple passes through the full data. We address this problem by proposing an asynchronous and distributed generalization of EM called the distributed EM (DEM). Using DEM, existing EM-type algorithms are easily extended to massive data settings by exploiting the divide-and-conquer technique and widely available computing power, such as grid computing. The DEM algorithm reserves two groups of computing processes, called workers and managers, for performing the E step and the maximization step (M step), respectively. The samples are randomly partitioned into a large number of disjoint subsets and are stored on the worker processes. The E step of the DEM algorithm is performed in parallel on all the workers, and every worker communicates its results to the managers at the end of its local E step. The managers perform the M step after they have received results from a γ-fraction of the workers, where γ is a fixed constant in (0, 1]. The sequence of parameter estimates generated by the DEM algorithm retains the attractive properties of EM: convergence of the sequence of parameter estimates to a local mode and a linear global rate of convergence. Across diverse simulations focused on linear mixed-effects models, the DEM algorithm is significantly faster than competing EM-type algorithms while achieving similar accuracy. The DEM algorithm maintains its superior empirical performance on a movie ratings database consisting of 10 million ratings. Supplementary material for this article is available online.
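The γ-fraction synchronization scheme can be illustrated with a deliberately simplified skeleton: here the "model" is just a global mean, so the local E step reduces to reporting a (sum, count) pair and the M step to a weighted average. The function name, the shuffled arrival order, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def dem_estimate_mean(shards, gamma, rounds, seed=0):
    """Toy skeleton of the DEM worker/manager pattern.
    E step (workers): each shard reports local sufficient statistics.
    M step (manager): runs as soon as a gamma-fraction of workers
    have reported, using only those partial results."""
    rng = random.Random(seed)
    n_workers = len(shards)
    need = math.ceil(gamma * n_workers)
    theta = 0.0
    for _ in range(rounds):
        order = list(range(n_workers))
        rng.shuffle(order)           # simulated (asynchronous) arrival order
        reported = order[:need]      # the first gamma-fraction to finish
        s = sum(sum(shards[i]) for i in reported)
        c = sum(len(shards[i]) for i in reported)
        theta = s / c                # M step on the partial reports
    return theta
```

With γ = 1 this degenerates to the usual synchronous EM round; smaller γ trades some statistical efficiency per round for not waiting on stragglers.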

5.
The Hop-constrained Steiner Tree Problem is often used to model applications of multicast routing with QoS requirements. This paper introduces a distributed heuristic for the problem based on the application of dual ascent over a graph transformation introduced by Gouveia et al. The proposed algorithm is shown to yield significantly better solutions than the previously known algorithms.

6.
Multiobjective linear programming algorithms are typically based on value maximization. However, there is a growing body of experimental evidence showing that decision maker behavior is inconsistent with value maximization. Tversky and Simonson provide an alternative model for problems with a discrete set of choices. Their model, called the componential context model, has been shown to capture observed decision maker behavior. In this paper, an interactive multiobjective linear programming algorithm is developed which follows the rationale of Tversky and Simonson. The algorithm is illustrated with an example solved using standard linear programming software. Finally, an interactive decision support system based on this algorithm is developed to field test the usefulness of the algorithm. Results show that this algorithm compares favorably with an established algorithm in the field.

7.
Modern computing clusters have complex interconnections between processors, and the specification of a communication network is often insufficiently detailed. We consider methods for automatically refining the specification, using algorithms that reconstruct the topology of connections between processors from benchmarking results for the cluster communication network. Methods for 3D visualization of the cluster topology, and for visually comparing the reconstructed topology with the template topology given by the specification, are also considered. The approach was tested on systems with the BlueGene/P architecture and on the Lomonosov supercomputer. A greedy algorithm for retrieving the topology and an adaptation of the multidimensional scaling method for visualization are designed.
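Classical multidimensional scaling, which the visualization step adapts, embeds nodes in low-dimensional space from a matrix of pairwise "distances" (e.g. measured latencies). A minimal sketch of the textbook method (not the paper's adaptation):

```python
import numpy as np

def classical_mds(D, dim=3):
    """Classical MDS: embed n nodes in 'dim'-dimensional space so that
    their pairwise Euclidean distances approximate the matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]       # largest eigenvalues first
    scale = np.sqrt(np.clip(w[idx], 0, None))
    return V[:, idx] * scale              # coordinates, one row per node
```

When D comes from a true Euclidean configuration, the embedding reproduces it exactly (up to rotation and reflection); for noisy latency data it gives the best low-rank approximation in the Gram-matrix sense.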

8.
Particle methods are a powerful tool for modeling dynamic systems. The system is discretized into a large number of particles, which interact via local, predefined particle-particle interaction laws. The resulting computational effort comprises neighborhood search, computation of interaction forces, and state update via time integration. Particle methods are used in many fields of application, including computer science, physics, and the engineering sciences. As the number of particles in the analyzed systems constantly grows, performance enhancement has become an important part of present algorithm development. Besides the well-established approach of algorithm parallelization on multi-core CPUs or CPU clusters, modern graphics processing units (GPUs) present a different and trend-setting possibility for massive parallelization even on desktop computers. Among the top four supercomputers of the world, three already use NVIDIA GPUs. In late 2006, NVIDIA introduced the first GPUs optimized for general-purpose calculations, followed by a new computing architecture, distinct from standard graphics interfaces such as OpenGL, called the Compute Unified Device Architecture (CUDA). It enables the user to program the GPU using standard C with a few additional runtime functions. The architectural differences between CPU and GPU lead to completely different algorithm implementations. We therefore present a performance evaluation of different types of particle systems implemented on a GPU using CUDA and on a standard CPU. (© 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

9.
This paper presents a theory that allows the systematic synthesis of a class of iterative algorithms. The basic idea consists of using a specially structured model called the p-algorithm together with a set of transformations. These transformations are chosen so that if the p-algorithm solves a given problem, then the new algorithm obtained by repeated application of these transformations still solves the problem. One application of the approach is the synthesis of algorithms that satisfy specifications given a priori. Given a set of primitives, one may transform an algorithm that uses primitives outside the allowable set into an algorithm that uses only allowable primitives. The theory is illustrated by the systematic transformation of a well-known optimization algorithm, the Frank-Wolfe algorithm.

10.
Graph theory is widely used in numerous fields, such as engineering, physics, the social and biological sciences, and linguistics. The minimum dominating set (MDS) problem is one of the main problems of algorithmic graph theory and has numerous applications, especially in graph mining. Since it is NP-hard even to approximate the MDS problem well, much work has been dedicated to centralized and distributed approximation algorithms for restricted graph classes. Recent research has applied exponential-time O(k^n) algorithms to some graph classes, and, using the algorithmic tile self-assembly model, the MDS problem has been solved in O(n^2) steps. In the area of membrane computing, on the other hand, P systems introduce two levels of parallelism: every membrane works concurrently with the other membranes, and rules are applied in parallel within each membrane. This paper introduces an algorithm based on the parallelism of the P systems model for solving the MDS problem in linear time O(n).
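For concreteness, the standard sequential greedy approximation for MDS (not the P-systems algorithm of the paper) repeatedly picks the vertex that dominates the most not-yet-dominated vertices; it achieves the well-known O(log n) approximation ratio.

```python
def greedy_dominating_set(adj):
    """Greedy approximation for minimum dominating set.
    adj: dict mapping each vertex to an iterable of its neighbours.
    Returns a set of vertices such that every vertex is in the set
    or adjacent to a member of it."""
    undominated = set(adj)
    ds = set()
    while undominated:
        # pick the vertex covering the most undominated vertices
        best = max(adj, key=lambda v: len(({v} | set(adj[v])) & undominated))
        ds.add(best)
        undominated -= {best} | set(adj[best])
    return ds
```

The contrast with the abstract's point is the control flow: here coverage counts are recomputed one pick at a time, whereas the membrane-computing model applies its rules in parallel across all membranes in each step.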

11.
This paper describes the k-means range algorithm, a combination of the partitional k-means clustering algorithm with a well known spatial data structure, namely the range tree, which allows fast range searches. It offers a real-time solution for the development of distributed interactive decision aids in e-commerce since it allows the consumer to model his preferences along multiple dimensions, search for product information, and then produce the data clusters of the products retrieved to enhance his purchase decisions. This paper also discusses the implications and advantages of this approach in the development of on-line shopping environments and consumer decision aids in traditional and mobile e-commerce applications.
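The first stage of the pipeline, the fast range search whose results are then fed to k-means, can be sketched in one dimension; a sorted array with binary search stands in for the range tree (which generalizes the same O(log n + k) query to multiple dimensions). Names are illustrative.

```python
import bisect

def range_query(sorted_vals, lo, hi):
    """One-dimensional stand-in for a range-tree query: return all
    values in [lo, hi] from a pre-sorted list in O(log n + k) time.
    In the k-means range pipeline, the returned items (e.g. products
    matching the consumer's price range) would then be clustered."""
    i = bisect.bisect_left(sorted_vals, lo)
    j = bisect.bisect_right(sorted_vals, hi)
    return sorted_vals[i:j]
```

Because the query narrows the data before clustering runs, the interactive response time depends on the size of the result set rather than of the whole catalogue.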

12.
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data available from numerous sources. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide-and-conquer strategy, called DC-NMF. Given an input matrix where the columns represent data items, we build a binary tree structure of the data items using a recently-proposed efficient algorithm for computing rank-2 NMF, and then gather information from the tree to initialize the rank-k NMF, which needs only a few iterations to reach a desired solution. We also investigate various criteria for selecting the node to split when growing the tree. We demonstrate the scalability of our algorithm for computing general rank-k NMF as well as its effectiveness in clustering and topic modeling for large-scale text data sets, by comparing it to other frequently utilized state-of-the-art algorithms. The value of the proposed approach lies in the highly efficient and accurate method for initializing rank-k NMF and the scalability achieved from the divide-and-conquer approach of the algorithm and properties of rank-2 NMF. In summary, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains along with an open-source software library called SmallK.
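As background, a plain NMF solver with a pluggable initialization shows where DC-NMF's contribution lands: the sketch below uses standard Lee–Seung multiplicative updates (not the solver used in SmallK), and the `W0, H0` arguments are the hook where the rank-2 splitting tree's initialization would plug in. All names are illustrative.

```python
import numpy as np

def nmf(A, k, W0=None, H0=None, iters=200, seed=0):
    """Approximate A (m x n, nonnegative) as W @ H with W, H >= 0,
    via multiplicative updates.  A good initialization (W0, H0) --
    the role DC-NMF's tree plays -- cuts the iterations needed."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = W0.copy() if W0 is not None else rng.random((m, k))
    H = H0.copy() if H0 is not None else rng.random((k, n))
    eps = 1e-12                      # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On text data, A would be a term-document matrix and the columns of W the discovered topics.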

13.
Distributed consensus optimization has received considerable attention in recent years, and several distributed consensus-based algorithms have been proposed for (nonsmooth) convex and (smooth) nonconvex objective functions. However, the behavior of these distributed algorithms on objective functions that are nonconvex, nonsmooth, and stochastic is not understood. This class of functions and this distributed setting are motivated by several applications, including problems in machine learning and signal processing. This paper presents the first convergence analysis of the decentralized stochastic subgradient method for such problem classes, over networks modeled as fixed, undirected graphs.
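The basic decentralized (sub)gradient iteration analyzed in this line of work alternates a consensus step through a doubly stochastic mixing matrix with a local step. A minimal sketch on a smooth toy objective (scalar per node, quadratic losses; the paper's setting is nonsmooth and stochastic, which this does not capture):

```python
import numpy as np

def decentralized_subgradient(W, grads, x0, alpha, iters):
    """Decentralized (sub)gradient method on a graph with mixing
    matrix W (doubly stochastic, W[i,j] > 0 only for neighbours):
    each node averages neighbours' iterates, then takes a local
    (sub)gradient step with constant stepsize alpha."""
    x = x0.copy()
    for _ in range(iters):
        local = np.array([g(xi) for g, xi in zip(grads, x)])
        x = W @ x - alpha * local     # consensus + local step
    return x
```

With a constant stepsize the nodes converge to a neighborhood of the shared minimizer whose radius shrinks with alpha, which is the usual trade-off for this method.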

14.
In this note we present three efficient variations of the occurrence heuristic, adopted by many exact string matching algorithms and first introduced in the well-known Boyer–Moore algorithm. Our first heuristic, called improved-occurrence heuristic, is a simple improvement of the rule introduced by Sunday in his Quick-Search algorithm. Our second heuristic, called worst-occurrence heuristic, achieves its speed-up by selecting the relative position which yields the largest average advancement. Finally, our third heuristic, called jumping-occurrence heuristic, uses two characters for computing the next shift. Setting the distance between these two characters optimally allows one to maximize the average advancement. The worst-occurrence and jumping-occurrence heuristics tune their parameters according to the distribution of the characters in the text. Experimental results show that the newly proposed heuristics achieve very good results on average, especially in the case of small alphabets.
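Sunday's original Quick-Search rule, the baseline the improved-occurrence heuristic builds on, shifts by the occurrence table entry of the text character just past the current window. A compact implementation:

```python
def quick_search(text, pat):
    """Sunday's Quick-Search exact string matching.
    On each attempt, the shift is determined by the occurrence
    heuristic applied to text[s + m], the character immediately
    past the window; absent characters give the maximal shift m + 1."""
    n, m = len(text), len(pat)
    # shift[c] = m - (last index of c in pat); later entries overwrite earlier
    shift = {c: m - i for i, c in enumerate(pat)}
    hits = []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pat:
            hits.append(s)
        nxt = text[s + m] if s + m < n else None
        s += shift.get(nxt, m + 1)
    return hits
```

The two-character jumping-occurrence heuristic described above generalizes exactly this table lookup to a pair of text positions.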

15.
This paper proposes a novel distributed differential evolution algorithm called Distributed Mixed Variant Differential Evolution (dmvDE). To alleviate the time-consuming trial-and-error selection of an appropriate Differential Evolution (DE) variant for a given optimization problem, dmvDE mixes effective DE variants with diverse characteristics in a distributed framework. The novelty of dmvDE lies in mixing different DE variants in an island-based distributed framework. The 19 dmvDE algorithms discussed in this paper constitute various proportions and combinations of four DE variants (DE/rand/1/bin, DE/rand/2/bin, DE/best/2/bin and DE/rand-to-best/1/bin) as subpopulations, with each variant evolving independently but also exchanging information with the others to cooperatively enhance the efficacy of the distributed DE as a whole. The dmvDE algorithms have been run on a set of test problems and compared with the distributed versions of the constituent DE variants. Simulation results show that the dmvDEs display a consistent overall performance improvement over the distributed DEs. The best dmvDE algorithm has also been benchmarked against five distributed differential evolution algorithms; simulation results reiterate the superior performance of mixing DE variants in a distributed framework, with the best dmvDE outperforming, on average, all five algorithms considered.
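One of the four constituent variants, DE/rand/1/bin, is sketched below as a single island (no inter-island migration, which dmvDE adds on top); the parameter values are conventional defaults, not the paper's settings.

```python
import numpy as np

def de_rand_1_bin(f, bounds, pop_size=20, F=0.5, CR=0.9, gens=200, seed=0):
    """DE/rand/1/bin: mutant v = x_r1 + F*(x_r2 - x_r3), then binomial
    crossover with rate CR, greedy one-to-one replacement."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    d = len(lo)
    pop = rng.uniform(lo, hi, (pop_size, d))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i],
                                    3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True   # guarantee one mutant gene
            u = np.where(cross, v, pop[i])
            fu = f(u)
            if fu <= fit[i]:                # greedy selection
                pop[i], fit[i] = u, fu
        # (dmvDE would periodically migrate individuals between
        #  islands running different variants at this point)
    best = fit.argmin()
    return pop[best], fit[best]
```

In dmvDE each island runs one such variant on its own subpopulation; the migration step is what lets a variant well suited to the problem pull the others along.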

16.
Integrated computing systems—such as Maple, Mathematica, and MATLAB—enable the development of “live” electronic documents that include explanatory text, calculations, custom programming (code development), visualization, and other features. As a result, teachers and students, researchers and practitioners can develop applications in a completely interactive format. Such e-documents can be put to good use in developing textbooks, lecture notes, assignments and presentations, as well as in the context of research and development (R&D) projects. The interactive approach accelerates and enhances the process of learning and research. To illustrate this approach, we discuss a nonlinear (global and local) optimization software product and a topical electronic book that support interactive model development and optimization in Maple. We highlight the key features of the e-book and the software, present illustrative examples, and point towards a range of scientific and engineering applications.

17.
The L1-median is a robust estimator of multivariate location with good statistical properties. Several algorithms for computing the L1-median are available: problem-specific algorithms can be used, but also general optimization routines. The aim is to compare different algorithms with respect to their precision and runtime. This is possible because all considered algorithms have been implemented in a standardized manner in the open-source environment R. In most situations, the algorithm based on the optimization routine NLM (non-linear minimization) clearly outperforms the other approaches. Its low computation time makes applications to large and high-dimensional data feasible.
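The best-known problem-specific algorithm in such comparisons is the Weiszfeld iteration; a sketch (in Python rather than the R implementations the paper benchmarks):

```python
import numpy as np

def l1_median(X, tol=1e-8, max_iter=1000):
    """Weiszfeld iteration for the L1-median (spatial median): the
    point minimizing the sum of Euclidean distances to the rows of X.
    Each step is a distance-weighted average of the data points."""
    y = X.mean(axis=0)                       # start at the coordinate mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - y, axis=1)
        if np.any(d < 1e-12):                # iterate landed on a data point
            return y
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

Unlike the coordinate-wise median, the L1-median is equivariant under rotations, which is part of its appeal as a robust location estimator.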

18.
The Bernstein–Sato polynomial of a hypersurface is an important object with many applications. However, its computation is hard, as a number of open questions and challenges indicate. In this paper we propose a family of algorithms called checkRoot for optimized checking of whether a given rational number is a root of the Bernstein–Sato polynomial and, in the affirmative case, computing its multiplicity. These algorithms are used in a new approach to computing the global or local Bernstein–Sato polynomial and the b-function of a holonomic ideal with respect to a weight vector. They can be applied in numerous situations where a multiple of the Bernstein–Sato polynomial can be established; namely, a multiple can be obtained by means of embedded resolution, for topologically equivalent singularities, or using the formula of A'Campo and spectral numbers. We also present approaches to the logarithmic comparison problem and the intersection homology D-module. Several applications are presented, as well as solutions to some challenges which were intractable with classical methods. One of the main applications is the computation of a stratification of affine space with the local b-function constant on each stratum; notably, the algorithm we propose does not employ primary decomposition. Our results can also be applied to the computation of Bernstein–Sato polynomials for varieties. The examples in the paper have been computed with our implementation of the described methods in Singular:Plural.
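As background (the standard definition, not specific to this paper): the Bernstein–Sato polynomial $b_f(s)$ of a polynomial $f$ is the monic polynomial of least degree for which some operator $P(s)$ in the Weyl algebra $D[s]$ satisfies

```latex
P(s)\, f^{s+1} \;=\; b_f(s)\, f^{s}.
```

For example, for $f = x^2$ one may take $P = \tfrac14 \partial_x^2$, since $\tfrac14 \partial_x^2\, x^{2s+2} = (s+1)\bigl(s+\tfrac12\bigr)\, x^{2s}$, so $b_f(s) = (s+1)\bigl(s+\tfrac12\bigr)$. A checkRoot-style computation would verify candidate rational roots such as $-1$ and $-\tfrac12$ and their multiplicities, rather than computing $b_f$ from scratch.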

19.
Quantum-inspired evolutionary algorithms: a survey and empirical study
Quantum-inspired evolutionary algorithms, one of the three main research areas concerning the interaction between quantum computing and evolutionary algorithms, are receiving renewed attention. A quantum-inspired evolutionary algorithm is an evolutionary algorithm for a classical computer, rather than for quantum-mechanical hardware. This paper provides a unified framework and a comprehensive survey of recent work in this rapidly growing field. After introducing the main concepts behind quantum-inspired evolutionary algorithms, we present the key ideas of the many quantum-inspired evolutionary algorithms, sketch the differences between them, survey theoretical developments and applications ranging from combinatorial to numerical optimization, and compare the advantages and limitations of these various methods. Finally, a small comparative study is conducted to evaluate the performance of different types of quantum-inspired evolutionary algorithms, and conclusions are drawn about some of the most promising future research directions in this area.
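The core loop these algorithms share, probabilistic "Q-bit" genes, observation to obtain classical bitstrings, and an update toward the best solution, can be caricatured on the OneMax problem. Real QIEAs use angle-based rotation gates on amplitude pairs; this sketch nudges observation probabilities directly, and all names and parameter values are illustrative assumptions.

```python
import random

def qiea_onemax(n_bits=16, pop=5, gens=60, dtheta=0.05, seed=1):
    """Minimal quantum-inspired EA for OneMax (maximize number of 1s).
    Each gene stores a probability p of observing a 1; after sampling
    a small population, p is nudged toward the best solution so far
    (a crude stand-in for the rotation-gate update)."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                      # superposed start: 1s and 0s equally likely
    best, best_fit = None, -1
    for _ in range(gens):
        for _ in range(pop):
            x = [1 if rng.random() < pi else 0 for pi in p]  # "observe"
            fit = sum(x)
            if fit > best_fit:
                best, best_fit = x, fit
        # update step: rotate each gene's probability toward the best bit
        p = [min(1 - 1e-3, pi + dtheta) if b else max(1e-3, pi - dtheta)
             for pi, b in zip(p, best)]
    return best, best_fit
```

The small explicit population is characteristic of QIEAs: the probabilistic genes, not a large population, carry the search distribution.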

20.
Computing with words, introduced by Zadeh, has become a very important concept in the processing of knowledge represented in the form of propositions. Two aspects of this concept, approximation and personalization, are essential to building intelligent systems for human-centric computing. For the last several years, the Artificial Intelligence community has used ontologies as a means of representing knowledge. Recently, the development of a new Internet paradigm, the Semantic Web, has led to the introduction of another form of ontology. It allows for defining concepts, identifying relationships among these concepts, and representing concrete information; in other words, an ontology has become a very powerful way of representing not only information but also its semantics. The paper proposes an application of ontology, in the sense of the Semantic Web, to the development of computing-with-words-based systems capable of performing operations on propositions, including their semantics. The ontology-based approach is very flexible and provides a rich environment for expressing different types of information, including perceptions. It also provides a simple way of personalizing propositions. An architecture for a computing-with-words-based system is proposed, and a prototype of such a system is described.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号