1.
The support vector machine (SVM) is a powerful learning algorithm, e.g., for classification and clustering tasks, that works even for complex data structures such as strings, trees, lists and general graphs. It is based on the use of a kernel function to measure scalar products between data units. For analyzing string data, Lodhi et al. (J Mach Learn Res 2:419–444, 2002) introduced the String Subsequence kernel (SSK). In this paper we propose an approximation to SSK based on dropping higher-order terms (i.e., subsequences that are spread out more than a certain threshold), which reduces the computational burden of SSK. As we are also concerned with the practical application of complex kernels with high computational complexity and memory consumption, we provide an empirical model that predicts the runtime and memory of both the approximation and the original SSK from easily measurable properties of the input data. We provide extensive results on the properties of the proposed approximation, SSK-LP, with respect to prediction accuracy, runtime and memory consumption. Using real-life text mining datasets, we show that models based on SSK and SSK-LP perform similarly on a set of real-life learning tasks, and that the empirical runtime model is also useful for roughly determining the total learning time of an SVM using either kernel.
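The gap-weighted subsequence kernel, and the span-truncation idea behind such an approximation, can be made concrete with a brute-force sketch for short strings. This is illustrative only (the actual SSK uses a dynamic-programming recursion, and the decay value `lam=0.5` and the `max_span` cutoff are assumptions, not taken from the paper):

```python
from itertools import combinations

def ssk_bruteforce(s, t, n, lam=0.5, max_span=None):
    """Gap-weighted subsequence kernel of order n, by enumeration.

    Each occurrence of a length-n subsequence contributes lam**span, where
    span is the number of characters it stretches over. Setting max_span
    drops widely spread subsequences, mimicking the idea of truncating
    high-order (stretched-out) terms to cut the computational cost.
    """
    def weighted_subseqs(x):
        out = {}
        for idx in combinations(range(len(x)), n):
            span = idx[-1] - idx[0] + 1
            if max_span is not None and span > max_span:
                continue  # approximation: ignore spread-out subsequences
            key = tuple(x[i] for i in idx)
            out[key] = out.get(key, 0.0) + lam ** span
        return out

    ws, wt = weighted_subseqs(s), weighted_subseqs(t)
    return sum(w * wt.get(k, 0.0) for k, w in ws.items())
```

For example, "cat" and "car" share only the subsequence "ca" (span 2 in both strings), so the order-2 kernel value is lam**4.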

2.
The use of the fast Fourier transform (FFT) accelerates the Lanczos tridiagonalisation method for Hankel and Toeplitz matrices by reducing the complexity of matrix–vector multiplication. In multiprecision arithmetic, however, the FFT has overheads that make it less competitive than alternative methods when the accuracy exceeds 10000 decimal places. We studied two alternative Hankel matrix–vector multiplication methods, based on multiprecision number decomposition and on recursive Karatsuba-like multiplication, respectively. The first method was uncompetitive because of huge precision losses, while the second turned out to be 5 to 14 times faster than the FFT over the ranges of matrix sizes (up to n = 8192) and working precision (b = 32768 bits) we were interested in. We successfully applied our approach to eigenvalue calculations in studies of the spectra of matrices that arise in research on the Riemann zeta function. The recursive matrix–vector multiplication significantly outperformed both the FFT and traditional multiplication in these studies. Copyright © 2016 John Wiley & Sons, Ltd.
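Karatsuba's recursion replaces one full-size product with three half-size products plus additions. The minimal integer version below is only a scalar sketch of that idea (the paper applies a Karatsuba-like recursion to Hankel matrix–vector products, which is more involved; the 64-bit base-case cutoff is an arbitrary choice):

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three recursive half-size products."""
    if x < 2 ** 64 or y < 2 ** 64:          # small operand: direct multiply
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)     # x = xh * 2^m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)     # y = yh * 2^m + yl
    a = karatsuba(xh, yh)                   # high * high
    b = karatsuba(xl, yl)                   # low * low
    c = karatsuba(xh + xl, yh + yl) - a - b # cross terms from one product
    return (a << (2 * m)) + (c << m) + b
```

The recursion cost satisfies T(n) = 3 T(n/2) + O(n), giving O(n^1.585) instead of the schoolbook O(n^2).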

3.
This paper presents a detailed analysis of the scalability and parallelization of Local Search algorithms for constraint-based and SAT (Boolean satisfiability) solvers. We propose a framework to estimate the parallel performance of a given algorithm by analyzing the runtime behavior of its sequential version. Indeed, by approximating the runtime distribution of the sequential process with statistical methods, the runtime behavior of the parallel process can be predicted by a model based on order statistics. We apply this approach to study the parallel performance of a constraint-based Local Search solver (Adaptive Search), two SAT Local Search solvers (namely Sparrow and CCASAT), and a propagation-based constraint solver (Gecode, with a random labeling heuristic). We compare the performance predicted by our model to actual parallel implementations of those methods using up to 384 processes. We show that the model is accurate and predicts performance close to the empirical data. Moreover, as we study different types of problems, we observe that the solvers we tested exhibit different behaviors and that their runtime distributions can be approximated by two types of distributions: exponential (shifted and non-shifted) and lognormal. Our results show that the proposed framework estimates the runtime of the parallel algorithm with an average discrepancy of 21% w.r.t. the empirical data across all experiments with the maximum allowed number of processors for each technique.
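The order-statistics idea is simple: with k independent parallel restarts, the parallel runtime is the minimum of k draws from the sequential runtime distribution. A small bootstrap sketch of that prediction (illustrative only; the paper fits parametric exponential and lognormal models rather than resampling):

```python
import random
import statistics

def predicted_parallel_runtime(runtimes, n_procs, n_boot=4000, seed=0):
    """Estimate E[min of n_procs iid draws] from observed sequential runtimes.

    Resamples the empirical runtime distribution; the minimum over n_procs
    draws models the first of n_procs independent restarts to finish.
    """
    rng = random.Random(seed)
    mins = [min(rng.choice(runtimes) for _ in range(n_procs))
            for _ in range(n_boot)]
    return statistics.mean(mins)
```

For a shifted exponential with shift d and scale s, the expected minimum over k processes is d + s/k, so the predicted speedup saturates once s/k falls below the shift, which is one reason shifted and non-shifted distributions behave so differently under parallelism.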

4.
We show that the polar as well as the pseudo-polar FFT can be computed very accurately and efficiently by the well-known nonequispaced FFT. Furthermore, we discuss the reconstruction of a 2D signal from its Fourier transform samples on a (pseudo-)polar grid by means of the inverse nonequispaced FFT.
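The quantities involved are ordinary 2-D Fourier transform values taken on a polar frequency grid. A naive direct evaluation makes the target concrete; it costs O(N) per sample, which is exactly what the nonequispaced FFT reduces to near-FFT time. The grid convention below (angles over [0, pi), radii up to half the sampling rate) is an illustrative choice:

```python
import cmath
import math

def polar_grid(n_rays, n_radii, rmax=0.5):
    """Polar frequency samples in cycles per pixel, angles over [0, pi)."""
    pts = []
    for j in range(n_rays):
        th = math.pi * j / n_rays
        for k in range(n_radii):
            r = rmax * k / max(n_radii - 1, 1)
            pts.append((r * math.cos(th), r * math.sin(th)))
    return pts

def ndft2(f, freqs):
    """Direct 2-D nonequispaced DFT of image f at arbitrary frequency points."""
    out = []
    for w1, w2 in freqs:
        s = 0j
        for x, row in enumerate(f):
            for y, v in enumerate(row):
                s += v * cmath.exp(-2j * math.pi * (w1 * x + w2 * y))
        out.append(s)
    return out
```

A unit impulse at the origin has a flat spectrum, so every polar sample should equal 1, which gives a quick sanity check of the grid and the transform sign convention.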

5.
It is well known that Bi-CG can be adapted to construct hybrid methods with computational complexity almost similar to that of Bi-CG, in which one attempts to further improve the convergence behavior. In this paper we study the class of BiCGstab methods. In many applications, the speed of convergence of these methods appears to be determined mainly by the incorporated Bi-CG process, and the problem is that the Bi-CG iteration coefficients have to be determined from the BiCGstab process. We focus our attention on the accuracy of these Bi-CG coefficients and on how rounding errors may affect the speed of convergence of the BiCGstab methods. We propose a strategy for a more stable determination of the Bi-CG iteration coefficients and show by experiments that this may indeed lead to faster convergence.
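For reference, the standard BiCGstab iteration (van der Vorst) is sketched below in plain Python; the Bi-CG coefficients discussed in the abstract are the scalars alpha, beta and rho computed inside the loop. This is a textbook sketch, not the stabilized variant the paper proposes:

```python
def bicgstab(A, b, tol=1e-10, maxit=200):
    """Solve A x = b with the standard (unpreconditioned) BiCGstab iteration."""
    n = len(b)
    def mv(M, v): return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = [bi - ri for bi, ri in zip(b, mv(A, x))]
    rhat = r[:]                       # shadow residual, fixed throughout
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(maxit):
        rho_new = dot(rhat, r)        # Bi-CG coefficient rho
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = mv(A, p)
        alpha = rho_new / dot(rhat, v)            # Bi-CG coefficient alpha
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:                # early exit: s already tiny
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = mv(A, s)
        omega = dot(t, s) / dot(t, t)             # stabilization step
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            break
        rho = rho_new
    return x
```

Note that alpha and rho are formed from inner products with the fixed shadow residual rhat; rounding errors in these inner products are exactly the accuracy issue the paper examines.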

6.
Nowadays, with the volume of data growing at an unprecedented rate, large-scale data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability to big data analysis. To mine knowledge from big data, we present parallel large-scale rough set based methods for knowledge acquisition using MapReduce. We implemented them on several representative MapReduce runtime systems: Hadoop, Phoenix and Twister. Performance comparisons on these runtime systems are reported in this paper. The experimental results show that (1) the computational time is mostly lowest on Twister when employing the same cores; (2) Hadoop has the best speedup for larger data sets; (3) Phoenix has the best speedup for smaller data sets. The excellent speedups also demonstrate that the proposed parallel methods can effectively process very large data on different runtime systems. Pitfalls and advantages of these runtime systems are also illustrated through our experiments, which should help users decide which runtime system to use in their applications.
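Rough set computations parallelize naturally in MapReduce because indiscernibility classes merge associatively: each mapper groups its chunk of records by condition-attribute values, reducers merge the partial groups, and the lower and upper approximations are then read off the merged classes. A single-process sketch of that decomposition (record layout and attribute names are made up for illustration):

```python
from collections import defaultdict
from functools import reduce

def map_classes(chunk, attrs):
    """Map phase: partial equivalence classes for one chunk of the table."""
    part = defaultdict(set)
    for rec in chunk:
        part[tuple(rec[a] for a in attrs)].add(rec["id"])
    return part

def reduce_classes(p1, p2):
    """Reduce phase: merge partial classes keyed by condition-attribute values."""
    for key, ids in p2.items():
        p1[key] |= ids
    return p1

def approximations(classes, target):
    """Rough set lower/upper approximations of a target concept."""
    lower, upper = set(), set()
    for c in classes.values():
        if c <= target:      # class entirely inside the concept
            lower |= c
        if c & target:       # class overlapping the concept
            upper |= c
    return lower, upper
```

In a real deployment the chunks live on different nodes and `reduce_classes` runs in the shuffle/reduce stage; the logic is identical.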

7.
The discrete prolate spheroidal sequences (DPSSs) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis – also known as the Slepian basis – this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to that of the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time–frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as to traditional FFT-based methods.
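In the discrete case, the time–frequency localization operator mentioned above is the N x N prolate matrix B[m, n] = sin(2*pi*W*(m - n)) / (pi*(m - n)), with 2W on the diagonal; its eigenvectors are the Slepian basis and its eigenvalues cluster near 1 and near 0, with roughly 2NW of them close to 1. A pure-Python power-iteration sketch of the leading eigenpair (production Slepian solvers instead diagonalize a commuting tridiagonal matrix, which is far better conditioned):

```python
import math

def prolate_matrix(N, W):
    """Discrete time-frequency localization (prolate) matrix for bandwidth W."""
    B = [[0.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            if m == n:
                B[m][n] = 2 * W
            else:
                d = m - n
                B[m][n] = math.sin(2 * math.pi * W * d) / (math.pi * d)
    return B

def leading_eig(B, iters=500):
    """Power iteration; B is symmetric PSD, so this approaches the top pair."""
    N = len(B)
    v = [1.0 / math.sqrt(N)] * N
    lam = 0.0
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(N)) for i in range(N)]
        lam = math.sqrt(sum(wi * wi for wi in w))   # Rayleigh-style estimate
        v = [wi / lam for wi in w]
    return lam, v
```

Two quick checks follow from theory: all eigenvalues lie in (0, 1), so the leading one is just below 1 when 2NW is large, and the trace equals 2NW.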

8.
A novel method is proposed to compute the Bayes estimate for a logistic Gaussian process prior for density estimation. The method gains speed by drawing samples from the posterior of a finite-dimensional surrogate prior, which is obtained by imputation of the underlying Gaussian process. We establish that imputation results in quite accurate computation. Simulation studies show that accuracy and high speed can be combined. This fact, along with the known flexibility of logistic Gaussian priors for modeling smoothness and recent results on their large support, makes these priors and the resulting density estimate very attractive.

9.
The large claims reinsurance treaties ECOMOR and LCR are well known to be unpopular; they have been largely neglected by most reinsurers because of their technical complexity. In this paper, we derive new mathematical results connected to asymptotic problems for these reinsurance forms. Perhaps these results can reopen the discussion on the usefulness of including the largest claims in the decision-making procedure. Apart from asymptotic estimates for the tail of the distribution of the ECOMOR quantity, we derive its weak laws. We also deal with the weak laws of the LCR quantity. Finally, we illustrate the outcomes with a number of simulations.

10.
Managing and hedging the risks associated with Variable Annuity (VA) products require intraday valuation of key risk metrics for these products. The complex structure of VA products and the computational complexity of their accurate evaluation have compelled insurance companies to adopt Monte Carlo (MC) simulations to value their large portfolios of VA products. Because MC simulations are computationally demanding, especially for intraday valuations, insurance companies need more efficient valuation techniques. Recently, a framework based on traditional spatial interpolation techniques has been proposed that can significantly decrease the computational complexity of MC simulation (Gan and Lin, 2015). However, traditional interpolation techniques require the definition of a distance function that can significantly impact their accuracy. Moreover, none of the traditional spatial interpolation techniques provides all of the key properties of accuracy, efficiency, and granularity (Hejazi et al., 2015). In this paper, we present a neural network approach for the spatial interpolation framework that affords an efficient way to find an effective distance function. The proposed approach is accurate, efficient, and provides an accurate granular view of the input portfolio. Our numerical experiments illustrate the superior performance of the proposed neural network approach compared to traditional spatial interpolation schemes.
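As context for what the neural network replaces: a traditional spatial interpolation prices a new VA contract from a small set of representative contracts already valued by Monte Carlo, weighting them by some distance in contract-feature space. A minimal inverse-distance-weighting sketch (the feature encoding and the power parameter are illustrative assumptions; the paper's point is that hand-designing this distance function is the hard part):

```python
def idw_estimate(query, reps, values, p=2.0):
    """Inverse-distance-weighted estimate of a contract's value.

    query:  feature vector of the new contract
    reps:   feature vectors of representative contracts
    values: their Monte Carlo-simulated values
    """
    weights = []
    for i, r in enumerate(reps):
        d2 = sum((q - x) ** 2 for q, x in zip(query, r))
        if d2 == 0.0:
            return values[i]            # exact match: reuse the MC value
        weights.append(d2 ** (-p / 2.0))
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

Because the weights are positive and normalized, the estimate is always a convex combination of the representative values, which makes the scheme stable but only as good as the chosen distance.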

11.
The inexact GMRES algorithm is a variant of GMRES in which matrix-vector products are performed inexactly, either out of necessity or deliberately, as part of a trade-off of accuracy for speed. Recent studies have shown that relaxing matrix-vector products in this way can be justified theoretically and experimentally. Research so far has focused on decreasing the workload per iteration without significantly affecting the accuracy. But relaxing the accuracy per iteration is liable to increase the number of iterations, and thereby the overall runtime, which could end up greater than that of exact GMRES if the savings in the matrix-vector products are insufficient. In this paper, we assess the benefit of the inexact approach in terms of actual CPU time on realistic problems, and we present cases that offer instructive insights into results affected by the build-up of inexactness. Such information is of vital importance to practitioners who need to decide whether switching their workflow to the inexact approach is worth the effort and the risk that might come with it. Our assessment is drawn from extensive numerical experiments that gauge the effectiveness of the inexact scheme and its suitability for certain problems, depending on how much inexactness is allowed in the matrix-vector products.

12.
In this paper we study the asymptotics of the tail of the buffer occupancy distribution in buffers accessed by a large number of stationary independent sources and served according to a strict HOL priority rule. As in the case of single buffers, the results are valid for a very general class of sources, including long-range dependent sources with bounded instantaneous rates. We first consider the case of two buffers, one of which has strict priority over the other, and obtain an asymptotic upper bound on the tail probability of the lower-priority buffer. We discuss the conditions required to have asymptotic equivalents. The asymptotics are studied in terms of a scaling parameter that reflects the server speed, buffer level and the number of sources in such a way that their ratios remain constant. The results are then generalized to the case of M buffers, which leads to the source pooling idea. We conclude with numerical validation of our formulae against simulations, which shows that the asymptotic bounds are tight. We also show that the commonly suggested reduced service rate approximation can give extremely low estimates.

13.
Kernelized support vector machines (SVMs) belong to the most widely used classification methods. However, in contrast to linear SVMs, the computation time required to train such a machine becomes a bottleneck when facing large data sets. In order to mitigate this shortcoming of kernel SVMs, many approximate training algorithms have been developed. While most of these methods claim to be much faster than the state-of-the-art solver LIBSVM, a thorough comparative study is missing. We aim to fill this gap. We choose several well-known approximate SVM solvers and compare their performance on a number of large benchmark data sets. Our focus is to analyze the trade-off between prediction error and runtime for different learning and accuracy parameter settings. This includes simple subsampling of the data, the poor man's approach to handling large-scale problems. We employ model-based multi-objective optimization, which allows us to tune the parameters of the learning machine and the solver over the full range of accuracy/runtime trade-offs. We analyze (differences between) solvers by studying and comparing the Pareto fronts formed by the two objectives, classification error and training time. Unsurprisingly, given more runtime most solvers are able to find more accurate solutions, i.e., achieve a higher prediction accuracy. It turns out that LIBSVM with subsampling of the data is a strong baseline. Some solvers systematically outperform others, which allows us to give concrete recommendations of when to use which solver.

14.
Derivatives are popular financial instruments whose values depend on other, more fundamental financial assets (called the underlying assets). As they play essential roles in financial markets, evaluating them efficiently and accurately is critical. Most derivatives have no simple valuation formulas; as a result, they must be priced by numerical methods such as lattice methods. In a lattice, the prices of the derivatives converge to theoretical values as the number of time steps increases. Unfortunately, the nonlinearity error introduced by the nonlinearity of the option value function may cause the pricing results to converge slowly or even oscillate significantly. The lognormal diffusion process, which has been widely used to model the underlying asset's price dynamics, does not capture the empirical findings satisfactorily. Therefore, many alternative processes have been proposed, and a very popular one is the jump-diffusion process. This paper proposes an accurate and efficient lattice for the jump-diffusion process. Our lattice is accurate because its structure can suit the derivatives' specifications so that the pricing results converge smoothly. To our knowledge, no other lattices for the jump-diffusion process have successfully solved the oscillation problem. In addition, the time complexity of our lattice is lower than those of existing lattice methods by at least half an order of magnitude. Numerous numerical calculations confirm the superior performance of our lattice over existing methods in terms of accuracy, speed, and generality.
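For orientation, here is the plain CRR binomial lattice for the lognormal case that jump-diffusion lattices generalize. The nonlinearity error the abstract mentions arises because the payoff's kink at the strike generally lands between lattice nodes, which is what causes the price oscillation as the step count grows. All parameter values below are illustrative:

```python
import math

def crr_call(S, K, r, sigma, T, n):
    """European call priced on a CRR binomial lattice with n time steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))          # up factor
    d = 1.0 / u                                  # down factor
    p = (math.exp(r * dt) - d) / (u - d)         # risk-neutral up probability
    disc = math.exp(-r * dt)
    # terminal payoffs, then discounted backward induction to the root
    vals = [max(S * u ** j * d ** (n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):
        vals = [disc * (p * vals[j + 1] + (1 - p) * vals[j])
                for j in range(len(vals) - 1)]
    return vals[0]
```

As n grows, the price converges (with the characteristic sawtooth oscillation) to the Black-Scholes value; for S = K = 100, r = 0.05, sigma = 0.2, T = 1 that limit is about 10.45.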

15.
We obtain marginal tail area approximations for the one-dimensional test statistic based on the appropriate component of the M-estimate, for both standardized and Studentized versions, which are needed for tests and confidence intervals. The result is proved under conditions that allow application to finite-sample situations such as the bootstrap, and involves a careful discretization with saddlepoints used for each neighbourhood. These results are used to obtain second-order relative error results on the accuracy of the Studentized and the tilted bootstrap. The tail area approximations are applied to a Poisson regression model and shown to have very good accuracy. An erratum to this article is available.

16.
In this paper, a conservative compact difference scheme is proposed for the two-dimensional nonlinear Zakharov equation with periodic boundary and initial conditions. The proposed scheme not only conserves the mass and energy at the discrete level but is also efficient in practical experiments, because the fast Fourier transform (FFT) can be used to speed up the numerical computation. Using the standard energy method and an induction argument, we rigorously establish unconditional and optimal H2-error estimates. Numerical examples are provided to support our theoretical results and to show the accuracy and efficiency of the new scheme.

17.
Considering the space-time adaptive method for parabolic evolution equations we introduced in Stevenson et al., this work discusses an implementation of the method in which every step is of linear complexity. Exploiting the tensor-product structure of the space-time cylinder, the method allows for a family of trial spaces given as spans of wavelets-in-time tensorized with finite element spaces-in-space. On spaces whose bases are indexed by double-trees, we derive an algorithm that applies the resulting bilinear forms in linear complexity. We provide extensive numerical experiments demonstrating the linear runtime of the resulting adaptive loop.

18.
Disruptions in airline operations can result in infeasibilities in aircraft and passenger schedules. Airlines typically recover aircraft schedules and disrupted passenger itineraries sequentially. However, passengers are severely affected by disruptions and recovery decisions. In this paper, we present a mathematical formulation for the integrated aircraft and passenger recovery problem that considers aircraft and passenger related costs simultaneously. Using the superimposition of aircraft and passenger itinerary networks, passengers are explicitly modeled in order to use realistic passenger related costs. In addition to the common routing recovery actions, we integrate several passenger recovery actions and cruise speed control into our solution approach. Cruise speed control is a very beneficial action for mitigating delays. On the other hand, it adds complexity to the problem due to the nonlinearity of the fuel cost function. The problem is formulated as a mixed integer nonlinear programming (MINLP) model. We show that the problem can be reformulated as a conic quadratic mixed integer programming (CQMIP) problem, which can be solved with commercial optimization software such as IBM ILOG CPLEX. Our computational experiments show that we can handle several simultaneous disruptions optimally on a four-hub network of a major U.S. airline in less than a minute on average. We conclude that the proposed approach is able to find the optimal tradeoff between operating and passenger-related costs in real time.

19.
Fast algorithms are presented for the accurate evaluation of some singular integral operators that arise in solving certain partial differential equations within the unit circle in the complex plane. These algorithms generalize and extend a fast algorithm of Daripa [11]. They are based on recursive relations in Fourier space combined with the FFT (fast Fourier transform), and have a theoretical computational complexity of order O(N) per point, where N^2 is the total number of grid points. An application of these algorithms to quasiconformal mappings of doubly connected domains onto annuli is presented in a follow-up paper.

20.
Image segmentation is a very important and fundamental operation for the meaningful analysis and interpretation of images. One of its most important applications is facial surgical planning. Thresholding is a common approach to image segmentation because it is simple, robust to noise and accurate. In this paper, we recognize and segment the lip area using optimal thresholding based on bacterial foraging optimization. A new color space (IHLS) is introduced in this paper, which performs well in facial image segmentation. In order to evaluate the performance of the proposed algorithm, we use three methods to measure accuracy. The proposed algorithm has low computational complexity and error, and is also efficient.
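Optimal thresholding can be made concrete with Otsu's criterion: choose the threshold that maximizes the between-class variance of the two resulting pixel classes. The paper searches a criterion of this kind with bacterial foraging optimization; as an illustrative stand-in, the sketch below does an exhaustive search over 8-bit intensity levels of a single channel:

```python
def otsu_threshold(pixels, levels=256):
    """Exhaustive search for the threshold maximizing between-class variance.

    pixels: iterable of integer intensities in [0, levels). Class 0 is
    intensities <= t, class 1 is intensities > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels - 1):
        w0 += hist[t]                 # pixels at or below t
        if w0 == 0:
            continue
        w1 = total - w0               # pixels above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a clearly bimodal intensity histogram the maximizer lands in the valley between the two modes, which is exactly the behavior a stochastic optimizer like bacterial foraging is trying to reach faster on harder, multimodal criteria.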
