Similar documents
 18 similar documents found (search time: 93 ms)
1.
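To meet the challenges that distributed big data poses for traditional statistical modeling, this paper considers the expectile regression model as a way to achieve effective data processing and statistical inference on distributed big data. Its novelty is that expectile regression is applied separately to the data stored on each machine, and the resulting estimates are then aggregated by averaging for overall inference. On the algorithmic side, building on the alternating direction method of multipliers (ADMM), which is popular in big-data computation, a block ADMM algorithm is proposed; the iterative algorithm is easy to parallelize, gives robust results, and can significantly reduce the storage that big data requires. The parameter estimates of the expectile regression model for distributed big data have good efficiency and asymptotic properties, and numerical simulations and an empirical analysis also confirm the method's effectiveness on distributed big data.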
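The fit-locally-then-average idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each local fit uses iteratively reweighted least squares rather than the block ADMM algorithm the paper proposes, and the names `expectile_fit` and `averaged_expectile_fit` are hypothetical.

```python
import numpy as np

def expectile_fit(X, y, tau=0.5, n_iter=50, tol=1e-10):
    """Linear expectile regression via iteratively reweighted least squares.

    Assumption: IRLS stands in for the paper's block ADMM solver.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        resid = y - X @ beta
        # Asymmetric squared loss: weight tau above the fit, 1 - tau below it.
        w = np.where(resid >= 0, tau, 1.0 - tau)
        Xw = X * w[:, None]
        new_beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

def averaged_expectile_fit(blocks, tau=0.5):
    """Fit every (X, y) block separately, then average the coefficient vectors."""
    return np.mean([expectile_fit(X, y, tau) for X, y in blocks], axis=0)
```

Each `(X, y)` pair in `blocks` plays the role of the data held on one machine; only the fitted coefficient vectors need to be sent to the aggregating node.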

2.
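This paper studies feature extraction and variable selection when the data set is partitioned and stored on different processors that are connected through some network structure. A distributed L_{1/2} regularization method is proposed, a distributed L_{1/2} regularization algorithm based on ADMM is given, and its convergence is proved. The algorithm exchanges information only between neighboring processors, and its variable selection result is the same as that of L_{1/2} regularization applied to the unpartitioned data set. Experiments show that the new algorithm is effective and practical, and suits data held in distributed storage.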

3.
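Based on an analysis of the characteristics of transmission-line condition monitoring systems, this paper proposes a design that brings cloud storage and parallel processing into the system. A relational database is used together with the open-source Hadoop cloud computing platform, which resolves the storage and access efficiency problems the relational database showed in this system. The services and main functions of the prototype system are described, and performance tests are run on typical applications in the system. The test results show that the proposed design meets the system's performance requirements for data storage, retrieval, and analysis.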

4.
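To overcome the low computational efficiency of traditional support vector regression on large-scale data, this article combines the communication-efficient method with support vector regression and proposes a distributed support vector regression model based on it (CE-SVR). The model first uses distributed storage to assign the large-scale data randomly to multiple machines, and then uses the communication-efficient method to build an approximate loss function for support vector regression that replaces the global loss function and yields approximate predictions, so that large-scale data can be analyzed effectively. Numerical simulations and an application study show that, for linear models, the predictive performance of the proposed model is essentially the same as that of global support vector regression and significantly better than that of distributed support vector regression based on the one-shot method (OS-SVR); for nonlinear models, its predictive performance declines as the number of machines grows, but remains significantly better than that of OS-SVR.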

5.
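Distributed statistical learning for big data has attracted wide attention. Existing methods have two notable problems. First, they all require the data to be stored on the machines at random, which is hard to guarantee in practice. Second, most of them are based on least squares and are sensitive to heavy-tailed errors and outliers. To address these problems, this paper proposes a robust distributed modal regression and applies it to variable selection with nonconvex penalties. The new method does not need the random-storage assumption that existing methods require, and the theoretical results confirm this. Simulations and a real-data analysis also show that the new method performs well.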

6.
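The structured query language SPARQL supports exact queries over RDF data, but it requires users to know the RDF schema and the query syntax. Keyword search is clearly more usable than structured querying, but semantic ambiguity can easily make the search space huge. This paper uses RDF predicate information to expand queries: through an interactive keyword-search process on the RDF graph, users restrict the semantics of a query by selecting predicates, which reduces keyword ambiguity.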

7.
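Building an effective index over XML data is an important factor in XML data-processing performance. Most existing indexing and storage strategies sacrifice structural information, which hampers structural queries and updates. XML Schema, the standard for describing the structure of XML documents, makes it possible to validate documents and query paths. On this basis, the paper proposes SBXI, an XML document indexing technique based on XML Schema constraints, which guides document storage and querying. It improves storage and query efficiency, has high space utilization and low index-maintenance cost, and supports complex queries with multiple predicates.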

8.
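Cloud computing and big data have become research hotspots in IT, and applying the strengths of cloud computing in data storage and processing to big data has significant practical value. The open-source cloud platform OpenStack makes it easy to build a private cloud at the hardware-management level, and its storage module Swift supports PB-scale data storage. The open-source cloud platform Hadoop is strong in data processing but falls short in supporting very large data storage. By comparing Swift, the storage module of OpenStack, with HDFS, the file-handling module of Hadoop, this paper proposes combining Swift with Hadoop's MapReduce to build a private cloud computing system for enterprise big data processing. The analysis indicates that the design is feasible: such a heterogeneous private cloud can combine the strengths of different cloud platforms to process big data efficiently.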

9.
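In big data analysis, the data are so large that they are stored on different machines, and common statistical methods cannot be applied directly, so distributed computation is needed. Whether the setting is divide-and-conquer or multi-center data, the data or intermediate results must be transmitted. Transmission has to protect data privacy and be efficient; too many rounds of transmission reduce computational efficiency and also make privacy protection harder. Motivated by this, the paper proposes, under the differential privacy model, a privacy-protection scheme for communication-efficient distributed parameter estimation algorithms, and proves rigorously that the scheme protects the data effectively without impairing the efficiency of the parameter estimates. Finally, parameter estimation under the linear model with the differentially private algorithm is examined in simulations and a real-data example.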
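A one-round, privacy-protected transmission of local results can be illustrated with the standard Laplace mechanism. This is a minimal sketch, not the scheme of the paper: it estimates a mean instead of linear-model parameters, assumes the data are clipped to a known range `[lo, hi]` and that all machines hold equally many observations, and uses hypothetical function names.

```python
import numpy as np

def private_local_mean(x, lo, hi, epsilon, rng):
    """Release one machine's mean with epsilon-differential privacy (Laplace mechanism)."""
    x = np.clip(x, lo, hi)
    # Replacing one of the n clipped values moves the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

def distributed_private_mean(blocks, lo, hi, epsilon, rng):
    """Each machine sends a single noisy mean; the center averages them.

    Assumption: equal block sizes, so an unweighted average is appropriate.
    """
    noisy = [private_local_mean(b, lo, hi, epsilon, rng) for b in blocks]
    return float(np.mean(noisy))
```

Because every machine transmits once, the privacy budget `epsilon` is spent in a single release per machine; more communication rounds would require splitting the budget across rounds.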

10.
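With the rapid development of information technology and the constant advance of database technology, vast amounts of information have become a hallmark of our time. Health-care product companies store massive transaction data, and people are often at a loss when faced with such data. Data mining provides tools for processing it. Based on a company's massive transaction data, this paper builds an item-level association analysis model and runs item association analysis on buyers' orders in two ways: without distinguishing time, and after merging each buyer's orders by month. It also carries out a sequential association analysis of the items.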
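Item-level association analysis usually rests on two quantities, support and confidence. The sketch below computes them for item pairs; it is a generic illustration, not the model of the paper, and the function name, thresholds, and item labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for item pairs.

    support    = share of transactions containing both items
    confidence = share of transactions with lhs that also contain rhs
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicates within one order
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

Merging a buyer's orders by month, as the abstract describes, would amount to building each transaction from all items that buyer ordered within the month before calling `pair_rules`.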

11.
Numerical databases arise in many scientific applications to keep track of large dense and sparse matrices, residing on secondary devices in matrix compact data representation. This paper describes a language-driven generalized data translator for translating any numerical database from one matrix compact data representation to another. Our approach is to describe any matrix compact data representation by a physical schema and any numerical database and its mapping to storage by data language facilities. The languages are processed by a Generalized Syntax-Directed Translation Scheme (GSDTS) to automatically generate FORTRAN conversion programs which become the major modules of the translator.

12.
Large-scale set covering problems are often approached by constructive greedy heuristics, and many selection criteria for such heuristics have been considered. These criteria are typically based on measures of the cost of setting an additional variable to one in relation to the number of yet unfulfilled constraints that it will satisfy. We show how such greedy selections can be performed on column-oriented set covering models, by using a fractional optimization formulation and solving sequences of ordinary column generation problems for the application at hand.
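The selection criterion described here, cost relative to the number of still-unsatisfied constraints a column would cover, can be sketched on an explicit list of columns. This is the classical greedy heuristic only; the fractional optimization formulation and column generation of the paper are not reproduced, and the function name is hypothetical.

```python
def greedy_set_cover(universe, subsets, costs):
    """Greedy set covering: repeatedly pick the column with the lowest
    cost per newly covered element.

    `subsets` is a list of Python sets; returns the indices chosen, in order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for j, s in enumerate(subsets):
            gain = len(uncovered & s)  # constraints this column would newly satisfy
            if gain and costs[j] / gain < best_ratio:
                best, best_ratio = j, costs[j] / gain
        if best is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

In a column-oriented model the inner loop over all columns is exactly what becomes impractical, which is why the abstract replaces it with an optimization subproblem.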

13.
In this paper, a quadratic programming model is developed to take into consideration a number of factors that can influence the process of optimal allocation of data among the nodes in a distributed database. The factors include communication costs, translation costs, congestion costs and storage costs. Beale's method is used to solve the resulting quadratic program. Some numerical examples are presented and the potentials of such an approach in the design and analysis of distributed databases are discussed. This work was partially supported by a grant from Natural Science and Engineering Research Council of Canada.

14.
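A relaxed dimensional factorization preconditioner for generalized saddle point problems   Cited by: 1 (self-citations: 0, citations by others: 1)
Cao Yang (曹阳), Tan Weiwei (谈为伟), Jiang Meiqun (蒋美群). Mathematica Numerica Sinica (《计算数学》), 2012, 34(4): 351-360
This paper extends the relaxed dimensional factorization (RDF) preconditioner proposed by Benzi et al. to generalized saddle point problems and calls the result the GRDF (generalized RDF) preconditioner. It can be viewed as a relaxed form of the modified dimensional split (MDS) preconditioner that arises when the dimensional splitting iteration is used to solve generalized saddle point problems. It is closer to the coefficient matrix than the MDS preconditioner, and therefore converges faster when combined with Krylov subspace methods such as GMRES. Some properties of the eigenvalues of the GRDF-preconditioned matrix are analyzed, and numerical examples confirm the effectiveness of the new preconditioner.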

15.
Iterative orthogonalization aims to keep the deviation from orthogonality small in the Gram–Schmidt process. Former applications of this technique are restricted to classical Gram–Schmidt (CGS) and column-oriented modified Gram–Schmidt (MGS). The major aim of this paper is to explain how iterative orthogonalization is incorporated into row-oriented MGS. Our interest in a row-oriented iterative MGS comes from the observation that this method is capable of performing column pivoting. The use of column pivoting delays the deteriorating effects of rounding errors and helps to handle rank-deficient least-squares problems.

A second modification proposed in this paper considers the use of Gram–Schmidt QR factorization for solving linear least-squares problems. The standard solution method is based on one orthogonalization of the r.h.s. vector b against the columns of Q. The outcome of this process is the residual vector, r*, and the solution vector, x*. The modified scheme is a natural extension of the standard solution method that allows it to apply iterative orthogonalization. This feature ensures accurate computation of small residuals and helps in cases when Q has some deviation from orthogonality.
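Iterative orthogonalization means repeating the projection step so that rounding errors left by the first pass are removed by the next. The sketch below shows it for the column-oriented variant, which the abstract names as the established case; the row-oriented version with column pivoting is not reproduced. The function name, the fixed number of passes, and the full-column-rank assumption are choices made for this illustration.

```python
import numpy as np

def iterated_gram_schmidt(A, passes=2):
    """Column-oriented Gram-Schmidt QR with repeated orthogonalization.

    Assumes A has full column rank (no pivoting, no rank detection).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for _ in range(passes):
            # Project out the earlier columns; later passes only remove
            # the small components left behind by rounding errors.
            for j in range(k):
                c = Q[:, j] @ v
                v -= c * Q[:, j]
                R[j, k] += c  # accumulate the corrections into R
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R
```

With `passes=1` this reduces to ordinary modified Gram–Schmidt; a second pass typically brings `Q.T @ Q` close to the identity at roughly twice the cost.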


16.
We propose an algorithm for finding the so-called principal solution of the Sylvester matrix equation over max-plus algebra. The derivation of our algorithm is based on the concept of tropical tensor product introduced by Butkovič and Fiedler. Our algorithm reduces the computational cost of finding the principal solution from quartic to cubic. It also reduces the space complexity from quartic to quadratic. Since matrix–matrix multiplication is the most important ingredient of our proposed technique, we show how to use column-oriented matrix multiplications in order to speed up a MATLAB implementation of our algorithm. Finally, we illustrate our results and discuss the connection with the residuation theory.
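For orientation, the two building blocks mentioned here, max-plus matrix multiplication and a principal solution obtained by residuation, can be sketched for the simpler one-sided equation A ⊗ X = C. This is not the Sylvester algorithm of the paper; it assumes finite entries, uses NumPy instead of MATLAB, and the function names are hypothetical.

```python
import numpy as np

def maxplus_matmul(A, B):
    """Max-plus product: result[i, k] = max over j of (A[i, j] + B[j, k])."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def principal_solution(A, C):
    """Greatest X with maxplus_matmul(A, X) <= C entrywise (residuation):
    X[j, k] = min over i of (C[i, k] - A[i, j]).

    The equation A (x) X = C is solvable exactly when this X attains equality.
    Assumes all entries are finite.
    """
    return np.min(C[:, None, :] - A[:, :, None], axis=0)
```

Both functions form an m-by-n-by-p intermediate array, so they trade memory for vectorization; a loop over columns would keep the memory footprint quadratic.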

17.
The wide availability of computer technology and large electronic storage media has led to an enormous proliferation of databases in almost every area of human endeavour. This naturally creates an intense demand for powerful methods and tools for data analysis. Current methods and tools are primarily oriented toward extracting numerical and statistical data characteristics. While such characteristics are very important and useful, they are often insufficient. A decision maker typically needs an interpretation of these findings, and this has to be done by a data analyst. With the growth in the amount and complexity of the data, making such interpretations is an increasingly difficult problem. As a potential solution, this paper advocates the development of methods for conceptual data analysis. Such methods aim at semi-automating the processes of determining high-level data interpretations, and discovering qualitative patterns in data. It is argued that these methods could be built on the basis of algorithms developed in the area of machine learning. An exemplary system utilizing such algorithms, INLEN, is discussed. The system integrates machine learning and statistical analysis techniques with database and expert system technologies. Selected capabilities of the system are illustrated by examples from implemented modules.

18.
We begin this paper by identifying a class of stochastic mixed-integer programs that have column-oriented formulations suitable for solution by a branch-and-price algorithm (B&P). We then survey a number of examples, and use a stochastic facility-location problem (SFLP) for a detailed demonstration of the relevant modeling and solution techniques. Computational results with a scenario representation of uncertain costs, demands and capacities show that B&P can be orders of magnitude faster than solving the standard formulation by branch and bound. We also demonstrate how B&P can solve SFLP exactly – as exactly as a deterministic mixed-integer program – when demands and other parameters can be represented as certain types of independent, random variables, e.g., independent, normal random variables with integer means and variances. Kevin Wood thanks the Office of Naval Research, Air Force Office of Scientific Research, the Naval Postgraduate School (NPS) and the University of Auckland for their support. Eduardo Silva thanks NPS and the Brazilian Navy for their support. Both authors are grateful to the COIN-OR team for assistance with computational issues, as well as to two anonymous referees for highly useful, constructive criticism.
