Similar literature
 Found 20 similar articles (search time: 125 ms)
1.
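This work studies the search for DNA sequence fragments and formulates the DNA sequence retrieval problem in light of the large volume of DNA data and the characteristic arrangement of bases in DNA sequences. For retrieval, a hash-based index table is built to improve search speed and efficiency on large data sets, and an improved KMP fast matching algorithm is applied on a balanced-tree data storage model to improve retrieval efficiency on the index. The construction of the hash index, the optimization of KMP, and the rebalancing of the balanced tree are described. Experimental results obtained through software evaluation demonstrate the effectiveness of the algorithm.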
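
The hash-index idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the choice of fixed-length k-mers as hash keys are assumptions made here, and candidate positions are verified with a plain string comparison rather than the improved KMP algorithm and balanced tree the abstract mentions.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every length-k substring (k-mer) to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_pattern(sequence, index, k, pattern):
    """Return all start positions of `pattern`, using the index to pick candidates."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k bases long")
    # Only positions whose first k bases match the pattern's prefix are checked.
    candidates = index.get(pattern[:k], [])
    return [pos for pos in candidates if sequence.startswith(pattern, pos)]


seq = "ACGTACGTTACG"
idx = build_kmer_index(seq, 3)
print(find_pattern(seq, idx, 3, "ACGT"))
```

The index costs memory proportional to the sequence length, but each lookup touches only the positions that share the pattern's first k bases instead of scanning the whole sequence.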

2.
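An information disclosure system requires listed companies, in order to protect investors' interests and accept public supervision, to disclose or announce information such as changes in their finances and their operating status to the public and to regulators as required by law, so that investors are fully informed. XBRL, an XML-based eXtensible Business Reporting Language, is now widely used in financial information disclosure and is gradually becoming its standard data format. The XBRL specification, taxonomies, and instance documents are studied, and a parallel frequent-pattern mining method for massive XBRL data is proposed based on MapReduce and HDFS. Experiments on XBRL instance data from Chinese listed companies achieved good results.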

3.
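The explosive growth of RDF data and the inherent bottlenecks of relational databases have made NoSQL databases a research focus in recent years, with column-oriented databases especially prominent. After comparing currently popular RDF storage schemes, a distributed RDF storage scheme based on a column-oriented database is proposed, together with a SPARQL query scheme for the column-oriented database. Finally, experiments compare the processing performance of distributed RDF storage based on a column-oriented database with that based on a graph database.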

4.
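To meet the requirements of ice-condition monitoring systems for continuous data acquisition and real-time equipment control in field river environments with harsh natural conditions, a remote ice-condition monitoring data processing system was designed and developed using TCP/IP and LabVIEW. The monitoring-center host computer, running the remote automatic ice-condition monitoring software, acts as the TCP server, receiving remote data and controlling remote sensors. Remote data are processed in the LabVIEW programming language, and the LabSQL toolkit with an Access database automatically stores the remote ice-condition data and post-processing results and supports queries on historical data. Because the system is simple to operate, it can greatly reduce the operators' workload.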

5.
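The structured query language SPARQL supports precise queries over RDF data, but it requires users to know the RDF data schema and the query syntax. Keyword search is clearly more usable than structured queries, but semantic ambiguity easily leads to a huge search space. RDF predicate information is used to expand queries: through an interactive keyword search process over the RDF graph, users select predicates to constrain the semantics of the query, which reduces keyword ambiguity.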

6.
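With the rapid development of information technology and the constant advance of database technology, large volumes of information have become a feature of the present era. Health-care pharmaceutical companies store massive amounts of transaction data, and people are often at a loss when faced with such data; data mining provides tools for processing it. Based on a company's massive transaction data, a product-level association analysis model is built. Product association analysis is carried out on buyers' orders both without distinguishing time and after merging each buyer's orders by month; in addition, sequential association analysis of products is performed.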
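
As a rough illustration of product-level association analysis, the sketch below counts how often pairs of products appear in the same order and computes the confidence of a simple rule. The order data, product labels, and function names are hypothetical; the abstract does not state which association measures or algorithm the authors used.

```python
from collections import Counter
from itertools import combinations


def pair_support(orders):
    """Count, for each unordered product pair, how many orders contain both."""
    counts = Counter()
    for order in orders:
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return counts


def confidence(orders, antecedent, consequent):
    """Fraction of orders containing `antecedent` that also contain `consequent`."""
    with_antecedent = [order for order in orders if antecedent in order]
    if not with_antecedent:
        return 0.0
    return sum(consequent in order for order in with_antecedent) / len(with_antecedent)


# Hypothetical orders; each inner list is the set of products in one order.
orders = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["B"]]
print(pair_support(orders))
print(confidence(orders, "A", "B"))
```

Merging each buyer's orders by month, as the abstract describes, would amount to taking the union of that buyer's orders within a month before calling `pair_support`.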

7.
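This paper studies feature extraction and variable selection when a data set is partitioned and stored across different processors that are connected through some network structure. A distributed L_{1/2} regularization method is proposed, a distributed L_{1/2} regularization algorithm based on ADMM is given, and its convergence is proved. The algorithm exchanges information only between neighboring processors, and its variable selection result is the same as that obtained by applying L_{1/2} regularization to the unpartitioned data set. Experiments show that the proposed algorithm is effective and practical and is well suited to processing data in distributed storage.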

8.
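To address the storage and use of massive data in cloud service environments for management information systems, ontology theory is applied to summarize the essential common features of various kinds of management information, establishing a management-information ontology representation system that is easy for humans and machines to map and understand and suitable for large-scale, intensive computer processing. Following the principles of applicability, convenience, and low cost, a layered cloud database architecture is designed and the main index items of dynamic and static metadata are constructed, realizing their logical relationships with the management-information ontology and ultimately completing a cloud service system based on the requirements of the management-information ontology.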

9.
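To meet the great challenges that distributed big data poses for traditional statistical modeling and analysis, the expectile regression model is considered for effective data processing and statistical inference on distributed big data. The novelty lies in applying expectile regression separately to the data stored on each machine and then aggregating these regression results by averaging to draw a combined inference. On the algorithmic side, a block ADMM algorithm is proposed, building on the alternating direction method of multipliers (ADMM), which is popular in big-data computation; this iterative algorithm is easy to parallelize, gives robust results, and can significantly reduce the storage capacity required for big data. The parameter estimates of the expectile regression model on distributed big data have good efficiency and asymptotic properties, and numerical simulations and empirical analysis both confirm the method's effectiveness for distributed big data.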
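
The "fit on each machine, then average" scheme can be sketched in its simplest form, an intercept-only expectile with no covariates. This is an illustration under that simplifying assumption, not the authors' block ADMM algorithm; the function names and the iteratively reweighted mean used to compute the expectile are choices made here.

```python
def expectile(values, tau, iterations=100, tol=1e-12):
    """Sample expectile at level tau (0 < tau < 1) via an iteratively reweighted mean.

    The tau-expectile m minimizes sum(|tau - 1[y < m]| * (y - m) ** 2).
    """
    m = sum(values) / len(values)
    for _ in range(iterations):
        # Observations above the current estimate get weight tau, the rest 1 - tau.
        weights = [tau if y > m else 1.0 - tau for y in values]
        new_m = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


def averaged_expectile(chunks, tau):
    """Estimate the expectile on each chunk (one per machine) and average the results."""
    estimates = [expectile(chunk, tau) for chunk in chunks]
    return sum(estimates) / len(estimates)


print(expectile([0.0, 10.0], 0.9))
print(averaged_expectile([[1.0, 2.0], [3.0, 4.0]], 0.5))
```

At tau = 0.5 the expectile is the ordinary mean, so with equal-sized chunks the averaged estimate coincides with the full-data mean; for other levels the average is only an approximation to the full-data expectile.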

10.
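Biological research shows that fingerprints form during fetal development and that their ridge structure never changes during a person's life, except when accidents such as deep abrasions of the fingertip damage the fingerprint. This property makes fingerprints very attractive as a biometric for identity authentication. An automatic fingerprint identification system comprises modules for fingerprint image acquisition and storage, re-representation and feature extraction of fingerprint image data, fingerprint classification and indexing, and fingerprint matching. For several key techniques in the modules of large-capacity automatic fingerprint identification systems, optimization models were established and fast, accurate solution algorithms designed, so that all performance indicators of the system reach an internationally advanced level. The results have been applied in automatic fingerprint identification systems in several provinces and cities of China and in the criminal investigation field of the Ministry of Public Security.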

11.
An XML schema is a set of rules for defining the allowed sub-elements of any element in an XML document. These rules use regular expressions to define the language of the element's children. Updates to an XML schema are updates to the regular expressions defined by the schema rules. We consider an interactive data administration tool for XML databases. In this tool, changes on an XML schema are activated by updates that violate the validity of an XML document. Our schema validator is a Datalog program, resulting from the translation of a given XML schema. Changing the schema implies changing the validator. The main contribution of this paper is an algorithm allowing the evolution of XML schemas. This algorithm is based on the computation of new regular expressions to extend a given regular language in a conservative way, trying to foresee the needs of an application. A translation function from schema constraints to Datalog programs is introduced. The validation of an XML tree corresponds to the evaluation of the Datalog program over the tree. Our method allows the maintenance of the Datalog program in an incremental way, i.e., without redoing the entire translation.

12.
Given two rooted, labeled trees P and T, the tree path subsequence problem is to determine which paths in P are subsequences of which paths in T. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds.
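
To make the problem statement concrete, here is a naive brute-force sketch that enumerates all root-to-leaf paths and tests each pair. It is not the improved algorithm the abstract refers to; the tree encoding as `(label, [children])` tuples and the function names are assumptions made for illustration.

```python
def root_to_leaf_paths(tree):
    """Yield each root-to-leaf path as a list of labels; a tree is (label, [children])."""
    label, children = tree
    if not children:
        yield [label]
        return
    for child in children:
        for path in root_to_leaf_paths(child):
            yield [label] + path


def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting zero or more labels."""
    remaining = iter(long)
    # `label in remaining` advances the iterator, so matches must occur in order.
    return all(label in remaining for label in short)


def path_subsequences(p_tree, t_tree):
    """Return pairs (p_path, t_path) where p_path is a subsequence of t_path."""
    t_paths = list(root_to_leaf_paths(t_tree))
    return [(tuple(p), tuple(t))
            for p in root_to_leaf_paths(p_tree)
            for t in t_paths
            if is_subsequence(p, t)]


P = ("a", [("b", []), ("c", [])])
T = ("a", [("x", [("b", [])]), ("d", [])])
print(path_subsequences(P, T))
```

This brute force recomputes shared path prefixes for every pair of leaves, which is the kind of redundancy a dedicated algorithm would avoid.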

13.
14.
XML data is queried with a limited form of regular expressions, in a language called XPath. New XML stream processing applications, such as content-based routing or selective dissemination of information, require thousands or millions of XPath expressions to be evaluated simultaneously on the incoming XML stream at a high, sustained rate. In its simplest approximation, the XPath evaluation problem is analogous to the text search problem, in which one or several regular expressions need to be matched to a given text. At a finer level, it is related to the tree pattern matching problem. However, unlike the traditional setting, the number of regular expressions here is much larger, while the “text” is much shorter, since it corresponds to the depth of the XML stream. In this paper we examine techniques that have been proposed for XML stream processing and describe a few open problems.

15.
Graphs are important structures to model complex relationships such as chemical compounds, proteins, geometric or hierarchical parts, and XML documents. Given a query graph, indexing has become a necessity to retrieve similar graphs quickly from large databases. We propose a novel technique for indexing databases whose entries can be represented as graph structures. Our method starts by representing the topological structure of a graph as well as that of its subgraphs as vectors in which the components correspond to the sorted Laplacian eigenvalues of the graph or subgraphs. By doing a nearest neighbor search around the query spectra, similar but not necessarily isomorphic graphs are retrieved. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

16.
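LSI latent semantic information retrieval model   Cited by: 5 (self-citations: 0, citations by others: 5)
This paper introduces information retrieval methods based on vector spaces. The relationship between index terms and documents is represented as a matrix, and a query is represented as a vector of term weights; relevant documents in the database are identified by computing the cosine of the angle between the query vector and each document vector. QR factorization and singular value decomposition (SVD) of the matrix are used to handle the uncertainty inherent in the database. The aim of the paper is to show that basic concepts from linear algebra can solve information retrieval (IR) problems well.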
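
The cosine-ranking step of the vector space model can be sketched as follows. The term-document matrix, query, and function names are made up for illustration, and the QR and SVD steps mentioned in the abstract are deliberately left out, so this shows plain vector-space ranking rather than LSI itself.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(term_doc, query):
    """Rank documents by cosine similarity to the query.

    term_doc[i][j] is the weight of term i in document j, and query[i] is the
    weight of term i in the query. Returns (document order, scores).
    """
    n_docs = len(term_doc[0])
    columns = [[row[j] for row in term_doc] for j in range(n_docs)]
    scores = [cosine(column, query) for column in columns]
    order = sorted(range(n_docs), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical 3-term, 3-document matrix and a query that mentions only term 0.
term_doc = [[1, 0, 1],
            [0, 1, 1],
            [0, 1, 0]]
order, scores = rank_documents(term_doc, [1, 0, 0])
print(order, scores)
```

In LSI the same ranking would be applied after replacing the term-document matrix with a low-rank approximation obtained from its SVD.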

17.
This paper proposes an information retrieval (IR) model based on possibilistic directed networks. The relevance of a document with respect to a query is interpreted by two degrees: the necessity and the possibility. The necessity degree evaluates the extent to which a given document is relevant to a query, whereas the possibility degree evaluates the reasons for eliminating irrelevant documents. This new interpretation of relevance led us to revisit the term weighting scheme by explicitly distinguishing between informative and non-informative terms in a document. Experiments carried out on three standard TREC collections show the effectiveness of the model.

18.
Storing XML documents in relational databases has drawn much attention in recent years because it can leverage existing investments in relational database technologies. Different algorithms have been proposed to map XML DTD/Schema to relational schema in order to store XML data in relational databases. However, most work defines mapping rules based on heuristics without considering application characteristics, and hence fails to produce efficient relational schemas for various applications. In this paper, we propose a workload-aware approach to generate a relational schema from XML data and a user-specified workload. Our approach adopts a genetic algorithm to find optimal mappings. An elegant encoding method and related operations are proposed to manipulate mappings using bit strings. Various optimization techniques can be applied to the XML-to-relational mapping problem based on this representation. We implemented the proposed algorithm, and our experimental results showed that it was more robust and produced better mappings than existing work.

19.
We present PackLib2, the first fully integrated benchmark library for multi-dimensional packing instances. PackLib2 combines a systematic collection of all benchmark instances from previous literature with a well-organized set of new and challenging large instances. The XML format allows linking basic benchmark data with other important properties, like bibliographic information, origin, best known solutions, runtimes, etc. Transforming instances into a variety of existing input formats is also quite easy, as the XML format lends itself to easy conversion; for this purpose, a number of parsers are provided. Thus, PackLib2 aims at becoming a one-stop location for the packing and cutting community: in addition to fair and easy comparison of algorithmic work and ongoing measurement of scientific progress, it poses numerous challenges for future research.

20.
Since its introduction in 1997, the M-tree has become a respected metric access method (MAM), while remaining, together with its descendants, still the only database-friendly MAM, that is, a dynamic structure persistent in a paged index. Although many other MAMs have been developed over the last decade, most of them require either static or expensive indexing. By contrast, the dynamic M-tree construction allows us to index very large databases in subquadratic time, and simultaneously the index can be kept up to date (i.e., it supports arbitrary insertions and deletions). In this article we propose two new techniques improving dynamic insertions in the M-tree: the forced reinsertion strategies and the so-called hybrid-way leaf selection. Both techniques preserve the logarithmic asymptotic complexity of a single insertion, while they aim to produce more compact M-tree hierarchies (which leads to faster query processing). In particular, the former technique reuses the well-known principle of forced reinsertions, where the new insertion algorithm tries to re-insert the content of an M-tree leaf that is about to split in order to avoid that split. The latter technique constitutes an efficiency-scalable selection of a suitable leaf node into which a new object is to be inserted. In the experiments we show that the proposed techniques bring a clear improvement (speeding up both indexing and query processing) and also provide a tuning tool for the trade-off between indexing and querying efficiency. Moreover, a combination of the new techniques exhibits a synergistic effect, resulting in the best strategy for dynamic M-tree construction proposed so far.

