Random Databases with Approximate Record Matching
Authors:Oleg Seleznjev  Bernhard Thalheim
Institution:1. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden; 2. Faculty of Mathematics and Mechanics, Moscow State University, Moscow, Russia; 3. Institute of Computer Science and Applied Mathematics, Christian-Albrechts University, Kiel, Germany
Abstract:In many database applications in telecommunication, environmental and health sciences, bioinformatics, physics, and econometrics, real-world data are uncertain and subject to errors. These data are processed, transmitted, and stored in large databases. We consider stochastic modelling for databases with uncertain data and for some basic database operations (for example, join and selection) with exact and approximate matching. Approximate join is used for merging or data deduplication in large databases. The distribution and mean of the join sizes are studied for random databases. A random database is treated as a table of independent random records with a common distribution (or as a set of such random tables). These results can be used for integration of information from different databases, multiple join optimization, and various probabilistic algorithms for structured random data.
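
To make the notion of a mean join size concrete, the following is a minimal sketch and not the authors' model. It assumes single-attribute records drawn independently from Uniform(0, 1) and treats two records as an approximate match when their absolute difference is at most a tolerance `eps`; the function names, the distribution, and the distance measure are all illustrative assumptions. Under these assumptions the match probability for one pair is 1 - (1 - eps)^2 = 2·eps - eps^2, so the mean join size of tables with n and m records is n·m·(2·eps - eps^2).

```python
import random


def approx_join_size(table_a, table_b, eps):
    """Count pairs (a, b) with |a - b| <= eps; eps = 0 gives the exact join."""
    return sum(1 for a in table_a for b in table_b if abs(a - b) <= eps)


def expected_join_size_uniform(n, m, eps):
    """Mean join size for i.i.d. Uniform(0, 1) records (assumed model).

    Each of the n*m pairs matches with probability 2*eps - eps**2,
    and the mean is additive even though the pairs are dependent.
    """
    return n * m * (2 * eps - eps ** 2)


def simulate_mean_join_size(n, m, eps, trials, seed=0):
    """Monte Carlo estimate of the mean approximate join size."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        table_a = [rng.random() for _ in range(n)]
        table_b = [rng.random() for _ in range(m)]
        total += approx_join_size(table_a, table_b, eps)
    return total / trials


if __name__ == "__main__":
    n, m, eps = 50, 40, 0.05
    print("theoretical mean:", expected_join_size_uniform(n, m, eps))
    print("simulated mean:  ", simulate_mean_join_size(n, m, eps, trials=200))
```

The nested-loop count is quadratic in the table sizes, which is acceptable for a small simulation; it is meant only to illustrate the quantity being studied, not an efficient join algorithm.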
Keywords:
This article is indexed in SpringerLink and other databases.