Random Databases with Approximate Record Matching

Authors:	Oleg Seleznjev, Bernhard Thalheim

Institution:	1. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden; 2. Faculty of Mathematics and Mechanics, Moscow State University, Moscow, Russia; 3. Institute of Computer Science and Applied Mathematics, Christian-Albrechts University, Kiel, Germany

Abstract:	In many database applications in telecommunication, environmental and health sciences, bioinformatics, physics, and econometrics, real-world data are uncertain and subject to errors. These data are processed, transmitted, and stored in large databases. We consider stochastic modelling for databases with uncertain data and for some basic database operations (for example, join and selection) with exact and approximate matching. Approximate join is used for merging or data deduplication in large databases. The distribution and mean of the join sizes are studied for random databases. A random database is treated as a table of independent random records with a common distribution (or as a set of such random tables). These results can be used for integration of information from different databases, multiple join optimization, and various probabilistic algorithms for structured random data.
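
As an illustration of the kind of quantity the abstract refers to, the following minimal Python sketch simulates two random tables and compares the simulated mean size of an approximate join with the value given by linearity of expectation. It is not the authors' model or method: the uniform record distribution on {0, ..., k-1}, the absolute-difference matching rule with tolerance `eps`, and all function names are assumptions made for this example only.

```python
import random


def random_table(n, k, rng):
    """Table of n independent records, each uniform on {0, ..., k-1} (assumed distribution)."""
    return [rng.randrange(k) for _ in range(n)]


def approx_join_size(r, s, eps):
    """Number of pairs (x, y) from r x s whose values differ by at most eps (assumed matching rule)."""
    return sum(1 for x in r for y in s if abs(x - y) <= eps)


def match_probability(k, eps):
    """Exact P(|X - Y| <= eps) for independent X, Y uniform on {0, ..., k-1}."""
    hits = sum(1 for x in range(k) for y in range(k) if abs(x - y) <= eps)
    return hits / (k * k)


def expected_join_size(n, m, k, eps):
    """Mean join size of two independent tables: n * m * P(match), by linearity of expectation."""
    return n * m * match_probability(k, eps)


def simulate_mean(n, m, k, eps, trials, seed=0):
    """Monte Carlo estimate of the mean approximate join size over fresh random tables."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        r = random_table(n, k, rng)
        s = random_table(m, k, rng)
        total += approx_join_size(r, s, eps)
    return total / trials
```

Setting `eps = 0` gives the exact join under the same assumptions. For example, `simulate_mean(30, 30, 10, 1, 200)` should come out close to `expected_join_size(30, 30, 10, 1)`.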

Keywords:
This article is indexed in SpringerLink and other databases.