Detecting database records that are approximate or exact duplicates, whether caused by data entry errors, by differences in the
detailed schemas of records from multiple databases, or by other factors, is an important line of research. Yet no comprehensive
comparative study has evaluated the effectiveness of the Silhouette width, the Calinski & Harabasz index (pseudo F-statistic)
and the Baker & Hubert index (γ index) in the presence of exact and approximate duplicates. This paper presents a comparative
study of the effectiveness of these three cluster validation techniques, which measure the stability of a partition of a data
set in the presence of noise, in particular approximate and exact duplicates. The Silhouette width, Calinski & Harabasz index
and Baker & Hubert index are calculated before and after exact and approximate duplicates are deliberately inserted into the
data set. Comprehensive experiments on the glass, wine, iris and ruspini data sets confirm that the Baker & Hubert index is not
stable in the presence of approximate duplicates. Moreover, the values of the Silhouette width, Calinski & Harabasz index and
Baker & Hubert index do not exceed those obtained on the original data in the presence of approximate duplicates.
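
The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy points, the labels, and the way duplicates are injected are assumptions chosen for illustration, and only two of the three indices (Silhouette width and the Calinski & Harabasz index) are implemented, using their standard textbook definitions with Euclidean distance. The Baker & Hubert γ index is omitted to keep the example short.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_width(points, labels):
    """Average silhouette width; assumes at least two clusters."""
    clusters = set(labels)
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to the other members of each cluster.
        mean_to = {}
        for c in clusters:
            d = [dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_to[c] = sum(d) / len(d) if d else None
        a = mean_to[labels[i]]
        if a is None:
            continue  # singleton cluster: s(i) = 0 by convention
        b = min(v for c, v in mean_to.items()
                if c != labels[i] and v is not None)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def calinski_harabasz(points, labels):
    """Pseudo F-statistic: (B / (k - 1)) / (W / (n - k))."""
    n, k, dim = len(points), len(set(labels)), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        between += len(members) * dist(centroid, overall) ** 2
        within += sum(dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

# Deliberately insert one exact duplicate and one approximate duplicate.
points_dup = points + [(0.0, 0.0), (5.1, 4.9)]
labels_dup = labels + [0, 1]

for name, index in [("silhouette", silhouette_width),
                    ("calinski-harabasz", calinski_harabasz)]:
    print(name, "before:", index(points, labels),
          "after:", index(points_dup, labels_dup))
```

Here the duplicates are simply assigned the cluster label of the record they copy. A fuller experiment would presumably re-cluster the contaminated data before recomputing the indices, which this sketch does not attempt.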