首页 | 本学科首页   官方微博 | 高级检索  
     检索      


An integer linear programming approach for approximate string comparison
Authors:Marcus Ritt  Alysson M Costa  Sergio Mergen  Viviane M Orengo
Institution:1. Instituto de Informática, Universidade Federal do Rio Grande do Sul, UFRGS, Brazil;2. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, USP, 13560-970 Sao Carlos, Sao Paulo, Brazil
Abstract:We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.
Keywords:Approximate string matching  Distance metric  Block edits  Block moves  Integer programming  Hop constraints
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号