a hybrid human-machine approach
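Input: a graph over the records, where each edge carries the probability that its two endpoint records are similar (the prior belief).
Goal: hand a subset of the edges to humans for verification, then use the transitivity of the equality relation to infer the remaining edges.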
Title | Crowdsourcing Algorithms for Entity Resolution
Authors | Norases Vesdapunt; Kedar Bellare; Nilesh Dalvi
Year | 2014
Keywords | Entity Resolution; Crowdsourcing
Abstract |
In this paper, we study a hybrid human-machine approach for solving the problem of Entity Resolution (ER). The goal of ER is to identify all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Our input is a graph over all the records in a database, where each edge has a probability denoting our prior belief (based on Machine Learning models) that the pair of records represented by the given edge are duplicates. Our objective is to resolve all the duplicates by asking humans to verify the equality of a subset of edges, leveraging the transitivity of the equality relation to infer the remaining edges (e.g. a = c can be inferred given a = b and b = c). We consider the problem of designing optimal strategies for asking questions to humans that minimize the expected number of questions asked. Using our theoretical framework, we analyze several strategies, and show that a strategy, claimed as “optimal” for this problem in a recent work, can perform arbitrarily bad in theory. We propose alternate strategies with theoretical guarantees. Using both public datasets as well as the production system at Facebook, we show that our techniques are effective in practice.
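
The setup in the abstract can be illustrated with a small Python sketch: keep a union-find structure over the records, ask a question only when the answer is not already implied by earlier answers, and count the questions. Several things here are assumptions made for illustration and do not come from the paper: the names `Clusters`, `resolve`, `prior`, and `ask`; the assumption that human answers are always correct; the additional negative inference (a = b and b ≠ c imply a ≠ c); and the descending-probability question order, which is just one simple ordering and is not presented as the strategy the paper proposes.

```python
from itertools import combinations


class Clusters:
    """Union-find over records, plus cluster pairs known to be different."""

    def __init__(self, records):
        self.parent = {r: r for r in records}
        self.different = set()  # frozensets of two cluster roots known to differ

    def find(self, r):
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]  # path halving
            r = self.parent[r]
        return r

    def infer(self, a, b):
        """Return True/False if earlier answers imply the result, else None."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if frozenset((ra, rb)) in self.different:
            return False
        return None

    def record(self, a, b, same):
        """Store an answer for a pair whose result was not yet implied."""
        ra, rb = self.find(a), self.find(b)
        if same:
            self.parent[rb] = ra
            # Re-key "different" pairs that mentioned the absorbed root.
            self.different = {
                frozenset(ra if x == rb else x for x in pair)
                for pair in self.different
            }
        else:
            self.different.add(frozenset((ra, rb)))


def resolve(records, prior, ask):
    """Ask about pairs in descending prior order, skipping implied ones.

    prior: dict mapping frozenset({a, b}) -> probability of being duplicates
    ask:   callable (a, b) -> bool, a stand-in for a (perfect) human answer
    Returns (list of clusters, number of questions asked).
    """
    state = Clusters(records)
    questions = 0
    pairs = sorted(combinations(records, 2),
                   key=lambda p: prior.get(frozenset(p), 0.0),
                   reverse=True)
    for a, b in pairs:
        if state.infer(a, b) is None:
            state.record(a, b, ask(a, b))
            questions += 1
    groups = {}
    for r in records:
        groups.setdefault(state.find(r), []).append(r)
    return list(groups.values()), questions


# Hypothetical toy data: a, b, c are the same entity; d is a different one.
records = ["a", "b", "c", "d"]
entity = {"a": 1, "b": 1, "c": 1, "d": 2}
prior = {
    frozenset("ab"): 0.9, frozenset("bc"): 0.8, frozenset("ac"): 0.7,
    frozenset("ad"): 0.3, frozenset("bd"): 0.2, frozenset("cd"): 0.1,
}
clusters, asked = resolve(records, prior, lambda x, y: entity[x] == entity[y])
print(clusters, asked)
```

On this toy input, three of the six pairs are asked (a-b, b-c, a-d); the other three follow from the recorded answers. The number of questions depends on the order in which pairs are asked, which is the quantity the abstract says the paper's strategies aim to minimize in expectation.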