Crowdsourcing DBpedia Quality Assessment


Overview

In this work we look into the use of crowdsourcing as a means to handle Linked Data quality problems that are difficult to solve automatically. We analyzed the most common errors encountered in Linked Data sources and classified them according to how amenable each is to a specific crowdsourcing approach. Based on this analysis, we implemented a quality assessment methodology for Linked Data that leverages the wisdom of the crowd in two ways: (i) a contest targeting an expert crowd of researchers and Linked Data enthusiasts; and (ii) paid microtasks published on Amazon Mechanical Turk. We empirically evaluated the capacity of both crowdsourcing approaches to spot quality issues in DBpedia and investigated how the contributions of the two crowds could be optimally integrated into Linked Data curation processes. The results showed that the two styles of crowdsourcing are complementary, and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data sets.

Methodology


In this work we applied the Find-Fix-Verify crowdsourcing pattern, which originally separates a task into three stages. In the Find stage, the crowd identifies problematic elements within a data source. In the Fix stage, the crowd corrects the elements flagged in the previous stage. The Verify stage is a final quality-control iteration.
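The three stages of the pattern can be sketched as a small Python pipeline. This is a minimal illustration under our own assumptions, not the project's implementation: triples are plain `(subject, predicate, object)` string tuples, and the hypothetical `find`, `fix` and `verify` callables stand in for the judgments a human crowd would provide at each stage. A lookup table simulates that crowd knowledge in the toy example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a triple is a (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]


def find_fix_verify(
    data: List[Triple],
    find: Callable[[Triple], bool],
    fix: Callable[[Triple], Triple],
    verify: Callable[[Triple, Triple], bool],
) -> Dict[Triple, Triple]:
    """Run Find, Fix and Verify in sequence; return the accepted corrections."""
    # Find: flag the problematic elements of the data source.
    flagged = [t for t in data if find(t)]
    # Fix: propose a correction for each flagged element.
    proposals = {t: fix(t) for t in flagged}
    # Verify: final quality-control pass that keeps only approved corrections.
    return {old: new for old, new in proposals.items() if verify(old, new)}


# Toy example: a lookup table simulates what the crowd knows.
known_country = {"dbr:Berlin": "dbr:Germany", "dbr:Paris": "dbr:France"}
data = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Paris", "dbo:country", "dbr:Germany"),  # deliberately wrong
]
corrections = find_fix_verify(
    data,
    find=lambda t: t[2] != known_country[t[0]],
    fix=lambda t: (t[0], t[1], known_country[t[0]]),
    verify=lambda old, new: new[2] == known_country[new[0]],
)
print(corrections)
# prints {('dbr:Paris', 'dbo:country', 'dbr:Germany'): ('dbr:Paris', 'dbo:country', 'dbr:France')}
```

Passing the stages in as functions keeps the pipeline independent of who performs each step, so the same skeleton could route one stage to experts and another to paid workers.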
Our approach leverages the expertise of Linked Data (LD) experts, who take part in a contest to find erroneous triples and classify them according to a predefined scheme. The outcome of this stage, the triples judged "incorrect", is then verified by the crowd of paid microtask workers, who are instructed to assess specific types of errors in this subset of triples.
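A possible sketch of the verification of expert-flagged triples follows. It assumes, purely for illustration, that flagged triples are batched by error type (the labels `object_value`, `datatype` and `interlink` are hypothetical placeholders for the predefined scheme) and that several workers answer each microtask, with answers aggregated by simple majority vote. Majority voting and all function names here are our assumptions, not necessarily the aggregation rule used in this work.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def group_by_error_type(flags: List[Tuple[Triple, str]]) -> Dict[str, List[Triple]]:
    """Batch expert-flagged triples by error type, one microtask type per batch."""
    batches: Dict[str, List[Triple]] = defaultdict(list)
    for triple, error_type in flags:
        batches[error_type].append(triple)
    return dict(batches)


def majority_vote(answers: List[bool]) -> bool:
    """True if more workers confirmed the error than rejected it (ties reject)."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def verify_flags(
    batches: Dict[str, List[Triple]],
    answers: Dict[Triple, List[bool]],
) -> Dict[str, List[Triple]]:
    """Keep, per error type, only the triples the crowd confirms as incorrect."""
    return {
        error_type: [t for t in triples if majority_vote(answers.get(t, []))]
        for error_type, triples in batches.items()
    }


# Hypothetical Find-stage output from the expert contest.
paris = ("dbr:Paris", "dbo:country", "dbr:Germany")
berlin = ("dbr:Berlin", "dbo:populationTotal", '"3.5"@en')
rome = ("dbr:Rome", "owl:sameAs", "http://example.org/Milan")
flags = [(paris, "object_value"), (berlin, "datatype"), (rome, "interlink")]

# Hypothetical worker answers: True means "yes, this triple is incorrect".
answers = {
    paris: [True, True, False],
    berlin: [True, False],
    rome: [False, False, True],
}
confirmed = verify_flags(group_by_error_type(flags), answers)
print(confirmed)
# prints {'object_value': [('dbr:Paris', 'dbo:country', 'dbr:Germany')], 'datatype': [], 'interlink': []}
```

Batching by error type mirrors the idea that each worker assesses one specific kind of error per task; treating ties as rejections is a conservative choice, so a triple is reported as incorrect only when most workers agree.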