About Gianluca Demartini

Dr. Gianluca Demartini is a Lecturer in Data Science at the Information School of the University of Sheffield, UK. Previously, he was a post-doctoral researcher at the eXascale Infolab at the University of Fribourg, a visiting researcher at UC Berkeley, a junior researcher at the L3S Research Center, and an intern at Yahoo! Research. His research interests include Web Information Retrieval, Semantic Web, and Human Computation. His Ph.D. work focused on Entity Retrieval. He has published more than 50 peer-reviewed scientific publications and has given tutorials on Entity Retrieval and Crowdsourcing at research conferences.

WWW 2013: Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do

A Crowdsourcing Platform for Personalized Human Intelligence Task Assignment Based on Social Networks

The most popular micro-task crowdsourcing platform (Amazon MTurk) uses a pull methodology: Workers select which task they want to perform among the ones available on the market based on their interests, task reward, clarity of instructions, requester reputation, etc.

Allowing workers to pick their preferred tasks on a first-come-first-served basis has many advantages, such as short completion times. On the other hand, this mechanism does not guarantee that the worker who performs the task is the best fit: More suitable workers may be available in the crowd but they might be unable to pick the HIT if they were not quick enough.

Instead, we propose a push methodology to crowdsourcing micro-tasks: Our system first creates worker profiles to model their interests and skills and then uses such profiles to assign available tasks to workers.
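To make the idea concrete, here is a minimal, hypothetical sketch of push-based assignment (not the actual Pick-A-Crowd implementation): worker profiles are simple bags of interest terms, and each task is routed to the k workers whose profiles best overlap the task topic.

    # Hypothetical sketch of push-based task assignment: worker profiles are
    # bags of interest terms, and each task is routed to the k workers whose
    # profiles best overlap its topic.

    from collections import Counter

    def profile_score(profile_terms, task_terms):
        """Count how many task terms appear in the worker's interest profile."""
        profile = Counter(profile_terms)
        return sum(profile[t] for t in task_terms)

    def assign_task(task_terms, worker_profiles, k=3):
        """Return the ids of the top-k workers for this task (push assignment)."""
        ranked = sorted(worker_profiles.items(),
                        key=lambda item: profile_score(item[1], task_terms),
                        reverse=True)
        return [worker_id for worker_id, _ in ranked[:k]]

    # Illustrative data only: interest profiles built from liked pages.
    workers = {
        "w1": ["soccer", "serie a", "juventus"],
        "w2": ["cooking", "italian food", "pasta"],
        "w3": ["soccer", "world cup"],
    }
    print(assign_task(["soccer", "players"], workers, k=2))  # -> ['w1', 'w3']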

The proposed approach results in a significant improvement in the quality of the work, compared to the same tasks run with anonymous workers from MTurk, both for multiple-choice tasks and for open questions. This is because workers can complete tasks on topics of their interest rather than competing with other workers to grab the best tasks first.

We have developed a Facebook app named OpenTurk that lets crowd workers log in with their Facebook account. Our system then collects the Facebook pages each worker has liked and, based on this information, selects which tasks to assign to each worker.
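As a rough sketch of the data-collection step (the endpoint, permission, and fields below are the standard Facebook Graph API ones; OpenTurk's actual code may differ), fetching the pages a worker has liked could look like this:

    # Illustrative only: fetch the pages a logged-in user has liked via the
    # Facebook Graph API. Requires a user access token granted with the
    # user_likes permission.

    import requests

    def fetch_liked_pages(access_token):
        """Return a list of (page_name, category) pairs liked by the user."""
        pages = []
        url = "https://graph.facebook.com/me/likes"
        params = {"access_token": access_token, "fields": "name,category", "limit": 100}
        while url:
            resp = requests.get(url, params=params).json()
            pages.extend((p["name"], p.get("category", "")) for p in resp.get("data", []))
            url = resp.get("paging", {}).get("next")  # follow pagination if present
            params = {}  # the 'next' URL already carries all query parameters
        return pages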

[Figure: system overview]

In the paper, we experimentally compare the effectiveness of different task assignment approaches including simple Facebook category matches, expert finding techniques, and semantic based methods.

[Figure: the voting model used for task assignment]

The figure above illustrates our best-performing approach, which is based on expert finding:

  • First, an inverted index over the content of Facebook pages liked by the crowd is built.
  • Then, the HIT’s metadata is used to construct a query over the index and retrieve Facebook pages relevant to the task.
  • Finally, pages are seen as votes for the expertise of workers for the specific task: the top-k ranked workers are assigned the task.
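The sketch below illustrates this voting model in simplified form: the inverted index is replaced by a plain term-overlap retrieval, and every retrieved page casts one vote for each worker who liked it. The data structures and scoring are assumptions for illustration, not the paper's exact implementation.

    # Simplified voting model: pages relevant to the HIT query act as votes
    # for the workers who liked them. A real system would query an inverted
    # index (e.g. Lucene) instead of this term-overlap retrieval.

    from collections import Counter

    def retrieve_pages(query_terms, page_texts, top_n=10):
        """Rank Facebook pages by term overlap with the HIT query."""
        scores = {pid: sum(text.lower().count(t) for t in query_terms)
                  for pid, text in page_texts.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [pid for pid in ranked if scores[pid] > 0][:top_n]

    def assign_hit(query_terms, page_texts, likes, k=5):
        """likes maps page_id -> set of worker ids who liked that page."""
        votes = Counter()
        for page_id in retrieve_pages(query_terms, page_texts):
            for worker in likes.get(page_id, ()):
                votes[worker] += 1  # each relevant liked page is one vote
        return [worker for worker, _ in votes.most_common(k)]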

[Figure: worker accuracy vs. number of task-relevant “likes”]

The above figure presents worker accuracy as a function of the number of “likes” matching the current task, over different types of tasks. We can see that accuracy varies widely when the worker profile contains few relevant “likes” (left part). On the other hand, when workers like many Facebook pages relevant to the task, their performance is consistently high (right part).

For more, see our full paper, “Pick-A-Crowd: Tell Me What You Like, and I’ll Tell You What to Do”.

People involved:
Djellel E. Difallah, eXascale Infolab, U. of Fribourg, Switzerland.
Gianluca Demartini, eXascale Infolab, U. of Fribourg, Switzerland.
Victor Felder, U. of Fribourg, Switzerland.
Philippe Cudré-Mauroux, eXascale Infolab, U. of Fribourg, Switzerland.

CIDR 2013: CrowdQ – Crowdsourced Query Understanding

Understanding complex questions is characteristic of human intelligence. Quora.com and other question-and-answer platforms are good examples of how complex questions are best answered by humans. Unfortunately, Google and other search engines do not understand such queries. In this work, we combine crowdsourcing with automated algorithms for complex query understanding.

Our proposed system can answer complex queries such as “birthdate of the mayors of all the cities in Italy”. The answers to such complex queries are typically available on the Web (or even just in Wikipedia). However, current search engines are not able to provide answers directly because they do not understand the semantics behind user requests.

The proposed system generates query templates using a combination of:

  • query log mining
  • natural language processing (NLP)
    • part-of-speech tagging
    • entity extraction
  • crowdsourcing
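As an illustrative sketch of the NLP side (using spaCy's off-the-shelf tagger and entity recognizer, a choice made here for illustration and not necessarily what CrowdQ uses), the example query can be turned into a template by replacing named entities and plain nouns with placeholders:

    # Illustrative sketch: derive a query template via POS tagging and entity
    # extraction with spaCy (not the actual CrowdQ pipeline).
    # Setup: pip install spacy && python -m spacy download en_core_web_sm

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def to_template(query):
        doc = nlp(query)
        tokens = []
        for token in doc:
            if token.ent_type_:          # named entity -> entity-type placeholder
                tokens.append("<%s>" % token.ent_type_)
            elif token.pos_ == "NOUN":   # plain nouns mark relation/type slots
                tokens.append("<NOUN>")
            else:
                tokens.append(token.text)
        return " ".join(tokens)

    print(to_template("birthdate of the mayors of all the cities in Italy"))
    # e.g. "<NOUN> of the <NOUN> of all the <NOUN> in <GPE>"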

These query templates can then be used to answer whole classes of similar questions rather than just one specific question and answer.

Our proposed approach first transforms the user request into a structured query and then answers that query using machine-readable data publicly available on the Web (i.e., Linked Open Data).

Human input is used to detect the structure of a user request expressed in natural language:

  • which entities are mentioned
  • which relations exist among the entities
  • what is the type of the desired answer

The crowd is also involved to verify the correctness of automatic annotations in uncertain cases.
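A small, hypothetical sketch of that routing logic (the threshold value and the task format are assumptions made here for illustration):

    # Hypothetical sketch: keep automatic annotations the tagger is confident
    # about, and turn low-confidence ones into crowd verification questions.

    CONFIDENCE_THRESHOLD = 0.8  # illustrative value

    def route_annotations(annotations):
        """annotations: list of dicts with 'span', 'label', and 'confidence' keys."""
        accepted, crowd_tasks = [], []
        for ann in annotations:
            if ann["confidence"] >= CONFIDENCE_THRESHOLD:
                accepted.append(ann)
            else:
                crowd_tasks.append({
                    "question": 'Does "%s" refer to a %s?' % (ann["span"], ann["label"]),
                    "answers": ["Yes", "No"],
                })
        return accepted, crowd_tasks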

The result of this process is an SQL-like query that can be answered automatically by standard database technologies.
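For the running example, such a structured query could be expressed, for instance, as SPARQL over DBpedia. The sketch below runs it against the public DBpedia endpoint; the property choices (dbo:mayor, dbo:birthDate, dbo:country) are illustrative assumptions, not necessarily what CrowdQ would generate.

    # Illustrative only: a structured version of "birthdate of the mayors of
    # all the cities in Italy", executed against the public DBpedia endpoint.
    # The chosen ontology properties are assumptions.

    import requests

    QUERY = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city ?mayor ?birthDate WHERE {
      ?city  a dbo:City ;
             dbo:country dbr:Italy ;
             dbo:mayor ?mayor .
      ?mayor dbo:birthDate ?birthDate .
    } LIMIT 20
    """

    resp = requests.get("https://dbpedia.org/sparql",
                        params={"query": QUERY, "format": "application/sparql-results+json"})
    for row in resp.json()["results"]["bindings"]:
        print(row["city"]["value"], row["mayor"]["value"], row["birthDate"]["value"])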

For more, see our full paper, CrowdQ – Crowdsourced Query Understanding.

People involved:
Gianluca Demartini, eXascale Infolab, University of Fribourg, Switzerland
Beth Trushkowsky, AMPLab, UC Berkeley, USA
Tim Kraska, Brown University, USA
Michael J. Franklin, AMPLab, UC Berkeley, USA
Daniel Bruckner, UC Berkeley
Daniel Haas, UC Berkeley
Jonathan Harper, UC Berkeley