Can we achieve reliable inference using unreliable crowd workers?

Let us assume a set of N crowd workers are given the task of classifying a given dog image into a set of M possible breeds. Since workers may not be canine experts, they may not be able to directly classify and so we should ask simpler questions. There are two basic properties of crowd workers which cause a degraded performance of crowdsourcing systems:

  • Lack of domain expertise (which may necessitate asking binary questions rather than asking for fine classification), and
  • Unreliability (which may necessitate intelligently deployed redundancy)

The above problems can be handled by the use of error-correcting codes. Using code matrices, we can design binary questions for crowd workers that allow the task manager to reliably infer correct classification even with unreliable workers.

Untitled

The performance of a classification task is heavily dependent on the design of these simple binary questions. The question design problem is equivalent to the design of an M x N binary code matrix A={ali}. The rows correspond to the different classes while a column ai corresponds to the question to the ith worker. As an example, consider the task of classifying a dog image into one of four breeds: Pekingese, Mastiff, Maltese, or Saluki. The binary question of whether a dog has a snub nose or a long nose differentiates between {Pekingese, Mastiff} and {Maltese, Saluki}, whereas the binary question of whether the dog is small or large differentiates between {Pekingese, Maltese} and {Mastiff, Saluki}.

ex

An illustrative example is shown in the figure above for the dog breed classification task. Let the columns corresponding to the ith and jth workers be ai = [1010]‘ and aj = [1100]‘ respectively. The ith worker is asked: “Is the dog small or large?” since she is to differentiate between the first (Pekingese) or third (Maltese) breed and the others. The jth worker is asked: “Does the dog have a snub nose or a long nose?” since she is to differentiate between the first two breeds (Pekingese, Mastiff) and the others. These questions are designed from the codematrix using taxonomy of dog breeds. The task manager makes the final classification decision as the hypothesis corresponding to the code word (row) that is closest in Hamming distance to the received vector of decisions. A good codematrix can be designed using simulated annealing or cyclic column replacement based optimization.

To evaluate the performance of this scheme, the worker’s reliability can be modeled as a random variable: spammer-hammer model or the Beta model. The average probability of misclassification can be derived as a function of the mean (μ) of the workers’ reliability. The proposed scheme’s performance can be compared with the traditional voting based scheme. The summary of the results are:

  • Crowd Ordering: Better crowds yield better performance in terms of average error probability
  • Coding is better than majority vote: Good codes perform better than majority vote as they diversify the binary questions and use human cognitive energy more efficiently
  • Gap in performance generally increases for larger system size

For more, see our ICASSP 2013 paper, Reliable Classification by Unreliable Crowds
Aditya Vempaty, Syracuse University
Lav R. Varshney,  IBM Thomas J. Watson Research Center
Pramod K. Varshney, Syracuse University

This entry was posted in - and tagged , , by Aditya Vempaty. Bookmark the permalink.

About Aditya Vempaty

Aditya Vempaty was born in Hyderabad, India, on August 3, 1989. He received the B.Tech. degree in electrical engineering from the Indian Institute of Technology (IIT), Kanpur, in 2011, with academic excellence awards for consecutive years. Since 2011, he has been working towards the Ph.D. degree in electrical engineering at Syracuse University, Syracuse, NY. He is a Graduate Research Assistant with the Sensor Fusion Laboratory, Syracuse, where he was also an Undergraduate Research Intern during summer 2010. His research interests include reliable signal processing, crowdsourcing, network security, statistical signal processing, and data fusion.