Let us assume a set of *N* crowd workers is given the task of classifying a given dog image into a set of *M* possible breeds. Since workers may not be canine experts, they may not be able to classify the image directly, so we should ask simpler questions. There are two basic properties of crowd workers which degrade the performance of crowdsourcing systems:

- Lack of domain expertise (which may necessitate asking binary questions rather than asking for fine classification), and
- Unreliability (which may necessitate intelligently deployed redundancy)

The above problems can be handled by the use of **error-correcting codes**. **Using code matrices, we can design binary questions for crowd workers that allow the task manager to reliably infer correct classification even with unreliable workers.**

The performance of a classification task is heavily dependent on the **design of these simple binary questions**. The question design problem is equivalent to the design of an *M* x *N* binary code matrix **A = {a_{li}}**. The rows correspond to the different classes, while a column a_{i} corresponds to the question posed to the i^{th} worker. As an example, consider the task of classifying a dog image into one of **four breeds: Pekingese, Mastiff, Maltese, or Saluki**. The binary question of whether a dog has a snub nose or a long nose differentiates between {Pekingese, Mastiff} and {Maltese, Saluki}, whereas the binary question of whether the dog is small or large differentiates between {Pekingese, Maltese} and {Mastiff, Saluki}.

An illustrative example is shown in the figure above for the dog breed classification task. Let the columns corresponding to the i^{th} and j^{th} workers be a_{i} = [1010]' and a_{j} = [1100]' respectively. The i^{th} worker is asked: "**Is the dog small or large?**" since she is to differentiate between the first (Pekingese) and third (Maltese) breeds and the others. The j^{th} worker is asked: "**Does the dog have a snub nose or a long nose?**" since she is to differentiate between the first two breeds (Pekingese, Mastiff) and the others. These questions are designed from the code matrix using a taxonomy of dog breeds. The task manager makes the final classification decision as the hypothesis corresponding to the code word (row) that is closest in Hamming distance to the received vector of decisions. A good code matrix can be designed using simulated annealing or cyclic-column-replacement-based optimization.
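The decoding step above can be sketched in a few lines. The 4 x 2 matrix below is a hypothetical fragment of a code matrix containing just the two example columns a_{i} and a_{j}; the decision rule picks the row closest in Hamming distance to the workers' answers.

```python
import numpy as np

# Hypothetical 4x2 fragment of a code matrix A for the four breeds
# (rows: Pekingese, Mastiff, Maltese, Saluki; columns: workers i and j).
# Column a_i = [1,0,1,0]' encodes "Is the dog small?" and
# column a_j = [1,1,0,0]' encodes "Does the dog have a snub nose?".
A = np.array([
    [1, 1],   # Pekingese: small, snub nose
    [0, 1],   # Mastiff:   large, snub nose
    [1, 0],   # Maltese:   small, long nose
    [0, 0],   # Saluki:    large, long nose
])
breeds = ["Pekingese", "Mastiff", "Maltese", "Saluki"]

def classify(answers):
    """Return the class whose code word (row of A) is closest
    in Hamming distance to the received vector of decisions."""
    dists = (A != np.array(answers)).sum(axis=1)
    return breeds[int(np.argmin(dists))]

# Both workers answer "yes" (1): the closest code word is row 1.
print(classify([1, 1]))  # -> Pekingese
```

With more workers (longer rows), the minimum Hamming distance between code words grows, so the decision tolerates more wrong answers, which is exactly the error-correction effect the scheme exploits.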

To evaluate the performance of this scheme, the workers' reliability can be modeled as a random variable, using either the spammer-hammer model or the Beta model. The average probability of misclassification can be derived as a function of the mean (μ) of the workers' reliability, and the proposed scheme's performance can be compared with the traditional voting-based scheme. The results can be summarized as follows:

- **Crowd ordering: Better crowds yield better performance** in terms of average error probability
- **Coding is better than majority vote**: Good codes perform better than majority vote as they diversify the binary questions and **use human cognitive energy more efficiently**
- The gap in performance generally increases for larger system size

*For more, see our ICASSP 2013 paper, Reliable Classification by Unreliable Crowds*

*Aditya Vempaty, Syracuse University*

*Lav R. Varshney, IBM Thomas J. Watson Research Center*

*Pramod K. Varshney, Syracuse University*
