Paying human computers by the bit

Collective human computation – presenting objective questions to multiple humans and collecting their judgments – is a powerful and increasingly popular paradigm for performing computational tasks beyond the reach of today’s algorithms. From image classification to data validation, the human computer is making a comeback.

But how should we measure the performance of a human doing a computational task? Speed without accuracy is worthless, and accuracy itself is hard to measure in classification or estimation tasks in which a close-to-correct judgment still has value.

I assert that the value of a judgment is the amount by which it reduces the surprise of learning the correct answer to a question. This is a basic concept in information theory: the pointwise mutual information between the judgment and the answer.
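
In symbols: if p(a) is the prior probability that a question's answer is a, and p(a|j) is that probability after seeing a contributor's judgment j, then the surprise of learning the answer drops from -log2 p(a) to -log2 p(a|j), and the value of the judgment is the difference:

pmi(j; a) = log2[p(a|j) / p(a)] = log2[p(j, a) / (p(j) p(a))]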

For example, a classification problem with four equally-likely categories has entropy of 2 bits per question. If you correctly classify a series of objects, you’re giving the full 2 bits of information for each. If you’re a spammer giving judgments that are statistically independent of the correct categories, you’re giving zero information no matter what your spamming strategy is.
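
A quick numeric check of that example, as a Python sketch (the variable names are mine):

```python
import math

K = 4                    # equally likely categories
p_a = 1.0 / K            # uniform prior over answers

print(math.log2(K))      # entropy: 2.0 bits per question

# Perfect classifier: the judgment pins down the answer, so p(a|j) = 1
# and each judgment is worth pmi = log2(1 / p_a) = 2.0 bits.
print(math.log2(1.0 / p_a))

# Spammer: the judgment is independent of the answer, so p(a|j) = p(a)
# and each judgment is worth pmi = log2(p_a / p_a) = 0.0 bits,
# whatever the spamming strategy.
print(math.log2(p_a / p_a))
```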

Thus, the net value of a contributor’s judgments is the total amount of information they give us, a well-defined extensive quantity that we can measure in bits (or nats or digits, if you please).

This metric has the advantages of being naturally motivated, task- and model-agnostic, and free of tuning, and it plugs easily into any resolution algorithm that models contributors and answers as random variables.

Expected values (i.e., entropy) can be used to predict a contributor’s performance on a given question, conditioned on what’s already known about that question. Contributors can be preferentially given the questions for which they’re likely to be most informative. By applying this technique to data from Galaxy Zoo 2 (a crowdsourced deep-field galaxy classification project, part of the Zooniverse program), I was able to demonstrate a substantial improvement in accuracy compared to random assignment of questions to contributors.
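
As a sketch of how that routing could work, suppose each contributor is described by a confusion matrix with confusion[a][j] = p(j | a) estimated from their past judgments; the function names and the greedy assignment rule below are my own illustration, not necessarily the paper's procedure:

```python
import math

def expected_info_bits(confusion, posterior):
    """Mutual information I(J; A) in bits between a contributor's judgment J
    and a question's answer A, given the current posterior over answers and
    the contributor's confusion matrix confusion[a][j] = p(j | a)."""
    n = len(posterior)
    p_j = [sum(posterior[a] * confusion[a][j] for a in range(n)) for j in range(n)]
    info = 0.0
    for a in range(n):
        for j in range(n):
            p_aj = posterior[a] * confusion[a][j]   # p(a, j)
            if p_aj > 0:
                info += p_aj * math.log2(p_aj / (posterior[a] * p_j[j]))
    return info

def best_question(confusion, posteriors):
    """Greedily route a contributor to the open question where their
    judgment is expected to be most informative."""
    return max(range(len(posteriors)),
               key=lambda q: expected_info_bits(confusion, posteriors[q]))
```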

Finally, we can measure the cost-effectiveness of the judgment collection process or the information efficiency of the resolution algorithm in terms of the total information received from contributors. Related metrics can be used to measure the overlap in information between two contributors or the information wasted by collecting redundant judgments.
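
One plausible instantiation of the overlap metric (my sketch, reusing expected_info_bits from above, and assuming both contributors judge the same question and are conditionally independent given the answer): compare what two contributors tell us jointly with what they tell us separately. The difference is the information the second judgment wastes by repeating the first.

```python
import math

def joint_info_bits(conf1, conf2, posterior):
    """I(J1, J2; A): information a pair of judgments gives about the answer,
    assuming the two contributors are conditionally independent given A."""
    n = len(posterior)
    info = 0.0
    for j1 in range(n):
        for j2 in range(n):
            p_jj = sum(posterior[a] * conf1[a][j1] * conf2[a][j2] for a in range(n))
            for a in range(n):
                p_ajj = posterior[a] * conf1[a][j1] * conf2[a][j2]
                if p_ajj > 0:
                    info += p_ajj * math.log2(p_ajj / (posterior[a] * p_jj))
    return info

def redundancy_bits(conf1, conf2, posterior):
    """Overlap between two contributors on one question:
    I(J1; A) + I(J2; A) - I(J1, J2; A), in bits."""
    return (expected_info_bits(conf1, posterior)
            + expected_info_bits(conf2, posterior)
            - joint_info_bits(conf1, conf2, posterior))
```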

The metrics I present can be mixed into any human computation resolution algorithm that uses a statistical model to turn judgments into answers, by using the model’s estimated parameters to compute a set of conditional probabilities and then plugging these into the definitions of the information-theoretic quantities. The paper includes worked examples for several models.
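
For instance, with the confusion-matrix model sketched above, the realized value of a single judgment could be scored like this once a question's answer is resolved (again a sketch; the parameter names are mine):

```python
import math

def judgment_value_bits(confusion, prior, j, a):
    """Realized value of judgment j on a question resolved to answer a:
    pmi(j; a) = log2(p(j | a) / p(j)), with p(j | a) and the answer prior
    taken from the fitted model's estimated parameters."""
    p_j = sum(prior[k] * confusion[k][j] for k in range(len(prior)))
    return math.log2(confusion[a][j] / p_j)

# A contributor's total pay-by-the-bit score is then just the sum over
# their judgments:
# sum(judgment_value_bits(confusion, prior, j, a) for (j, a) in history)
```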

For more, see the full paper:
Pay by the Bit: An Information-Theoretic Metric for Collective Human Judgment

Tamsyn P Waterhouse, Google Inc.

3 thoughts on “Paying human computers by the bit”

  1. I like how this approach is completely agnostic of task so long as the submissions are independent. It seems like it could be crafted into fair compensation systems that actually reward based on the value a person provides to a crowd organizer. I’d love to see this sort of information surfaced to the workers themselves. Show workers that they are improving a project by a quantifiable amount through their hard work.

    I wonder, though, how one might deal with collusion. We already see on places like MTurk that workers can collude to break quality control systems even if they are distributed and lack a central organizer. Since this approach depends on independent judgments by workers, I wonder how we might adapt it to be more tolerant to interdependence. What sorts of approaches might work?

    • Thanks, Jeff!

      If you had a resolution algorithm that could model violations of conditional independence, then you could measure those violations in information-theoretic terms as I(J;J′|A); that is, the mutual information between two contributors’ judgments, conditioned on the answer.

      However, most models assume conditional independence explicitly, and so this quantity is by definition 0 for such models, because p(j | a, j′) = p(j | a) for all a, j, and j′.
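
      As a quick numeric check of both claims (my sketch; the joint-table layout is an assumption), I(J;J′|A) can be computed directly from a table of p(a, j, j′), and it comes out to zero whenever the table factorizes as p(a) p(j|a) p(j′|a):

      ```python
      import math

      def cond_mutual_info_bits(p_ajj):
          """I(J; J'|A) in bits from a joint table p_ajj[a][j][j2] = p(a, j, j').
          Zero exactly when p(j | a, j') = p(j | a) for all a, j, and j'."""
          info = 0.0
          for table in p_ajj:                         # one slice per answer a
              p_a = sum(sum(row) for row in table)    # p(a)
              for j, row in enumerate(table):
                  p_aj = sum(row)                     # p(a, j)
                  for j2, p in enumerate(row):
                      p_aj2 = sum(r[j2] for r in table)   # p(a, j')
                      if p > 0:
                          info += p * math.log2(p * p_a / (p_aj * p_aj2))
          return info
      ```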

      Unfortunately, I think that collusion is a problem that must be dealt with at a higher level.
