Reasoning about Quality in Crowdsourced Enumeration Queries

Harnessing human perception and experience to collect data for answering user queries is one of the defining features of hybrid human/machine database systems like CrowdDB. Consider queries asking for lists of items, like “indoor plants that tolerate low light conditions” or “restaurants in San Francisco serving scallops”. The items in such a list may be spread across the web and/or require human interpretation to find.

When we ask members of the crowd to provide individual items in the list, however, we are faced with questions regarding the quality of the answer set, including:

  • Is the set complete?
  • If not, how much progress have we made so far?
  • How hard would it be to get another new answer?
As crowd workers supply answers one by one, the arrival of new unique answers is rapid at first but then plateaus. This accumulation curve provides insight into reasoning about answer set completeness (in this example, workers are giving items from a set of size 50).
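
To see where this curve comes from, here is a minimal simulation sketch (with assumed numbers, not the data behind the figure): answers are drawn uniformly at random from a set of 50 items, and the count of unique answers climbs quickly before leveling off.

```python
# A minimal simulation sketch (assumed numbers, not data from the paper):
# draw answers uniformly at random from a set of 50 items and track how many
# unique answers have been seen, reproducing the rapid-then-plateau shape.
import random

TOTAL_ITEMS = 50    # assumed size of the underlying answer set
NUM_ANSWERS = 300   # number of crowd answers to simulate

random.seed(0)
seen = set()
curve = []          # curve[i] = number of unique answers after i + 1 answers
for _ in range(NUM_ANSWERS):
    seen.add(random.randrange(TOTAL_ITEMS))
    curve.append(len(seen))

# Print a few points along the accumulation curve.
for n in (10, 50, 100, 200, 300):
    print(f"after {n:3d} answers: {curve[n - 1]} unique")
```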

The key idea of our technique is to use the arrival rate of new answers from the crowd to reason about the completeness of the set, adapting techniques used by biologists for species estimation.

Imagine you were trying to determine the number of unique species of animals on an island by repeatedly putting out traps overnight and counting which animals were caught (then releasing them). Species estimation algorithms infer the total number of species using the rate at which new species are identified.

The sequence of answers the crowd provides is analogous to these observations of animals; we can estimate the total number of expected answers based on the counts of answers we have seen so far.
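
As a concrete, simplified illustration of this style of estimation, the classic Chao1 estimator from ecology predicts the total number of species from how many species have been seen exactly once (f1) and exactly twice (f2). The sketch below only conveys the general idea; it is not the estimator used in the paper, which adapts related techniques to crowdsourced answers.

```python
# A sketch of the classic Chao1 species-richness estimator from ecology,
# shown only to illustrate the general idea; the paper adapts related
# estimators and corrects for crowd-specific behaviors (discussed below).
from collections import Counter

def chao1(observations):
    """Estimate the total number of distinct items from a list of sightings."""
    counts = Counter(observations)
    s_obs = len(counts)                              # distinct items seen so far
    f1 = sum(1 for c in counts.values() if c == 1)   # items seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)   # items seen exactly twice
    if f2 == 0:
        # Bias-corrected variant, used when no item has been seen twice yet.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + (f1 * f1) / (2.0 * f2)

# Hypothetical answers collected so far for an enumeration query.
answers = ["A", "B", "A", "C", "D", "B", "E", "F", "A", "G"]
print(chao1(answers))  # estimated total number of distinct answers
```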

However, we discovered that the way workers provide their answers differs from how species observations are made. Namely:

  • Individual workers do not give the same answer multiple times
  • Some workers provide more answers than others
  • Workers employ different techniques to find answers, which informs the order in which they provide them

In our work, we characterize the effect of these worker behaviors and devise a technique to reduce their impact.
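
For intuition about why these behaviors matter, consider the first one in isolation: a prolific worker who never repeats an answer contributes only singletons, and singleton-heavy data pushes species-style estimates upward. The sketch below uses assumed numbers and a Chao1-style estimate (redefined compactly so the snippet runs on its own); it is an illustration, not the paper's analysis or its correction technique.

```python
# An illustrative, assumed scenario (not an analysis from the paper): compare
# species-style sampling with replacement against a single prolific worker who
# never repeats an answer, using a Chao1-style estimate as sketched earlier.
from collections import Counter
import random

TOTAL_ITEMS = 50

def chao1(observations):
    counts = Counter(observations)
    f1 = sum(1 for c in counts.values() if c == 1)   # singletons
    f2 = sum(1 for c in counts.values() if c == 2)   # doubletons
    correction = f1 * f1 / (2.0 * f2) if f2 else f1 * (f1 - 1) / 2.0
    return len(counts) + correction

random.seed(0)
# Species-style stream: 30 independent draws, repeats possible.
species_style = [random.randrange(TOTAL_ITEMS) for _ in range(30)]
# One worker's stream: 30 distinct items, no repeats by construction.
one_worker = random.sample(range(TOTAL_ITEMS), 30)

print("estimate from with-replacement draws:  ", round(chao1(species_style), 1))
print("estimate from a single no-repeat worker:", round(chao1(one_worker), 1))
```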

For more, see our full paper, Crowdsourced Enumeration Queries.

Beth Trushkowsky, AMPLab, UC Berkeley
Tim Kraska, Brown University
Mike Franklin, AMPLab, UC Berkeley
Purna Sarkar, AMPLab, UC Berkeley

2 thoughts on “Reasoning about Quality in Crowdsourced Enumeration Queries”

  1. Thanks Beth, I really like this work! I particularly like your observation that workers have different ways of coming up with their private list of answers — sometimes alphabetical, sometimes by mental availability, etc. But many of these distributions are still strongly skewed. If I asked for “vampire books,” I bet I would get an awful lot of Twilight.

    What if a requester was unhappy about getting duplicate answers from workers, and decided to put a feedback loop into the data collection — making an answer like Twilight “taboo” once a certain number of workers had provided it? Is that a good idea? How would that affect your model for predicting the size of the set?

  2. Thanks! The research questions regarding the characteristics of crowd production are very interesting to me. It reminds me of studies on group brainstorming. The way the crowd generates answers seems to resemble parallel or nominal brainstorming, where individuals generate ideas on a question independently, without social interaction, and their ideas are pooled afterward. Because there’s no communication among individuals, the same idea can be mentioned multiple times by different people, whereas this pattern is unlikely to occur in normal, interactive brainstorming, since people tend to remember what has been said during the session.

    It seems that there’s some commonality between nominal brainstorming and the patterns of individual and crowd behavior identified here (e.g., individual workers do not give the same answer multiple times). Perhaps this connection would be an interesting space to explore.
