CrowdCamp Report: Benchmarking the Crowd

As crowdsourcing evolves, the crowds are evolving too.  Mechanical Turk's population is different than it was a few years ago.  There are different crowds at different times of day.  Different crowds may be better or worse suited to a given application: CrowdFlower, MobileWorks, or even a crowd of employees within a company or students within a school.

In particular, how can researchers and developers cooperate to collect aggregate data about system properties (e.g. latency, throughput, noise), demographics (gender, age, socioeconomic level), and human performance (motor, perceptual, attention) for the various crowds that they use?

[Image: an example Census benchmarking task embedded in a HIT]

We started exploring this question in a weekend CrowdCamp hackathon at CSCW 2013.  Some concrete steps and discoveries included:

  • We gathered 25 datasets from a wide variety of Mechanical Turk experiments run by many different researchers between 2008 and 2013.  Our sample contained 30,000 unique workers, and in the most recent datasets, between 20% and 40% of workers had also contributed to earlier datasets.  So at least on MTurk, the crowd is stable enough for benchmarking shared between researchers to be a viable idea.
  • We prototyped a deployment platform, Census, that injects a small benchmarking task into any researcher’s existing HIT using only one line of JavaScript (a sketch of such an embed appears after this list).  The image above shows an example Census task in action.
  • We trawled the recent research literature for possible benchmarking tasks, including affect detection, image tagging, and word sense disambiguation.
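To make the "one line of JavaScript" idea concrete, here is a minimal sketch of what such an embed might look like. The URL and the data attribute are assumptions for illustration, not Census's actual API:

```javascript
// Hypothetical sketch: the embed URL and the data-study attribute are assumptions.
// Dropping one line like this into a HIT's page would load the Census script,
// which then appends its small benchmarking task to the researcher's existing task.
document.write('<script src="https://census.example.org/census.js" data-study="my-study-id"><\/script>');
```

The closing tag is written as `<\/script>` so the line can sit safely inside an inline script block without prematurely terminating it.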

We also discovered that Mechanical Turk worker IDs are not as anonymous as researchers generally assume.  For benchmarking that shares information among researchers, it will be necessary to take additional steps to protect worker privacy while preserving the ability to connect the same workers across studies.
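One possible approach, sketched below, is to replace raw worker IDs with a keyed hash computed using a secret salt shared privately among collaborating researchers. The salt handling and helper function here are illustrative assumptions, not something Census currently implements:

```javascript
// Minimal Node.js sketch of pseudonymizing worker IDs before sharing data.
const crypto = require('crypto');

// Assumed: a secret salt agreed on and distributed privately among
// collaborating researchers; it must never be published with the data.
const SHARED_SALT = process.env.CENSUS_SALT || 'replace-with-shared-secret';

// Map a raw Mechanical Turk worker ID to a stable pseudonym. The same worker
// always hashes to the same value, so datasets from different studies can
// still be joined on the pseudonym without exposing the underlying ID.
function pseudonymize(workerId) {
  return crypto.createHmac('sha256', SHARED_SALT)
               .update(workerId)
               .digest('hex');
}

console.log(pseudonymize('A1EXAMPLEWORKERID'));
```

Because the hash is keyed, someone without the shared salt cannot recover or re-derive worker IDs from published pseudonyms, while collaborators can still link the same worker across studies.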

Saeideh Bakshi, Georgia Tech
Michael Bernstein, Stanford University
Jeff Bigham, University of Rochester
Jessica Hullman, University of Michigan
Juho Kim, MIT CSAIL
Walter Lasecki, University of Rochester
Matt Lease, University of Texas Austin
Rob Miller, MIT CSAIL
Tanushree Mitra, Georgia Tech

