TurKit Retrospective: An Interview with Greg Little

This is the first in a series of retrospectives on crowd-related research.  We ask authors to reflect on what worked, what didn’t, and what they learned that they couldn’t say in the paper.

TurKit was published at UIST 2010. Soylent (best student paper award) and VizWiz (best paper award) were also published there and were both based on TurKit. This is an interview with the first author, Greg Little, conducted by the second author of the TurKit paper, Lydia Chilton.

What is TurKit?
TurKit is a tool for dispatching iterative tasks to Mechanical Turk. For example, if you want to iteratively improve an image description, you can dispatch a task to write a description, then a task to improve it, then a task to vote on whether to accept the improvement. TurKit introduces the Crash-and-Rerun programming model. In this example, Crash-and-Rerun will:

  1. Run the first task (“write a description”) until a turker completes it.
  2. Once it completes, the results will be recorded (memoized). That instance of the task will never be dispatched again.
  3. Next, TurKit reruns the entire script.  Since the first task is done and the results memoized, it retrieves the results from the cache and uses them to dispatch the next task (“improve this description”).

A later task votes on whether to keep the improvement, and further improvement tasks are dispatched if needed. TurKit keeps rerunning until all the tasks are completed. Memoization ensures you don’t pay twice for the same task.
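To make this concrete, here is a minimal sketch of crash-and-rerun memoization. Everything in it is an assumption for illustration: the `once` and `postHitAndWait` helpers and the JSON cache file are hypothetical, not TurKit’s actual API (real TurKit scripts are JavaScript).

```typescript
// A minimal sketch of crash-and-rerun memoization (hypothetical helper
// names; this is NOT TurKit's actual API).
import * as fs from "fs";

const CACHE_FILE = "memo.json";
const cache: Record<string, string> = fs.existsSync(CACHE_FILE)
  ? JSON.parse(fs.readFileSync(CACHE_FILE, "utf8"))
  : {};

// First run per key: post the task. If no result exists yet, throw
// ("crash"); the script is rerun later. Once a result is recorded,
// every rerun replays it from the cache and the task is never posted
// (or paid for) again.
function once(key: string, run: () => string | null): string {
  if (key in cache) return cache[key];
  const result = run();
  if (result === null) throw new Error("waiting on: " + key);
  cache[key] = result;
  fs.writeFileSync(CACHE_FILE, JSON.stringify(cache));
  return result;
}

// Stub standing in for MTurk; a real version would return null until a
// turker has answered, triggering the crash-and-rerun cycle above.
function postHitAndWait(prompt: string): string | null {
  console.log("POST HIT:", prompt);
  return "(worker's answer)";
}

// The image-description example: write, then improve, then vote.
let description = once("write", () =>
  postHitAndWait("Write a description of this image."));
for (let i = 0; i < 3; i++) {
  const improved = once(`improve-${i}`, () =>
    postHitAndWait("Improve this description: " + description));
  const keep = once(`vote-${i}`, () =>
    postHitAndWait(`Which is better? A: ${description} B: ${improved}`));
  if (keep === "B") description = improved;
}
```

The key property is that `once` is the only place allowed to have expensive side effects, so rerunning the whole script from the top is always safe.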

[Figure: an image description being iteratively improved by the crowd]

Did people use TurKit?
Yes. We had 50-100 users in total, we think. They didn’t all use it as expected. Many people simply used it as a wrapper around MTurk that lets you display external HITs in an iframe. Oddly, the MTurk tools don’t let you do this yet. It also let people get their results in JSON rather than XML.

What would you do differently?
The Crash-and-Rerun programming model.

What is the problem with Crash-and-Rerun?
Crash-and-Rerun was problematic because it is subtly hard to reason about. You run the script once, it gets as far as it can, then crashes, waits a minute, and reruns. This is useful if you want to post a task and then wait for it to be completed, but users had a hard time trusting that the task wouldn’t be posted a second time. They are too accustomed to the idea that every line of code is re-executed on every run.

We should have given users feedback about what was actually being posted to MTurk and what was being used from the cache of memoized tasks.
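Continuing the sketch above (and reusing its hypothetical `once`, `cache`, and `postHitAndWait`), the confusion looks like this: ordinary statements really do re-execute on every rerun, while memoized calls silently replay from the cache. Logging which of the two happened is roughly the feedback we should have shown.

```typescript
// Assumes the `once`, `cache`, and `postHitAndWait` sketch above.
// On a rerun, every line below executes again; only the work wrapped
// in `once` is skipped. Reporting cache hits vs. fresh posts is the
// feedback TurKit should have surfaced.
function onceWithFeedback(key: string, run: () => string | null): string {
  const wasCached = key in cache;
  const result = once(key, run);
  console.log(wasCached
    ? `${key}: replayed from cache (no HIT posted, no cost)`
    : `${key}: posted to MTurk`);
  return result;
}

let runCount = 0;  // ordinary state: reset to 0 on EVERY rerun
runCount += 1;     // ...plain statements repeat each time,
const answer = onceWithFeedback("q1", () =>  // ...but this posts at most once
  postHitAndWait("Describe this image."));
```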

Do you still think iterative tasks are important?
Yes, but in a different way than I once did.
Originally, I thought that with enough small tasks and enough people, we could simulate an expert using only novices. However, this would require a deep network of tasks and novices.  I think that’s too hard.  I think you need to have experts in a system.

What type of iterative tasks are most useful?
Successful workflows tend to be shallow.  Soylent has 3 steps: Find, Fix, Verify.  Cascade has 4 steps: Generate, SelectBest, and a two-phase Categorization.  Shallow isn’t a bad thing.  Fewer steps means the workflow is simpler and often more parallelizable.
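To illustrate why shallow is good, here is a sketch of a Find-Fix-Verify style pipeline. The three-stage decomposition is Soylent’s, but the `postHit` helper, the prompts, and all other details are my own assumptions, not Soylent’s implementation. Each stage is a single round of independent HITs, so everything within a stage can run concurrently, and the whole workflow is only three steps deep.

```typescript
// A sketch of a shallow, parallelizable Find-Fix-Verify pipeline.
// `postHit` is a stand-in for posting one HIT and awaiting one
// worker's answer; here it is stubbed for demonstration.
async function postHit(prompt: string): Promise<string> {
  return `answer to: ${prompt.slice(0, 40)}`; // stub, not a real HIT
}

async function findFixVerify(paragraph: string): Promise<string[]> {
  // Find: several workers independently flag a problem span (parallel).
  const finds = await Promise.all(
    Array.from({ length: 5 }, () =>
      postHit(`Find one problem in: ${paragraph}`)));

  // Fix: one rewrite task per distinct flagged span (parallel).
  const fixes = await Promise.all(
    [...new Set(finds)].map(span => postHit(`Rewrite: ${span}`)));

  // Verify: vote on each proposed rewrite (parallel).
  return Promise.all(
    fixes.map(fix => postHit(`Is this rewrite acceptable? ${fix}`)));
}

findFixVerify("Some paragraph to proofread...").then(console.log);
```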

What was your lasting impression of Iterative Tasks?
We were able to solve ‘many eyes’ problems using iterative tasks, such as deciphering blurry handwritten text.
[Figure: blurry handwriting deciphered by successive crowd workers]

This impressed a lot of people. The image was compelling because the task seems hard even for a single human, whereas image labeling doesn’t. Unfortunately, in the real world we didn’t find problems that were hard for individuals but possible for crowds. Many crowd problems were at about the level of labeling an image, correcting a typo, or transcribing some text.

What has persisted from TurKit?
I think the idea of making complex things from multiple people in a coordinated way is powerful. TurKit was an early embodiment of that idea, and others have arrived at it independently as well. Niki Kittur has talked about it in The Future of Crowd Work. However, I don’t think it’s feasible to simulate an expert with novices; many of us have come to the conclusion that expertise is necessary.

Read the original paper, TurKit: Human Computation Algorithms on Mechanical Turk.

Greg Little, oDesk
Lydia B. Chilton, University of Washington
Max Goldman, MIT
Robert C. Miller, MIT

3 thoughts on “TurKit Retrospective: An Interview with Greg Little”

    • Good old-fashioned guessing. There are the people we know who used it (because we set it up for them) and people we know who used it because they emailed us questions about it. There were a couple of Stack Overflow questions about it. When the MTurk API changes, TurKit breaks, and then we get questions sometimes. :)
