Collaboratively Crowdsourcing Workflows with Turkomatic

Anand Kulkarni, UC Berkeley, MobileWorks
Matthew Can, Stanford
Bjoern Hartmann, UC Berkeley

A central challenge in crowd computing is the workflow design problem: how can we divide a complex job — for instance, editing a paper or writing a computer program — into a sequence of microtasks that can be solved by a pool of crowd workers on the web? Effective workflow design is difficult, requiring careful task design, extensive software development, and iterative testing with a live crowd. This complexity limits participation in crowdsourcing marketplaces to experts willing to invest substantial time and effort, and limits the kinds of tasks that can be crowdsourced today.

What if we could use the crowd to attack the workflow design problem itself? We present Turkomatic, a tool that allows requesters to collaboratively design and execute workflows in conjunction with the crowd.

Turkomatic accepts a requester’s specification of a task in natural language, then uses workers on Amazon’s Mechanical Turk to determine how to structure a workflow that achieves the objective. While workers decompose the task and solve subtasks, the requester can monitor and edit the resulting workflows as they are produced. These workflows are executed directly by the crowd, and the results are returned to the requester.

We induce the crowd to design and execute workflows on our behalf via a meta-workflow called Price-Divide-Solve (PDS). This crowd algorithm asks workers to recursively divide complex tasks into simpler ones until they are appropriately short for the price offered, then to solve them. Other workers are asked to verify the solutions and combine the results into a coherent answer to the original request. Turkomatic obviates the need for requesters to implement software or design tasks because it uses pre-structured task templates to interface with the crowd. PDS is potentially capable of generating workflows for a wide variety of tasks.
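
To make PDS concrete, the sketch below shows one way its recursion could be written in Python. This is our illustration rather than Turkomatic’s actual implementation, and each ask_crowd_* helper is a hypothetical stand-in for posting one of the pre-structured HIT templates to Mechanical Turk and collecting a worker’s answer.

    # Sketch of the Price-Divide-Solve (PDS) recursion. Every ask_crowd_*
    # function below is a hypothetical placeholder for posting a templated
    # HIT to Mechanical Turk and waiting for a worker's response.

    def ask_crowd_is_solvable(task, price):
        """Price step: is this task simple enough to solve for `price`?"""
        raise NotImplementedError("post a pricing HIT; return True or False")

    def ask_crowd_to_divide(task, price):
        """Divide step: split the task into a list of simpler subtasks."""
        raise NotImplementedError("post a division HIT; return subtask list")

    def ask_crowd_to_solve(task, price):
        """Solve step: answer a sufficiently simple task directly."""
        raise NotImplementedError("post a solution HIT; return the answer")

    def ask_crowd_to_merge(task, parts, price):
        """Merge step: combine subtask answers into one coherent result."""
        raise NotImplementedError("post a merge HIT; return the combination")

    def ask_crowd_to_verify(task, answer, price):
        """Verify step: other workers check the answer before acceptance."""
        raise NotImplementedError("post a verification HIT; return the answer")

    def price_divide_solve(task, price):
        # Recursively divide the task until workers judge each piece
        # simple enough for the offered price; then solve, merge, verify.
        if ask_crowd_is_solvable(task, price):
            answer = ask_crowd_to_solve(task, price)
        else:
            subtasks = ask_crowd_to_divide(task, price)
            parts = [price_divide_solve(s, price) for s in subtasks]
            answer = ask_crowd_to_merge(task, parts, price)
        return ask_crowd_to_verify(task, answer, price)

    # A requester kicks things off with a natural-language task, e.g.:
    # price_divide_solve("Plan a one-week vacation itinerary", 0.50)

The derailment failures described below correspond to this recursion failing to bottom out: workers keep dividing without ever reaching subtasks they judge solvable.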

The Requester Interface. Left, viewing steps. Right, editing.

Editing workflows.

To allow the requester to give input during the workflow design process, Turkomatic provides an interface for visualizing and editing workflows in real time. Requesters can edit subtask descriptions and solutions, create new subtasks, and delete unwanted subtasks. As an alternate mode of operation, it is possible to seed the system with an initial workflow that can be refined by the crowd, enabling a collaborative design process to take place.
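
To make these editing operations concrete, here is a minimal sketch of a workflow as an editable tree of subtasks. It is purely illustrative and is not Turkomatic’s actual data model; the names and example tasks are our own.

    # Illustrative only: a workflow modeled as an editable tree of
    # subtasks, supporting the requester edits described above.

    class Subtask:
        def __init__(self, description, solution=None):
            self.description = description  # editable by the requester
            self.solution = solution        # filled in by crowd workers
            self.children = []              # ordered list of subtasks

        def add_child(self, description):
            """Create a new subtask beneath this one."""
            child = Subtask(description)
            self.children.append(child)
            return child

        def remove_child(self, child):
            """Delete an unwanted subtask (and its descendants)."""
            self.children.remove(child)

    # Seeding the system with an initial workflow for the crowd to refine:
    root = Subtask("Plan a three-day vacation in Los Angeles")
    research = root.add_child("List attractions worth visiting")
    root.add_child("Draft a day-by-day itinerary")

    # A requester edit mid-run: sharpen a vague subtask description.
    research.description = "List ten attractions, with hours and prices"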

To explore how effectively crowds can be used to support the execution of complex work, we performed two evaluations. To provide a baseline for comparison, we examined how crowds performed in producing and solving workflows without the involvement of the requester – a “fire-and-forget” model of Turkomatic. Once this was established, we looked at how requester collaboration improved the crowd’s performance in task design. We examined a variety of complex tasks, including creating and populating a blog, planning a vacation, writing simple Java programs, web research, and essay writing.

Standard task templates in Turkomatic.

As expected, in most cases an unsupervised crowd produced unsuitable workflows or unsuccessful results. The most common failure mode was derailment, in which the PDS algorithm produced unnecessarily complex decompositions and failed to terminate. Derailment occurred when workers in different parts of a workflow misunderstood the original intent of a task or failed to grasp how their step related to the others. For instance, the itinerary planning task produced an itinerary that visited the same location several times. However, even the unsupervised crowd produced high-quality results for tasks that could be answered without substantial decomposition, successfully solving tasks to write Java code and to compose an essay.

Results from Turkomatic with no requester involvement.

By comparison, the results of collaboration between requesters and the crowd were substantially better. When requesters used Turkomatic’s workflow editing tools to monitor and guide the crowd’s efforts, the crowd successfully completed tasks in every category we tested, and the workflows it produced were usable. This, too, is unsurprising: intervention enabled requesters to provide feedback on their initial intentions and to iterate on unsuccessful tasks, and it prevented crowds from operating in a vacuum.

Successfully solved tasks submitted by Turkomatic requesters. At top, restaurant recommendations. At bottom, blog comments.

The price-divide-solve approach represents an effort to produce a generic algorithm for crowdsourcing arbitrary work. This strategy has value in quickly evaluating the ability of crowds to solve particular kinds of work, and it can reduce the complexity of accessing crowd platforms for casual use. However, this one-size-fits-all strategy trades upfront design effort for runtime supervision: workflows can be generated without exhaustive planning, but they require requester monitoring at runtime to guarantee the quality of results. In future work, we plan to investigate to what extent this supervisory function can itself be assigned to crowd workers. In any case, effective workflow design is among the most common problems facing crowdsourcing researchers today. Why not collaborate with workers in solving it?
For more, see our full paper, Collaboratively Crowdsourcing Workflows with Turkomatic.

3 thoughts on “Collaboratively Crowdsourcing Workflows with Turkomatic”

  1. I appreciate the big thinking, as well as the honest reporting of what worked well and what didn’t. This is great work.

    You described it as “an effort to produce a generic algorithm for crowdsourcing arbitrary work.” My lab-mate (Chang Hu) read this and asked what I thought of that claim. I’d say it’s a fair claim to make. Nevertheless, there are some tasks for which I was thinking those HIT templates might be inconvenient (e.g., gather zip codes of Fortune 500 headquarters) or inappropriate (e.g., plant 10 acres of corn). Also, since Turkomatic is primarily suited to monolithic tasks that can be broken down into subtasks, I imagine it wouldn’t be the most efficient way to do a large quantity of small, mostly irreducible tasks (e.g., labeling 100,000 images). Is that a fair characterization?

  2. Thanks for the kind words!

    Early in the process we did solve large sets of small, irreducible tasks like the one you suggested. There’s a natural divide-and-conquer decomposition here, though workers only sometimes found it on their own — one substep would be to label the first 1000 images, the next would be to label the next 1000 images, and so on (see the sketch at the end of this comment). In our work-in-progress paper at CHI, we discussed how we used Turkomatic in this way to solve a high school SAT.

    However, it’s harder than you’d expect to get workers to glom onto the obvious parallel decomposition – unlike the requesters, they weren’t thinking about minimizing the overall runtime! I actually think that very large, monolithic tasks are trickier to decompose, since workers are even less likely to come up with correct decompositions.

    You’re absolutely right that the templates could prove awkward in practice. Even if the crowd executed Price-Divide-Solve perfectly, we still wouldn’t end up with the best possible crowd application for a task, because the HITs themselves wouldn’t be optimized for the specific tasks.

    To this end, the way that Turkomatic might be used in practice is as a first pass in building a large-scale crowd application, to create and validate a usable workflow with the crowd. Then, once the necessary steps were in place, a developer could invest time in creating dedicated HITs optimized to solve each individual step. In this sense, Turkomatic functions a bit like a prototyping tool for the crowd.
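
    A concrete sketch of the image-labeling decomposition mentioned above (our hypothetical illustration in Python, not something workers actually wrote):

        # Hypothetical sketch: the divide-and-conquer decomposition of a
        # large labeling batch into fixed-size substeps.
        def chunked_subtasks(n_images, chunk=1000):
            return [
                f"Label images {start + 1} through {min(start + chunk, n_images)}"
                for start in range(0, n_images, chunk)
            ]

        subtasks = chunked_subtasks(100000)
        print(subtasks[0])   # Label images 1 through 1000
        print(subtasks[-1])  # Label images 99001 through 100000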

  3. I don’t doubt that having workers break up a large set of small, irreducible tasks would be possible. It just seems like a waste of their effort when you could have given the set to the system as multiple separate tasks. However, if the goal is to reduce any task to a single, consistent requester interface, then I can see how it would make sense.

    … In this sense, Turkomatic functions a bit like a prototyping tool for the crowd.

    That characterization rings especially true. There is a need for that. I can see how someone might build upon the plan created by the workers for future, similar problems. At any rate, I think anything that makes online labor a more viable option for those with work to do will be good for workers in the long run.
