PlateMate: Crowdsourcing Nutrition Analysis from Food Photographs

What if we could put an automated nutritionist in everyone’s pocket? Before each meal, you’d snap a quick picture of your plate and then dig in. From there, your automated nutritionist would identify the foods, measure the portions, and add up the calories. It might even go further, providing tips like “eat more vegetables at lunch” or “lay off those chocolate chip cookies you have every Tuesday to save 150 calories.” If everyone had access to that much data and advice, we could all eat healthier and worry less about obesity, heart disease, and other serious consequences of careless eating.

The problem with this plan is that nutritionists are too scarce (and too big!) to fit in everyone’s pocket, and computers don’t know what pizza looks like. So in the absence of automated nutritionists, most people don’t track what they eat at all. And those who do often make big mistakes, chronically underestimating how much they eat and what it means for their bodies. In the absence of cheap experts and powerful computer vision, how can we bring accurate food logging to the masses?

To solve this problem, we developed PlateMate, which uses the crowd on Mechanical Turk to identify and measure food from user-submitted plate photographs. In PlateMate, a crowd of amateurs is organized into a structured workflow. Each photograph is processed using up to six different HITs, wherein workers progressively find foods in a photo, identify their component parts, match them with foods in a nutrition database, and estimate portions. This workflow reduces the complex task of measuring food intake—which most amateurs do with high error and bias—into a series of simple, verifiable steps that untrained Turkers can perform efficiently and accurately.

To make workers accurate, it’s natural to focus on the design of a crowdsourcing system. Tasks need to have clear instructions, and HITs need to be “wired” together to produce increasingly sophisticated outputs. However, in developing PlateMate, we found that the system’s implementation was just as important. Here, we want to share some of the key software engineering principles we considered when building our system. Not all of them will be applicable to other projects, but we hope they will inspire others trying to make crowds work more like experts.

  • Decomposition: high-level goals should be broken into low-level operations. PlateMate does this on two levels: analysis is divided into Tag, Identify, and Measure steps, and each of these is further decomposed into a sequence of 1-3 HITs that workers complete.
  • Modularity: distinct stages should be independent of each other. Requesters should be able to radically change how the “Tag” step works (by, for example, adding computer vision or introducing new HITs) without any change to “Identify.” Interaction between parts need not be strictly sequential: a workflow should be able to skip or repeat steps and follow different branches of processing.
  • Intuitiveness: specifying a workflow should be easy and natural, based on an abstraction that is familiar and reusable. Requesters should feel like they are providing instructions to a human, not programming a computer. Approving, rejecting, and aggregating work should be simple so that requesters will do it often.
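To make the first two principles concrete, here is a minimal Python sketch. It is not PlateMate's code: the stage functions, the `run` helper, and all of the data are hypothetical stand-ins, and each stage simply returns canned values where a real system would post HITs. The one idea it illustrates is that each stage returns the name of the stage to run next, so steps can be skipped, repeated, or branched, and one stage can be replaced without editing the others.

```python
from typing import Callable, Dict, Optional, Tuple

# A stage takes the job state and returns (new_state, name_of_next_stage).
# Returning None as the next stage ends the workflow.
Stage = Callable[[dict], Tuple[dict, Optional[str]]]

def tag(job: dict) -> Tuple[dict, Optional[str]]:
    # Hypothetical stand-in for the Tag HITs: record outlined food regions.
    return {**job, "regions": ["region A", "region B"]}, "identify"

def identify(job: dict) -> Tuple[dict, Optional[str]]:
    # Hypothetical stand-in for the Identify HITs: one label per region.
    foods = [f"food in {region}" for region in job["regions"]]
    return {**job, "foods": foods}, "measure"

def measure(job: dict) -> Tuple[dict, Optional[str]]:
    # Hypothetical stand-in for the Measure HITs: one portion per food.
    portions = {food: "1 serving" for food in job["foods"]}
    return {**job, "portions": portions}, None

def run(stages: Dict[str, Stage], start: str, job: dict) -> dict:
    # Follow whatever route the stages choose until one returns None.
    name: Optional[str] = start
    while name is not None:
        job, name = stages[name](job)
    return job

stages = {"tag": tag, "identify": identify, "measure": measure}
result = run(stages, "tag", {"photo": "plate.jpg"})

# Modularity: swap in a different Tag step without touching Identify.
def tag_single_region(job: dict) -> Tuple[dict, Optional[str]]:
    return {**job, "regions": ["whole plate"]}, "identify"

alt = run({**stages, "tag": tag_single_region}, "tag", {"photo": "plate.jpg"})
```

Because routing lives in the return value rather than in a fixed list, a stage could just as easily return its own name to repeat itself or jump straight to "measure".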

Our goal was to develop an application framework based on these principles. The result was our Management framework, which lets programmers solve problems with crowds by creating a hierarchy of computer “managers” modeled after human organizations. In the real world, expert-level work (like building a table) can be reproduced by teams of less skilled workers who each work on some small part of the process. Those workers are supervised by managers who are not skilled craftsmen themselves, but who know how to assign tasks, route items among workers, and verify quality.

Managers are software agents that run in parallel, each creating and verifying HITs as needed based on incoming inputs. Managers communicate with their supervisors and their employees using asynchronous message passing: managers assign tasks by placing them in inboxes of lower level managers and communicate with their superiors by placing results of completed tasks in their own outboxes. This hierarchical message-passing approach allows programmers to implement workflows by decomposing problems into progressively smaller steps.
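The inbox and outbox pattern can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the Management framework's API: the `Manager`, `Leaf`, and `Tag` classes, the `step` method, and the behavior of the Draw and Vote stubs are all hypothetical. Real managers run in parallel and create HITs; here they are stepped in a simple loop and call plain functions so the example stays deterministic.

```python
from collections import deque

class Manager:
    """Hypothetical manager: tasks arrive in an inbox, results leave via an outbox."""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.outbox = deque()

    def assign(self, employee, task):
        # Assign work by placing it in a lower-level manager's inbox.
        employee.inbox.append(task)

    def step(self):
        raise NotImplementedError

class Leaf(Manager):
    """Stand-in for a manager that would post a HIT; here it just calls a function."""
    def __init__(self, name, work):
        super().__init__(name)
        self.work = work

    def step(self):
        while self.inbox:
            self.outbox.append(self.work(self.inbox.popleft()))

class Tag(Manager):
    """Routes work between two employees, Draw and Vote."""
    def __init__(self, draw, vote):
        super().__init__("tag")
        self.draw, self.vote = draw, vote

    def step(self):
        while self.inbox:            # new photos go to Draw
            self.assign(self.draw, self.inbox.popleft())
        while self.draw.outbox:      # Draw's results go to Vote
            self.assign(self.vote, self.draw.outbox.popleft())
        while self.vote.outbox:      # Vote's results are reported upward
            self.outbox.append(self.vote.outbox.popleft())

# Assumed behavior: Draw proposes candidate box sets, Vote picks one.
# Picking the longest candidate is a placeholder for real worker votes.
draw = Leaf("draw", lambda photo: {"photo": photo,
                                   "candidates": [["box1"], ["box1", "box2"]]})
vote = Leaf("vote", lambda d: {"photo": d["photo"],
                               "boxes": max(d["candidates"], key=len)})
tag = Tag(draw, vote)

tag.inbox.append("plate.jpg")        # a supervisor assigns a task to Tag
for _ in range(3):                   # three rounds are enough for one photo
    for manager in (tag, draw, vote):
        manager.step()
```

After the loop, the finished result sits in `tag.outbox`, where a higher-level manager would pick it up. No manager calls another directly; each one only moves messages between queues, which is what lets the stages run independently.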

Here’s an example of the Tag manager, which routes work between its two employees, Draw and Vote (see hierarchy above).

Much more information is available in our UIST 2011 paper, and we welcome questions and comments!