Attacking Crowd-Powered Systems with Crowds


Do you trust your crowd?

Crowd-powered systems are being used for increasingly complex tasks, but little is known about the potential risks that originate from the workers themselves.

Types of Threats

Threats come in all shapes and sizes. A single worker can effectively extract information from a single task, but may have a hard time targeting vulnerable tasks in systems that only periodically include sensitive information. Individual workers are also usually ineffective at disrupting the final output of systems that combine input from multiple workers. Groups, however, can attack these systems, and are more successful at extracting even sparse pieces of sensitive information or reconstructing content that was divided up to help protect privacy.

Raising an Army

But can attackers realistically be expected to gather large groups of people to attack these systems? Could they use the crowd itself to boost their numbers? Would crowd workers help a requester do something like steal a user’s credit card for an extra $0.05?

To find out, we ran two sets of experiments using workers from Mechanical Turk. For both, we pretended to be two different requesters: one [potentially] malicious requester (who posted an “Attack” task), and one requester being attacked (who posted a “Target” task). Workers started at the Attack task, were shown a set of instructions, and were then asked to continue on to the Target task.

Information Extraction

One way the crowd can attack a system is by collecting private information from a task. This is of particular concern because systems that leverage the crowd for tasks such as captioning a user’s conversation [2] or answering visual questions [1,3] for assistive technology can expose personal information (e.g., a credit card number accidentally captured in an image). To simulate this, our Target task asked workers to copy all of the text out of an image they were shown (Fig. 1).

Fig. 1. The Attack task asks workers to go to a Target task, extract information, and return with it to the Attack task.

As a baseline, the Attack task asked workers to complete the Target task without returning any information. We then showed workers an image of a drawn credit card that clearly contained no real personal information (the “Innocent” condition), and contrasted the response rates we saw with those for the case where the Target task contained an image of a real-looking credit card (the “Malicious” condition). Although both images contained the same amount of information, we saw a significant drop in response rate when the task looked potentially harmful (Fig. 2).

Fig. 2. Results for the Information Extraction tests: Baseline 73.8%, Innocent 62.1%, Malicious 32.8%.
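For readers who want to sanity-check figures like these, here is a minimal sketch of a two-proportion z-test comparing the Innocent and Malicious response rates. The per-condition sample size of 60 workers is purely a placeholder assumption, since the post does not report it; see the full paper for the actual analysis.

```python
# Minimal sketch: two-sided two-proportion z-test on the reported response rates.
# The sample size n is an assumption for illustration only.
from math import sqrt, erfc

def two_proportion_z(p1, p2, n1, n2):
    """z statistic and two-sided p-value for the difference between two rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability of a standard normal
    return z, p_value

n = 60  # hypothetical number of workers per condition
z, p = two_proportion_z(0.621, 0.328, n, n)  # Innocent vs. Malicious
print(f"z = {z:.2f}, p = {p:.4f}")           # comfortably below p = 0.05
```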

Information Manipulation

Another way workers can attack a system is by manipulating the answer that is returned to the user. We again recruited workers to attack a system, but this time the Attack task gave workers a specific answer to enter into the Target task (Fig. 3). Our Target task asked workers to transcribe hand-written text they saw in an image.

Fig. 3. The Attack task asks workers to go to a Target task and enter specific information.

As a baseline, we asked workers to complete the Target task with no special instructions. We then asked workers to provide a specific but plausible answer for the image (the “Innocent” case), and compared the answers we received with those we got when workers were asked to give a clearly wrong answer (the “Malicious” case). We again saw a significant drop in the number of workers who were willing to complete the Attack task as instructed (Fig. 4).

Fig. 4. Results for the Information Manipulation tests: Baseline 73.8%, Innocent 75.0%, Malicious 27.9%.

Future Work

Now the question is, how do we avoid these attacks? Future approaches can leverage the fact that hired groups of workers appear to contain some people who recognize when a task involves potentially harmful information, and use them to protect against the workers who do not notice the risk or who will complete the task regardless (an alarming ~30% of workers in our experiments).

References

[1] J.P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R.C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, T. Yeh. VizWiz: Nearly Real-time Answers to Visual Questions. UIST 2010.
[2] W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.
[3] W.S. Lasecki, P. Thiha, Y. Zhong, E. Brady, J.P. Bigham. Answering Visual Questions with Conversational Crowd Assistants. ASSETS 2013.

 

Full paper: Information Extraction and Manipulation Threats in Crowd-Powered Systems.
Walter S. Lasecki, University of Rochester / Carnegie Mellon University
Jaime Teevan, Microsoft Research
Ece Kamar, Microsoft Research

alt.projects and emergent needs in mature open collaborations

The ongoing story of how the world’s largest encyclopedia gets written comprises several distinct historical eras: an initial linear growth phase, followed by an era of rapid exponential growth, and, over the past seven years, a maturation phase characterized by slower growth in article creation and a gradual decline in regular participation among the core community of Wikipedia editors.

Crowd researchers have learned a lot about collaboration from studying Wikipedia during the “peak editing” era. Peak editing (named after peak oil) roughly comprises the years 2006–2008, when Wikipedia’s increasing popularity created a huge demand for new content and there was plenty of encyclopedia work to go around.

Now that Wikipedia is a mature collaboration, does it still have anything new to teach us?

One key to Wikipedia’s success during this period was WikiProjects: collaborative workspaces (and the teams of editors who inhabit them) focused on coordinating particular kinds of work. Traditionally, the work of WikiProjects has involved editing articles within a particular topic, like Feminism or Military History.

Graph showing the number of editors participating in WikiProjects over time.

Conventional Wikipedia WikiProjects focus on encyclopedia topics ranging from Medicine to Public Art.


AskSheet: Efficient Human Computation for Decision Making with Spreadsheets

For some decisions, we know what we want; the real “work” is in digging through the wealth of available information to find an option that meets our criteria. The process can be time-consuming, especially when there are many alternatives to choose from and the details are spread across different locations.

One of the recurring challenges of adapting any complex job to a microtask platform is that crowd workers can’t see the big picture. They don’t know your situation. Furthermore, knowledge gained in one task doesn’t necessarily help a worker doing the next task. For decision making, this makes it difficult to pare down the options based on just a few of the most influential criteria.

AskSheet is a system for coordinating workers on Mechanical Turk to gather the inputs to data-driven decisions. The user (someone in charge of a decision) creates a skeleton spreadsheet model, including spreadsheet formulas that would compute the decision result if all of the inputs were already known. Cells in need of input are marked by entering a special =ASK(…) formula, the parameters to which specify the type and usually the range of information requested, as well as cues that help AskSheet group related inputs into HITs that will be efficient for workers.

This decision model finds any pediatrician who (1) has good ratings on two rating sites, (2) is within 15 minutes’ drive, and (3) accepts my insurance. Once the “root” cell (F53) can be evaluated, we know that one doctor must fit, so AskSheet stops posting HITs.
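To make that stopping rule concrete, here is a minimal sketch, in Python rather than spreadsheet formulas, of the idea that drives AskSheet: stop requesting inputs as soon as the decision can no longer change. It is only an illustration of the concept under simplified assumptions (a handful of inputs, brute-force checking of the remaining possibilities, first-come input ordering); it is not AskSheet’s actual value-of-information machinery, and the doctor names, rating scale, and simulated crowd answers are made up.

```python
# Sketch of "stop posting HITs once the root formula is forced to a single value".
from itertools import product

def decided(known, unknowns, choose):
    """Return the decision if every possible completion of the unknown inputs
    yields the same answer; otherwise return None."""
    keys = list(unknowns)
    outcomes = set()
    for combo in product(*(unknowns[k] for k in keys)):
        outcomes.add(choose({**known, **dict(zip(keys, combo))}))
        if len(outcomes) > 1:
            return None
    return outcomes.pop()

def gather(unknowns, choose, ask):
    """Acquire one input at a time (e.g. by posting a HIT) until the decision is forced."""
    known, unknowns = {}, dict(unknowns)
    while unknowns:
        result = decided(known, unknowns, choose)
        if result is not None:
            return result               # root cell is evaluable: stop posting HITs
        key = next(iter(unknowns))      # AskSheet instead prioritizes the most valuable input
        known[key] = ask(key)
        del unknowns[key]
    return choose(known)

# Toy decision: the first pediatrician rated >= 4 who accepts my insurance.
unknowns = {("Dr. A", "rating"): range(6), ("Dr. A", "in_network"): (True, False),
            ("Dr. B", "rating"): range(6), ("Dr. B", "in_network"): (True, False)}

def choose(state):
    for doc in ("Dr. A", "Dr. B"):
        if state[(doc, "rating")] >= 4 and state[(doc, "in_network")]:
            return doc
    return "no match"

simulated_crowd = {("Dr. A", "rating"): 5, ("Dr. A", "in_network"): True,
                   ("Dr. B", "rating"): 2, ("Dr. B", "in_network"): False}
print(gather(unknowns, choose, ask=simulated_crowd.get))  # "Dr. A", without asking about Dr. B
```

Here only four inputs are checked by brute force; the real system’s contribution is ordering and grouping the input requests so that fewer paid HITs are needed.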



VidWiki: Can we create and modify videos like a wiki?

Iterative improvement of annotations during our user study

Anyone who has authored or tried to edit a video knows how complicated the process can be. Re-recording portions of the video or audio, splitting the video at relevant points, and going back to correct even the smallest change are all headaches along the way to creating good, lasting content. Although many internet videos can be one-off recordings, educational videos are usually meant to be more polished and to be reused many times.

While text-based information platforms like Wikipedia have benefited enormously from crowdsourced contributions, the various limitations of video hinder the collaborative editing and improvement of educational videos. Given the increasing prominence of video as a way to communicate online, especially in education, we present VidWiki, an online platform that enables students to iteratively improve the presentation quality and content of videos. Through the platform, users can improve the legibility of handwriting, correct errors, or translate text in videos by overlaying typeset content such as text, shapes, equations, or images.

A screenshot of a video with the handwriting annotated first in English, and then translated to Hindi
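For the curious, here is one way such an overlay layer might be represented. This is a hypothetical sketch of the concept, not VidWiki’s actual data model; every field name below is an assumption.

```python
# Illustrative data structure for wiki-style video overlays (not VidWiki's real schema).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    """A typeset overlay (text, shape, equation, or image) pinned to a
    region of the video for a span of playback time."""
    start_s: float          # when the overlay appears
    end_s: float            # when it disappears
    x: float                # overlay position, as fractions of frame width/height
    y: float
    kind: str               # "text" | "shape" | "equation" | "image"
    content: str            # e.g. the text, LaTeX source, or image URL
    revision: int = 1       # bumped each time a viewer edits the overlay

@dataclass
class AnnotatedVideo:
    video_url: str
    annotations: List[Annotation] = field(default_factory=list)

    def visible_at(self, t: float) -> List[Annotation]:
        """Overlays the player should render at playback time t."""
        return [a for a in self.annotations if a.start_s <= t < a.end_s]

# Example: retype an instructor's handwritten label, then fix it in a later revision.
video = AnnotatedVideo("https://example.org/lecture.mp4")
video.annotations.append(
    Annotation(start_s=12.0, end_s=30.0, x=0.1, y=0.2,
               kind="text", content="Pythagoream theorem"))
video.annotations[0].content = "Pythagorean theorem"   # a later viewer corrects the typo
video.annotations[0].revision += 1
print(video.visible_at(15.0))
```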

VidWiki represents a first step toward investigating all of the complexities of crowd-contributed video editing. We conducted a small user study in which 13 novice users annotated and revised Khan Academy videos. Our results suggest that with only a small investment of time on the part of viewers, it may be possible to make meaningful improvements in online educational videos.

To check out the tool yourself, visit VidWiki to see some sample annotated videos or try editing one. For those going to CSCW next week, come check out our talk on Wednesday in the MOOC session at 11:10am!

For more, see our full paper, VidWiki: Enabling the Crowd to Improve the Legibility of Online Educational Videos.

Andrew Cross, Microsoft Research India
Mydhili Bayyapunedi, Microsoft Research India
Dilip Ravindran, Microsoft Research India
Ed Cutrell, Microsoft Research India
Bill Thies, Microsoft Research India

Leaderboards are not only used competitively

Point scoring and leaderboards are two of the many techniques used to encourage engagement in crowdsourcing activities. But do they have a motivational effect? How do people actually relate to them?

We studied the behavior of volunteers collecting data for an environmental organization, Close The Doors. Using a mobile app while going about their everyday lives over a two-week period, they registered whether shops left their doors open or kept them closed during winter. We compared the performance and attitudes of volunteers who scored points displayed on a leaderboard with those who used a control version of the mobile app, which still collected data but gave no performance feedback.

We found that:

  • The top scorers in the points group substantially outperformed the top scorers in the control group.
  • But the lower scorers in the points group performed less well than the lower scorers in the control group.
  • Unless additional payment was used alongside points, there was no statistically significant difference in the data collection performance between those awarded points and the control group.

We conducted interviews with top, medium and low scorers in each group to understand what was happening.

  • The top scorers were motivated by the leaderboard, competing with those close to them and spurring each other on. This resulted in increased performance, so they performed better than the top scorers in the control group.
  • Low scorers were demotivated by the leaderboard, feeling they couldn’t catch up, and so gave up as the experiment progressed.

Our CSCW 2014 paper focuses on the attitudes of those in the middle. Three of the four mid-scoring interviewees (unlike all but one of the top- and low-scoring interviewees) did not express competitive attitudes toward the leaderboard. Rather, they viewed it as a means of understanding what other volunteers were doing, with the aim of making a typical contribution.

  • They were positively motivated to make a contribution on a par with others. One explicitly said they wanted to be in the middle of the leaderboard.
  • However, the score required to be in the middle is determined by the performances of those below, not by those above.
  • So despite the positive motivation, the actual contribution of those in the middle was lower than that of their counterparts in the control group.

So some people are motivated or demotivated by competition, while others are motivated more by playing their part. Crowdsourcing systems could support the latter motivation by using normification in addition to gamification: providing information about the behaviour of others in a way which encourages non-competitive comparison. Perhaps crowdsourcing systems could use adaptive, personalised interfaces to tailor the motivational information they provide based on the psychology of the individual.
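To make the distinction concrete, here is a toy sketch of what leaderboard-style feedback versus normified feedback might look like as messages in a data-collection app. The wording and the median-based framing are our own illustration of the idea, not the study’s implementation.

```python
# Toy sketch: competitive (leaderboard) vs. norm-based (normified) feedback messages.
from statistics import median

def leaderboard_feedback(scores: dict, user: str) -> str:
    ranking = sorted(scores, key=scores.get, reverse=True)
    return f"You are ranked {ranking.index(user) + 1} of {len(ranking)}."

def normified_feedback(scores: dict, user: str) -> str:
    typical = median(scores.values())   # "what a typical volunteer does"
    mine = scores[user]
    if mine >= typical:
        return f"You've logged {mine} shops; a typical volunteer has logged {typical}."
    return (f"You've logged {mine} shops; a typical volunteer has logged "
            f"{typical}, so a few more visits would bring you level.")

scores = {"ana": 42, "ben": 17, "cat": 9, "dev": 21}   # made-up contribution counts
print(leaderboard_feedback(scores, "cat"))   # competitive framing
print(normified_feedback(scores, "cat"))     # non-competitive, norm-based framing
```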

For more, see our full paper, Competing or Aiming to be Average? Normification as a means of engaging digital volunteers.

Chris Preist – University of Bristol
Elaine Massung – University of Bristol
David Coyle – University of Bristol

Crowdfunding: A New Way to Involve the Community in Entrepreneurship

Consider the last thing you bought on Amazon. Do you remember the company that made the product? Did you speak with the designer? In our CSCW 2014 paper, Understanding the Role of Community in Crowdfunding, we present the first qualitative study of how crowdfunding provides a new way for entrepreneurs to involve the public in their design process.

An example crowdfunding project page.

We interviewed 47 crowdfunding entrepreneurs using Kickstarter, Indiegogo, and Rockethub to understand:

  • What is the work of crowdfunding?
  • What role does community play in crowdfunding work?
  • What current technologies support crowdfunding work, and how can they be improved?

Scholars studying entrepreneurship find that fewer than 30% of traditional entrepreneurs maintain direct or indirect ties with investors or customers. This stands in contrast to crowdfunding entrepreneurs, who report maintaining regular and direct contact with their financial supporters during and after their campaigns. This contact includes responding to questions, seeking feedback on prototypes, and posting weekly progress updates.

For example, one book designer described performing live video updates with his supporters on how he did page layout. Another product designer making a lightweight snowshoe had his supporters vote on what color to make the shoe straps.

Overall, we identified five types of crowdfunding work and the role of community in each:
Table summarizing the five types of crowdfunding work and the role of community in each.

Perhaps the most exciting type of crowdfunding work is reciprocating resources, where experienced crowdfunders not only donate funds to other projects but also give advice to novices. For instance, a crowdfunding entrepreneur who ran two successful campaigns created his own Pinterest board (see example below) where he posts tips and tricks on how to run a campaign. Another successful crowdfunder says he receives weekly emails from people asking for feedback on their project pages.

A Pinterest board where an experienced crowdfunder shares tips on running a campaign.

While there exist many tools for online collaboration and feedback, such as Amazon Mechanical Turk and oDesk, few crowdfunders use them or know of their existence. This suggests design opportunities to create more crowdfunder-friendly support tools to help them perform their work. We are currently designing tools to help crowdfunders seek feedback online from crowd workers and better understand and leverage their social networks for publicity.

For more information on the role of community in crowdfunding, you can download our full paper here.

Julie Hui, Northwestern University
Michael Greenberg, Northwestern University
Elizabeth Gerber, Northwestern University

 

 

Remote Shopping Advice: Crowdsourcing In-Store Purchase Decisions

Recent Pew reports, as well as our own survey, have found that consumers shopping in brick-and-mortar stores are increasingly using their mobile phones to contact others while they shop. The increasing capabilities of smartphones, combined with the emergence of powerful social platforms like social networking sites and crowd labor marketplaces, offer new opportunities for turning solitary in-store shopping into a rich social experience.

We conducted a study to explore the potential of friendsourcing and paid crowdsourcing to enhance in-store shopping. Participants selected and tried on three outfits at a Seattle-area Eddie Bauer store; we created a single, composite image showing the three potential purchases side-by-side. Participants then posted the image to Facebook, asking their friends for feedback on which outfit to purchase; we also posted the image to Amazon’s Mechanical Turk service, and asked up to 20 U.S.-based Turkers to identify their favorite outfit, provide comments explaining their choice, and provide basic demographic information (gender, age).

Study participants posted composite photos showing their three purchase possibilities; these photos were then posted to Facebook and Mechanical Turk to crowdsource the shopping decision.

Although none of our participants had used paid crowdsourcing before, and all were doubtful that it would be useful to them when we described what we planned to do at the start of the study session, the shopping feedback provided by paid crowd workers turned out to be surprisingly compelling to participants. It was more compelling than the friendsourced feedback from Facebook, in part because the crowd workers were more honest, explaining not only what looked good, but also what looked bad, and why! Participants also enjoyed the ability to see how opinions varied among different demographic groups (e.g., did male raters prefer a different outfit than female raters?).
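As a small illustration of that demographic breakdown, here is a sketch of how the up-to-20 Turker votes might be tallied overall and by rater gender. The vote tuples and field choices are hypothetical, not the study’s actual data format.

```python
# Sketch: tally crowd votes for the three outfits, overall and by rater gender.
from collections import Counter, defaultdict

votes = [  # (favorite outfit, rater gender, rater age) -- entirely made-up responses
    ("A", "female", 29), ("B", "male", 41), ("A", "female", 35),
    ("C", "male", 23), ("A", "male", 52), ("B", "female", 31),
]

overall = Counter(outfit for outfit, _, _ in votes)
by_gender = defaultdict(Counter)
for outfit, gender, _ in votes:
    by_gender[gender][outfit] += 1

print("Overall favorite:", overall.most_common(1)[0][0])
for gender, counts in by_gender.items():
    print(f"{gender} raters preferred outfit {counts.most_common(1)[0][0]}")
```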

Although Mechanical Turk had a speed advantage over Facebook, both sources generally provided multiple responses within a few minutes – fast enough that a shopper could get real-time decision-support information from the crowd while still in the store.

Our CSCW 2014 paper on “Remote Shopping Advice” describes our study in more detail, as well as how our findings can be applied toward designing next-generation social shopping experiences.

For more, see our full paper, Remote Shopping Advice: Enhancing In-Store Shopping with Social Technologies.

Meredith Ringel Morris, Microsoft Research
Kori Inkpen, Microsoft Research
Gina Venolia, Microsoft Research

Social Media Use by Mothers of Young Children

In our upcoming CSCW 2014 paper, we present the first formal study of how mothers of young children use social media, by analyzing surveys and social media feeds provided by several hundred mothers of infants and toddlers in the U.S.

Mothers overwhelmingly did not use Twitter for sharing information about their children, but nearly all of them used Facebook; for example, 96% reported having posted photos of their child on Facebook.

Our findings indicate several common trends in the way mothers use Facebook. Notably, the frequency of posting status updates falls by more than half after the birth of their child, and does not appear to rebound in the first few years of parenthood. However, the rate of photo-posting holds steady at pre-birth levels, meaning that photos comprise a relatively larger portion of posts than prior to the birth.

Contrary to popular belief (as exemplified by apps like unbaby.me that remove a perceived overabundance of baby photos from one’s Newsfeed), mothers do not appear to post exclusively about their offspring (about 15% of posts for first-time moms, and 11% for subsequent births). The first month post-birth contains the most baby-related postings, which then drop off.

After a baby is one month old, the percentage of a mother’s posts that mention the child drops off.

However, posts containing the child’s name receive far more likes and comments than other status updates; this likely gives them prominence in Facebook’s Newsfeed, reinforcing the perception that mothers’ status updates are overly child-centric.

For more details about how new mothers use social media, including special groups such as those diagnosed with postpartum depression and those whose children have developmental delays, and discussion of how these findings can be used to design social networks and apps that support new moms’ needs, you can download our full paper, Social Networking Site Use by Mothers of Young Children.
Meredith Ringel Morris, Microsoft Research


Being A Turker

‘Turking’, i.e. crowdsourced work done using Amazon Mechanical Turk (AMT), is attracting a lot of attention. In many ways it is a ‘black box’: Amazon is not transparent about how the marketplace functions, what rules govern it, and who the requesters and Turkers – the people who post and carry out the human intelligence tasks (HITs) – are.

Research has looked to prise open the black box, understand how it operates, and use it to get the best results. AMT is generally considered a great opportunity for getting micro-task work completed quickly and at very cheap rates. There are also concerns about AMT as a grey market; some requesters and Turkers are unscrupulous. The question for requesters has been how to design tasks and control the crowd to get genuine work done.

Research on the Turkers themselves has been rather scant, with notable exceptions where researchers have contacted Turkers, often through AMT itself, and used interviews, questionnaires, and HITs to let them express their thoughts and feelings. Who Turkers are and what they think is still unclear. What is myth and what is truth? We tried to better understand these invisible workers by joining their forum, Turker Nation, and looking in detail at what they discussed amongst themselves.

This is what we found:

  1. Members see Turking as work and are primarily motivated by earning.
  2. Earnings vary, but Turking is low-wage work: high earners on Turker Nation make ~$15-16k/yr.
  3. Workers aspire to earn at least $7-10/hr, but (newbies especially) do lower-paid HITs to increase their reputation and HIT count.
  4. Many Turkers choose AMT because they cannot find a good ‘regular’ job or need additional income. Some are housebound; others are in circumstances where Turking is one of the few options to earn.
  5. Turker Nation provides information and support on tools, techniques, tricks of the trade, earning, and learning. Members mostly share information about good and bad HITs and requesters.
  6. Relationships are key: Turkers like anonymity and flexibility but want decent working relationships with courteous communication. They want fair pay for fair work (decent wages, fairness in judging work, timely payment…), and respect works both ways: good requesters are prized.
  7. Members mostly behave ethically. Ripping requesters off is not endorsed and is seen as justified only against dubious requesters. Members feel a moral duty to their fellow members.
  8. Members feel that by sharing information and acting cooperatively they can have a stronger effect on regulating the market. Many are skeptical about government intervention.

For more, see our full paper, Being A Turker, which will appear at CSCW 2014.

David Martin, Xerox Research Centre Europe
Ben Hanrahan, Xerox Research Centre Europe
Jacki O’Neill, Microsoft Research India
Neha Gupta, Nottingham University


CrowdCamp Report: The Microtask Liberation Front: Worker-friendly Task Recommendation

As crowdsourced marketplaces like Amazon’s Mechanical Turk have grown, tool builders have focused the majority of their attention on requesters.  The research community has produced methods for improving result quality, weeding out low-quality work, and optimizing crowd-powered workflows, all geared toward helping requesters.  On the other hand, the community has done a decent job of studying crowd workers, but has not devoted much effort to building usable tools that improve the lives of workers.  At CrowdCamp, we worked on a browser plugin called MTLF that we hope will improve Turkers’ task-finding and work experiences.

A prototype of the MTLF browser plugin

After installing MTLF, a Turker logs into MTurk. Our prototype asks them to prioritize their preferences for income, task diversity, or fun. After completing a task, they are asked to provide a binary rating (hot/not) of that task, and are then asked whether they want a new task or more of the same task. Instead of having the Turker wade through the existing difficult-to-grok list of available tasks, MTLF automatically pops up a new task on the Turker’s screen. As Turkers change their priorities and grade tasks, MTLF’s recommendation algorithm leverages the joint work histories of many workers to identify tasks that match individual worker interests and preferences. The goal of our tool is to improve worker satisfaction and reduce worker search time and frustration.
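As one illustration of what “leveraging the joint work histories of many workers” could look like, here is a rough collaborative-filtering-style sketch that ranks task groups a worker has not yet tried, using other workers’ hot/not ratings. This is our own sketch, not the plugin’s actual algorithm, and all of the data in it is made up.

```python
# Sketch: score unseen task groups for a worker from other workers' hot (1) / not (0)
# ratings, weighting each rater by agreement with the worker on co-rated tasks.
from collections import defaultdict

ratings = {  # ratings[worker][task_group] = 1 (hot) or 0 (not); entirely fictional data
    "w1": {"image_tagging": 1, "surveys": 0, "transcription": 1},
    "w2": {"image_tagging": 1, "surveys": 0, "writing": 1},
    "w3": {"surveys": 1, "transcription": 0, "writing": 0},
}

def similarity(a, b):
    """Fraction of co-rated task groups on which two workers agree."""
    shared = set(a) & set(b)
    return sum(a[t] == b[t] for t in shared) / len(shared) if shared else 0.0

def recommend(worker, ratings, top_n=3):
    mine = ratings[worker]
    scores, weights = defaultdict(float), defaultdict(float)
    for other, theirs in ratings.items():
        if other == worker:
            continue
        sim = similarity(mine, theirs)
        for task, rating in theirs.items():
            if task not in mine:          # only rank tasks the worker hasn't tried yet
                scores[task] += sim * rating
                weights[task] += sim
    candidates = [t for t in scores if weights[t] > 0]
    candidates.sort(key=lambda t: scores[t] / weights[t], reverse=True)
    return candidates[:top_n]

print(recommend("w1", ratings))  # e.g. ['writing']
```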

We’re not the first to take on the challenge of improving the lives of workers.  Turkopticon is a wonderful tool for Turkers to share information on requesters.  Turkers themselves have identified a number of other tools to help them with their process.  None of these tools, however, optimize crowd workers’ preferences in quite the automated way that requester-oriented tools currently do.  As we build on our prototype, we hope to ingest information from sources like Turkopticon to inform our recommendation algorithms.

While our prototype has a working interface and backend to store user preferences, we’re working hard on more features for a usable first version.  Our next steps include exploring sources of data other than worker preferences, building an initial task recommender, and co-designing and iterating on our initial interface with the help of Turkers.  We’d love your help—our github repository has a list of open needs that you can help out with!

Jonathan Bragg, University of Washington
Lydia Chilton, University of Washington
Daniel Haas, UC Berkeley
Rafael Leano, University of Nebraska-Lincoln
Adam Marcus, Locu/GoDaddy
Jeff Rzeszotarski, Carnegie Mellon University
