Attacking Crowd-Powered Systems with Crowds

Do you trust your crowd?

Crowd-powered systems are being used for more and more complex tasks these days, but not much is known about the potential risks originating from workers themselves.

Types of Threats

Threats come in all shapes and sizes. A single worker can effectively extract information from a single task, but may have a hard time targeting vulnerable tasks in systems that only periodically include sensitive information. Individual workers are also usually ineffective at disrupting the final output of systems that combine input from multiple workers. Groups, however, can attack these systems, and can more successfully extract even sparse pieces of sensitive information or reconstruct content that was divided up to help protect privacy.

Raising an Army

But can attackers realistically be expected to gather large groups of people to attack these systems? Could they use the crowd itself to boost their numbers? Would crowd workers help a requester do something like steal a user’s credit card for an extra $0.05?

To find out, we ran two sets of experiments using workers from Mechanical Turk. For both, we pretended to be two different requesters: one [potentially] malicious requester (who posted an “Attack” task), and one requester being attacked (who posted a “Target” task). Workers started at the Attack task, were shown a set of instructions, and then asked to continue on to the Target task.

Information Extraction

One way the crowd can attack a system is by collecting private information from a task. This is of particular concern because systems that leverage the crowd, such as assistive technologies that caption a user’s conversation [2] or answer visual questions [1,3], can expose personal information (e.g., a credit card number accidentally captured in an image). To simulate this, our Target task asked workers to copy all of the text out of an image they were shown (Fig. 1).

Figure showing the Attack task leading to the Target task, and then returning to the Attack task.

In the Attack task, workers are asked to go to a Target task and return with information.

As a baseline, the Attack task asked workers to complete the Target task without returning any information. We then showed workers an image of a credit card drawing that clearly contained no real personal information (the “Innocent” condition), and contrasted the response rates with those from a condition in which the Target task contained an image of a real-looking credit card (the “Malicious” condition). Although both conditions contained the same amount of information, the response rate dropped significantly when the task looked potentially harmful (Fig. 2).

Baseline: 73.8%; Innocent: 62.1%; Malicious: 32.8%

Results for the Information Extraction tests.

Information Manipulation

Another way workers can attack a system is to manipulate the answer that is provided to the user. We again recruited workers to attack a system, but this time the Attack task provided workers with an answer to enter into the Target task (Fig. 3). Our Target task asked workers to transcribe hand-written text they saw in an image.

Figure showing the Attack task leading to the Target task.

In the Attack task, workers are asked to go to a Target task and enter specific information.

As a baseline, we asked workers to complete the Target task with no special instructions. We then asked workers to provide a specific but plausible answer given the image (the “Innocent” condition), and compared the answers we received with those from a condition in which workers were asked to give a clearly wrong answer (the “Malicious” condition). We again saw a significant drop in the number of workers willing to complete the Attack task as instructed (Fig. 4).

Baseline: 73.8%; Innocent: 75.0%; Malicious: 27.9%

Results for the Information Manipulation tests.

Future Work

Now the question is, how do we avoid these attacks? Future approaches can leverage the fact that hired groups of workers appear to contain some people who recognize when a task involves potentially harmful information, and use them to protect against the workers who do not notice the risk or who will complete the task regardless – an alarming ~30% of workers.

References

[1] J.P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R.C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh. VizWiz: Nearly Real-time Answers to Visual Questions. UIST 2010.
[2] W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, and J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.
[3] W.S. Lasecki, P. Thiha, Y. Zhong, E. Brady, and J.P. Bigham. Answering Visual Questions with Conversational Crowd Assistants. ASSETS 2013.

 

Full paper: Information Extraction and Manipulation Threats in Crowd-Powered Systems.
Walter S. Lasecki, University of Rochester / Carnegie Mellon University
Jaime Teevan, Microsoft Research
Ece Kamar, Microsoft Research

alt.projects and emergent needs in mature open collaborations

The ongoing story of how the world’s largest encyclopedia gets written comprises several distinct historical eras: an initial linear growth phase, followed by an era of rapid exponential growth, and, over the past seven years, a maturation phase characterized by slower growth in article creation and a gradual decline in regular participation among the core community of Wikipedia editors.

Crowd researchers have learned a lot about collaboration from studying Wikipedia during the “peak editing” era. Peak editing (named by analogy with peak oil) roughly comprises the years 2006 – 2008, when Wikipedia’s increasing popularity created a huge demand for new content and there was plenty of encyclopedia work to go around.

Now that Wikipedia is a mature collaboration, does it still have anything new to teach us?

One key to Wikipedia’s success during this period was WikiProjects: collaborative workspaces (and the teams of workers that inhabit them) focused on coordinating particular kinds of work. Traditionally, the work of WikiProjects has involved editing articles within a particular topic, like Feminism or Military History.

Graph showing the number of editors participating in WikiProjects over time.

Conventional Wikipedia WikiProjects focus on encyclopedia topics ranging from Medicine to Public Art.


Coordinating Donors on Crowdfunding Websites

Crowdfunding websites like Kickstarter, IndieGoGo, and Spot.Us are becoming increasingly common ways for people to raise money for projects. Kickstarter has helped people raise over $800 million already, and Donors Choose has raised more than $90 million for low-income school children. People raise money on crowdfunding sites by posting projects. Most projects require more than one donation to meet their goal, and some projects receive donations but never reach their goal. Donors need to coordinate to make their donations most effective: your money has more effect if you donate to a project you only partially care about but others are donating to than if you are the only contributor to a project you really care about.

Using an economics-style lab experiment, we simulated two crowdfunding websites. One uses an all-or-nothing approach we call the return rule: projects that don’t meet their goal return all donations to donors. The other uses direct donation: projects keep all donations they receive, whether or not they meet their goal. By comparing donor behavior on these two simulated websites, we can understand how this technical rule affects how donors coordinate.
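
To make the mechanics of the two rules concrete, here is a minimal sketch in Python; it is not the software used in our lab experiment, and the goal and pledge amounts are made-up illustrative values:

```python
# Minimal sketch (not our experimental code) of how the two payout rules
# treat the same set of pledges. Goals and pledge amounts are invented.

def funds_kept(goal, pledges, return_rule=True):
    """Amount a project keeps under each rule."""
    total = sum(pledges)
    if return_rule:
        # All-or-nothing: every donation is returned unless the goal is met.
        return total if total >= goal else 0
    # Direct donation: the project keeps everything it receives.
    return total

pledges = [40, 25, 15]  # hypothetical donations from three donors
print(funds_kept(100, pledges, return_rule=True))   # 0  -- goal missed, money returned
print(funds_kept(100, pledges, return_rule=False))  # 80 -- kept despite missing the goal
print(funds_kept(75, pledges, return_rule=True))    # 80 -- goal met, donations are used
```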

Crowdfunding Simulation Round Summary

We found that:

  • Donors contribute more money to more projects under the return rule.
  • Under the return rule, donors didn’t coordinate; they donated primarily based on their individual preferences.
  • Overall, both sites funded almost the same number of projects. Most of the extra donations went to projects that didn’t get funded.
  • Donors learned to coordinate on low-risk projects under the direct donation model.

Returning donations to incomplete projects is a mixed blessing: it makes donors contribute more money, but it reduces the coordination needed to fully fund projects and leads to fewer donations actually being used for projects.

This research also demonstrates that the risk associated with failure in an online peer-production project can serve as a useful coordination mechanism. If a project fails, it fails for everyone who has contributed or has an interest in the project, and eliminating this shared risk can remove the incentive for people to collaborate.

For more, see our full paper, Coordinating Donors on Crowdfunding Websites.

Rick Wash, Michigan State University
Jacob Solomon, Michigan State University

AskSheet: Efficient Human Computation for Decision Making with Spreadsheets

For some decisions, we know what we want; the real “work” is in digging through the wealth of available information to find an option that meets our criteria. The process can be time-consuming, especially if there are many alternatives to choose from, with the details spread among different locations.

One of the recurring challenges of adapting any complex job to a microtask platform is that crowd workers can’t see the big picture. They don’t know your situation. Furthermore, knowledge gained in one task doesn’t necessarily help a worker doing the next task. For decision making, this makes it difficult to pare down the options based on just a few of the most influential criteria.

AskSheet is a system for coordinating workers on Mechanical Turk to gather the inputs to data-driven decisions. The user (someone in charge of a decision) creates a skeleton spreadsheet model, including spreadsheet formulas that would compute the decision result if all of the inputs were already known. Cells in need of input are marked by entering a special =ASK(…) formula, the parameters to which specify the type and usually the range of information requested, as well as cues that help AskSheet group related inputs into HITs that will be efficient for workers.

This decision model finds any pediatrician who (1) has good ratings on two rating sites, (2) is within 15 minutes’ drive, and (3) accepts my insurance. Once the “root” cell (F53) can be evaluated, we know that one doctor must fit, so AskSheet stops posting HITs.
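
The sketch below illustrates the underlying idea in plain Python rather than AskSheet’s actual =ASK() syntax or prioritization algorithm: unknown inputs are requested only until the decision formula can be evaluated. The doctor list, the rating and drive-time thresholds, and the ask_crowd helper are all hypothetical stand-ins.

```python
# Minimal sketch of lazily gathering decision inputs from the crowd.
# Not AskSheet's implementation; data and thresholds are hypothetical.

doctors = {
    "Dr. A": {"rating1": None, "rating2": None, "drive_min": None, "in_network": None},
    "Dr. B": {"rating1": None, "rating2": None, "drive_min": None, "in_network": None},
}

def qualifies(info):
    """Decision formula: True/False once enough inputs are known, else None."""
    checks = [
        None if info["rating1"] is None else info["rating1"] >= 4,
        None if info["rating2"] is None else info["rating2"] >= 4,
        None if info["drive_min"] is None else info["drive_min"] <= 15,
        info["in_network"],
    ]
    if False in checks:   # any failed criterion settles the answer early
        return False
    if None in checks:    # still missing an input
        return None
    return True

def ask_crowd(doctor, field):
    """Stand-in for posting a HIT; here it just returns a canned answer."""
    fake_answers = {"rating1": 5, "rating2": 4, "drive_min": 10, "in_network": True}
    return fake_answers[field]

# Request inputs one at a time, stopping as soon as some doctor qualifies.
for name, info in doctors.items():
    for field in info:
        if qualifies(info) is not None:
            break  # this doctor's outcome is already decided
        info[field] = ask_crowd(name, field)
    if qualifies(info):
        print(f"{name} meets all criteria; stop posting HITs.")
        break
```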


VidWiki: Can we create and modify videos like a wiki?

Iterative improvement of annotations during our user study

Anyone who has authored or tried to edit a video knows how complicated the process can be. Re-recording portions of the video or audio, splitting the video at relevant points, and going back to correct even the smallest change are all headaches along the way to creating good, lasting content. Although many internet videos can be one-off recordings, videos for educational content are usually meant to be more polished and to be reused many times.

While text-based information platforms like Wikipedia have benefited enormously from crowdsourced contributions, the various limitations of video hinder the collaborative editing and improvement of educational videos. Given the increasing prominence of video as a way to communicate online, especially in education, we present VidWiki, an online platform that enables students to iteratively improve the presentation quality and content of videos. Through the platform, users can improve the legibility of handwriting, correct errors, or translate text in videos by overlaying typeset content such as text, shapes, equations, or images.

A screenshot of a video with the handwriting annotated first in English, and then translated to Hindi

VidWiki represents a first step toward investigating all of the complexities of crowd-contributed video editing. We conducted a small user study in which 13 novice users annotated and revised Khan Academy videos. Our results suggest that with only a small investment of time on the part of viewers, it may be possible to make meaningful improvements in online educational videos.

To check out the tool yourself, visit VidWiki to see some sample annotated videos or try editing one yourself. For those going to CSCW next week, come check out our talk on Wednesday in the MOOC session at 11:10am!

For more, see our full paper, VidWiki: Enabling the Crowd to Improve the Legibility of Online Educational Videos.

Andrew Cross, Microsoft Research India
Mydhili Bayyapunedi, Microsoft Research India
Dilip Ravindran, Microsoft Research India
Ed Cutrell, Microsoft Research India
Bill Thies, Microsoft Research India

Leaderboards are not only used competitively

Point scoring and leaderboards are among the many techniques used to encourage engagement in crowdsourcing activities. But do they have a motivational effect? How do people actually relate to them?

We studied the behavior of volunteers collecting data for an environmental organization, Close The Doors. Using a mobile app while going about their everyday lives over a two-week period, they registered whether shops left their doors open or kept them closed during winter. We compared the performance and attitudes of volunteers who scored points displayed on a leaderboard with those who used a control version of the mobile app – still collecting data, but with no performance feedback.

We found that:

  • The top scorers in the points group substantially outperformed the top scorers in the control group.
  • But the lower scorers in the points group performed less well than the lower scorers in the control group.
  • Unless additional payment was used alongside points, there was no statistically significant difference in the data collection performance between those awarded points and the control group.

We conducted interviews with top, medium and low scorers in each group to understand what was happening.

  • The top scorers were motivated by the leaderboard, competing with those close to them and spurring each other on. This resulted in increased performance, so they performed better than the top scorers in the control group.
  • Low scorers were demotivated by the leaderboard, feeling they couldn’t catch up, and so gave up as the experiment progressed.

Our CSCW 2014 paper focuses on the attitudes of those in the middle. Three of the four mid-scoring interviewees (unlike all but one of the top- and low-scoring interviewees) did not express competitive attitudes toward the leaderboard. Rather, they viewed it as a means of understanding what other volunteers were doing, with the aim of making a typical contribution.

  • They were positively motivated to make a contribution on a par with others. One explicitly said they wanted to be in the middle of the leaderboard.
  • However, the score required to be in the middle is determined by the performances of those below, not by those above.
  • So despite the positive motivation, the actual contribution of those in the middle was lower than those in the control group.

So some people are motivated or demotivated by competition, while others are motivated more by playing their part. Crowdsourcing systems could support the latter motivation by using normification in addition to gamification: providing information about the behaviour of others in a way that encourages non-competitive comparison. Perhaps crowdsourcing systems could use adaptive, personalised interfaces to tailor the motivational information they provide based on the psychology of the individual.

For more, see our full paper, Competing or Aiming to be Average? Normification as a means of engaging digital volunteers.

Chris Preist – University of Bristol
Elaine Massung – University of Bristol
David Coyle – University of Bristol

For On-Demand Workers, It’s All About the Story

From mystery shopping to furniture assembly, apps such as TaskRabbit and Gigwalk leverage the power of distributed, mobile workers who complete physical world tasks instantly and beyond the constraints of traditional office workspaces. We refer to these workers as the “on-demand mobile workforce.” Mobile workforce services allow task requesters to “crowdsource” tasks in the physical world and aim to disrupt the very nature of employment and work (for good and bad; this may be a matter for another post).

Our paper describes an on-demand workforce service categorization based on two dimensions: (1) task location and (2) task complexity (see figure below). Based on marketplace reviews, user testimonies, and informal observations of the services, we placed four main workforce services into the quadrants to exemplify the categorization.

Categorization of on-demand workforce services.

Although a long line of research exists on incentives and motivations for crowdsourcing, especially on platforms like Amazon’s Mechanical Turk, there hasn’t been much work on physical crowdsourcing, despite the recent appearance of many such platforms. We conducted interviews with mobile workforce members (see the paper here for the complete methods and findings) to learn more about the extrinsic and intrinsic factors that influence the selection and completion of physical world tasks.

To mention a couple of findings, certain task characteristics were highly important to workers as they selected and accepted tasks:

Knowing the person
Because physical world tasks introduce a different set of personal risks compared to virtual world tasks (e.g., physical harm, deception), workers creatively investigated requesters and scrutinized profile photos, email addresses, and task descriptions. Profile photos helped workers know whom to expect on-site, and email addresses were used to cross-reference information on social networking sites.

Knowing the “story”
Tasks that listed intended purposes or background stories of the tasks appealed to the mobile workforce. Tasks for an anniversary surprise or to verify the conditions of a grave plot through a photo affected workers’ opinions and influenced future task selections. Workers also appreciated non-financial incentives of unique experiences that occurred as byproducts of task completion (e.g., meeting new people). Tasks with questionable, unethical intentions (e.g., mailing in old phones, posting fake reviews online, writing student papers) were less likely to be fulfilled.

Generally, this study has broader implications for the design of effective, practical, novel, and well-reasoned social and technical crowdsourcing applications that organize help and support in the physical world. In particular, we hope our findings inform the future development of mobile workforce services whose incentives are not strictly monetary.

Want to learn more? Check out our full paper here at CSCW 2014.

Rannie Teodoro
Pinar Ozturk
Mor Naaman
Winter Mason
Janne Lindqvist

Crowdfunding: A New Way to Involve the Community in Entrepreneurship

Consider the last thing you bought on Amazon. Do you remember the company that made the product? Did you speak with the designer? In our CSCW 2014 paper, Understanding the Role of Community in Crowdfunding, we present the first qualitative study of how crowdfunding provides a new way for entrepreneurs to involve the public in their design process.

An example crowdfunding project page.

We interviewed 47 crowdfunding entrepreneurs using Kickstarter, Indiegogo, and Rockethub to understand:

  • What is the work of crowdfunding?
  • What role does community play in crowdfunding work?
  • What current technologies support crowdfunding work, and how can they be improved?

Scholars studying entrepreneurship find that less than 30% of traditional entrepreneurs maintain direct or indirect ties with investors or customers. This stands in contrast to crowdfunding entrepreneurs who report maintaining regular and direct contact with their financial supporters during and after their campaign. This includes responding to questions, seeking feedback on prototypes, and posting weekly progress updates.

For example, one book designer described performing live video updates with his supporters on how he did page layout. Another product designer making a lightweight snowshoe had his supporters vote on what color to make the shoe straps.

Overall, we identified five types of crowdfunding work and the role of community in each:
Table summarizing the five types of crowdfunding work and the role of community in each.

Perhaps the most exciting type of crowdfunding work is reciprocating resources, where experienced crowdfunders not only donate funds to other projects, but also give advice to novices. For instance, a crowdfunding entrepreneur who ran two successful campaigns created his own Pinterest board (see example below) where he posts tips and tricks on how to run a campaign, while another successful crowdfunder says he receives weekly emails from people asking for feedback on their project pages.

An example Pinterest board where an experienced crowdfunder shares campaign tips.

While there exist many tools for online collaboration and feedback, such as Amazon Mechanical Turk and oDesk, few crowdfunders use them or know of their existence. This suggests design opportunities to create more crowdfunder-friendly support tools to help them perform their work. We are currently designing tools to help crowdfunders seek feedback online from crowd workers and better understand and leverage their social networks for publicity.

For more information on the role of community in crowdfunding, you can download our full paper here.

Julie Hui, Northwestern University
Michael Greenberg, Northwestern University
Elizabeth Gerber, Northwestern University

 

 

Remote Shopping Advice: Crowdsourcing In-Store Purchase Decisions

Recent Pew reports, as well as our own survey, have found that consumers shopping in brick-and-mortar stores are increasingly using their mobile phones to contact others while they shop. The increasing capabilities of smartphones, combined with the emergence of powerful social platforms like social networking sites and crowd labor marketplaces, offer new opportunities for turning solitary in-store shopping into a rich social experience.

We conducted a study to explore the potential of friendsourcing and paid crowdsourcing to enhance in-store shopping. Participants selected and tried on three outfits at a Seattle-area Eddie Bauer store; we created a single, composite image showing the three potential purchases side-by-side. Participants then posted the image to Facebook, asking their friends for feedback on which outfit to purchase; we also posted the image to Amazon’s Mechanical Turk service, and asked up to 20 U.S.-based Turkers to identify their favorite outfit, provide comments explaining their choice, and provide basic demographic information (gender, age).

Study participants posted composite photos showing their three purchase possibilities; these photos were then posted to Facebook and Mechanical Turk to crowdsource the shopping decision.

None of our participants had used paid crowdsourcing before, and all were doubtful that it would be useful to them when we described our plan at the start of the study session. Nevertheless, the shopping feedback provided by paid crowd workers turned out to be surprisingly compelling to participants – more so than the friendsourced feedback from Facebook – in part because the crowd workers were more honest, explaining not only what looked good, but also what looked bad, and why. Participants also enjoyed seeing how opinions varied among different demographic groups (e.g., did male raters prefer a different outfit than female raters?).
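
As a rough illustration of the kind of demographic breakdown participants liked, here is a minimal sketch of tallying crowd workers’ votes by group; it is not our study’s analysis code, and the responses shown are invented:

```python
# Minimal sketch: tally outfit votes from crowd workers overall and by gender.
# The responses below are made up for illustration.
from collections import Counter, defaultdict

responses = [
    {"favorite": "outfit_2", "gender": "female", "age": 29},
    {"favorite": "outfit_1", "gender": "male",   "age": 41},
    {"favorite": "outfit_2", "gender": "male",   "age": 35},
    # ...up to 20 U.S.-based workers responded per participant in the study
]

overall = Counter(r["favorite"] for r in responses)
by_gender = defaultdict(Counter)
for r in responses:
    by_gender[r["gender"]][r["favorite"]] += 1

print("Overall favorite:", overall.most_common(1)[0][0])
for gender, votes in by_gender.items():
    print(f"Favorite among {gender} raters:", votes.most_common(1)[0][0])
```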

Although Mechanical Turk had a speed advantage over Facebook, both sources generally provided multiple responses within a few minutes – fast enough that a shopper could get real-time decision-support information from the crowd while still in the store.

Our CSCW 2014 paper on “Remote Shopping Advice” describes our study in more detail, as well as how our findings can be applied toward designing next-generation social shopping experiences.

For more, see our full paper, Remote Shopping Advice: Enhancing In-Store Shopping with Social Technologies.

Meredith Ringel Morris, Microsoft Research
Kori Inkpen, Microsoft Research
Gina Venolia, Microsoft Research

Mailing Lists are Dying, Long Live Stack Exchange!

Historically, mailing lists have been the preferred means for coordinating development and user support activities in open source communities. However, with the emergence of social Q&A sites such as the Stack Exchange network (for instance, Stack Overflow), this is beginning to change.

Number of new R questions asked each month on r-help and Stack Exchange

This example shows the evolution of user support activities for R, a popular data analysis language. We see a decrease in mailing list activity on r-help in recent years and, at the same time, a burst of R-related Q&A on two Stack Exchange sites (Cross Validated and Stack Overflow).

This recent trend raises a number of questions:

  • Is this apparent migration trend away from r-help towards Stack Exchange deliberate?

  • What (if anything) catalyzes the migration to Stack Exchange (for example, a richer Web 2.0 platform for managing content collaboratively, a place to showcase one’s expertise more vividly to peers and potential recruiters, or gamification features that enable participants to earn reputation points and badges)?

  • Do participants who are active both on the mailing list and on Stack Exchange behave differently on the two platforms?

To answer these questions, we combined quantitative (mining a data set combining R mailing list and Stack Exchange activity) and qualitative methods (a survey of R community members). We found that:

  • Survey participants confirm disengagement from the mailing lists, and mention gamification as their motivation to be active on Stack Exchange sites.

  • Different categories of r-help contributors are attracted differentially by Stack Exchange: for example, mailing list users who also help develop R are more likely to be active on Stack Exchange than those who only use R.

  • Contributors to both r-help and Stack Exchange are significantly more active than those who restrict themselves to either the mailing list or to Stack Exchange.

  • Knowledge providers active in both communities answer questions significantly faster on Stack Exchange than on r-help, and their total output increases after the transition to Stack Exchange.

Speed of answering questions for participants active both on r-help and on Stack Exchange

Want to learn more? Check out our full paper How Social Q&A Sites are Changing Knowledge Sharing in Open Source Software Communities, to appear at CSCW 2014.

Bogdan Vasilescu, Eindhoven University of Technology, The Netherlands
Alexander Serebrenik, Eindhoven University of Technology, The Netherlands
Prem Devanbu, University of California, Davis, USA
Vladimir Filkov, University of California, Davis, USA