Collaborative Problem Solving

In our CSCW 2014 paper, Collaborative Problem Solving, we present a series of studies examining the processes through which mathematicians collaborate to solve problems on a question-answering site, MathOverflow. A goal of this work is to understand how individuals collaborate in solving problems, as a first step toward improving technology to better support problem solving. With better problem solving tools we may be able to enhance our ability to solve difficult and complex problems.

How does collaboration take place on MathOverflow? One can imagine a few different ways. For example, collaboration may take place through lengthy discussions in which ideas are exchanged and a final solution arises from the synthesis and progression of ideas in discussion. This would be a very interdependent form of collaboration. Alternatively, collaboration may take place independently, in which a problem is broadcast to a large enough crowd of people that at least one person has the appropriate expertise to solve the problem on his or her own.


We took a bottom-up approach to fully explore the ways in which individuals were collaborating on MathOverflow. Through open coding of 150 collaborations on MathOverflow we identified 5 basic types of collaborative acts. Although we found evidence of individuals providing complete solutions independently, we also found more interactive forms of collaboration. Semi-structured interviews with active participants suggested different ways in which these collaborative acts were instrumental in developing a final solution. These observations were later confirmed with quantitative analysis of changes in solution quality over time. For example, primary additions, such as providing information, led to improvements in solution quality by providing good or better solutions, whereas indirect evaluative contributions, such as clarifying the question, led to improvements in solution quality by inspiring better solutions later.

How does collaboration take place on MathOverflow? Our results suggest that in many cases collaboration on MathOverflow falls somewhere between highly interdependent and highly independent collaboration. On MathOverflow solutions are often constructed iteratively from independent contributions that gradually build on each other and result in a final solution. Our results also suggest a more nuanced view of collaboration, in which there are special types of contributions that are specific to problem solving, such as bringing into question the nature of the problem.

For more, see our full paper, Collaborative Problem Solving: A Study of MathOverflow.
Yla Tausczik, Niki Kittur, and Bob Kraut, Carnegie Mellon University

 

Voyant: Generating Structured Feedback on Visual Designs Using a Crowd of Non-Experts

Crowdsourcing offers an emerging opportunity for users to receive rapid feedback on their designs. A critical challenge for generating feedback via crowdsourcing is to identify what type of feedback is desirable to the user, yet can be generated by non-experts. We created Voyant, a system that leverages a non-expert crowd to generate perception-oriented feedback from a selected audience as part of the design workflow.

The system generates five types of feedback: (i) Elements are the individual elements that can be seen in a design. (ii) First Notice refers to the visual order in which elements are first noticed in the design. (iii) Impressions are the perceptions formed in one’s mind upon first viewing the design. (iv) Goals refer to how well the design is perceived to meet its communicative goals. (v) Guidelines refer to how well the design is perceived to meet known guidelines in the domain.

Voyant decomposes feedback generation into a description phase and an interpretation phase, inspired by how critique is taught in design education. In each phase, the tasks focus a worker’s attention on specific aspects of the design, rather than soliciting holistic evaluations, to improve the quality of the feedback. The system submits these tasks to an online labor market (Amazon Mechanical Turk). Each type of feedback typically requires a few hours to generate and costs a few US dollars.

Our evaluation shows that users were able to leverage the feedback generated by Voyant to develop insight and discover previously unknown problems with their designs. For example, consider the Impressions feedback Voyant generated on one user’s poster: the user intended it to be perceived as Shakespeare, but was surprised to learn of an unintended interpretation (“dog” appeared prominently in the word cloud).

To use Voyant, the user imports a design image and configures the crowd demographics. Once generated, the feedback can be utilized to help iterate toward an effective solution.

Try it: http://www.crowdfeedback.me

 

For more, see our full paper, Voyant: Generating Structured Feedback on Visual Designs Using a Crowd of Non-Experts.
Anbang Xu, University of Illinois at Urbana-Champaign
Shih-Wen Huang, University of Illinois at Urbana-Champaign
Brian P. Bailey, University of Illinois at Urbana-Champaign

Social Media Use by Mothers of Young Children

In our upcoming CSCW 2014 paper, we present the first formal study of how mothers of young children use social media, by analyzing surveys and social media feeds provided by several hundred mothers of infants and toddlers in the U.S.

Mothers overwhelmingly did not use Twitter for sharing information about their children, but nearly all of them used Facebook; for example, 96% reported having posted photos of their child on Facebook.

Our findings indicate several common trends in the way mothers use Facebook. Notably, the frequency of posting status updates falls by more than half after the birth of their child, and does not appear to rebound in the first few years of parenthood. However, the rate of photo-posting holds steady at pre-birth levels, meaning that photos comprise a relatively larger portion of posts than prior to the birth.

Contrary to popular belief (as exemplified by apps like unbaby.me that remove a perceived overabundance of baby photos from one’s Newsfeed), mothers do not appear to post exclusively about their offspring (about 15% of posts for first-time moms, and 11% for subsequent births). The first month post-birth contains the most baby-related postings, which then drop off.

After a baby is one month old, the percentage of a mother’s posts that mention the child drops off.

However, posts containing the child’s name receive far more likes and comments than other status updates; this likely gives them prominence in Facebook’s Newsfeed, reinforcing the impression that mothers’ statuses are overly child-centric.

For more details about how new mothers use social media – including special groups such as mothers diagnosed with postpartum depression and mothers of children with developmental delays – and a discussion of how these findings can inform the design of social networks and apps that support new moms’ needs, you can download our full paper, Social Networking Site Use by Mothers of Young Children.
Meredith Ringel Morris, Microsoft Research


Being A Turker

‘Turking’, i.e. crowdsourced work done using Amazon Mechanical Turk (AMT), is attracting a lot of attention. In many ways it is a ‘black box’. Amazon is not transparent about how the marketplace functions, what rules govern it, and who the requesters and Turkers – who post and carry out the human intelligence tasks (HITs) – are.

Research has looked to prise open the black box, understand how it operates, and use it to get the best results. AMT is generally considered a great opportunity for getting micro-task work completed quickly and at very cheap rates. There are concerns about AMT as a grey market; some requesters and Turkers are unscrupulous. The question for requesters has been how to design and control the crowd to get genuine work done.

Research on the Turkers themselves has been rather scant, with notable exceptions in which researchers have contacted Turkers, often through AMT itself, and used interviews, questionnaires, and HITs to let them express their thoughts and feelings. Who they are and what they think is still unclear. What is myth and what is truth? We tried to better understand these invisible workers by joining their forum, Turker Nation, and looking in detail at what they discussed amongst themselves.

This is what we found:

  1. Members see Turking as work and are primarily motivated by earning.
  2. Earnings vary but Turking is low wage work: high earners on Turker Nation make ~$15-16k/yr.
  3. Workers aspire to earn at least $7-10/hr, but (newbies especially) do lower paid HITs to increase their reputation and HIT count.
  4. Many Turkers choose AMT because they cannot find a good ‘regular’ job or need other income. Some are housebound, others are in circumstances where Turking is one of the few options to earn.
  5. Turker Nation provides information and support on tools, techniques, tricks of the trade, earning, and learning. Members mostly share information about good and bad HITs and requesters.
  6. Relationships are key: Turkers like anonymity and flexibility but want decent working relationships with courteous communication. They want fair pay for fair work (decent wages, fairness in judging work, timely payment…) and respect works both ways: good requesters are prized.
  7. Members mostly behave ethically. Ripping requesters off is not endorsed and is seen as justified only against dubious requesters. Members feel a moral duty to their fellow members.
  8. Members feel that by sharing information and acting cooperatively they can have a stronger effect on regulating the market. Many are skeptical about government intervention.

For more, see our full paper, Being A Turker, which will appear at CSCW 2014.

David Martin, Ben Hanrahan, Xerox Research Centre Europe, Jacki O’Neill, Microsoft Research India, Neha Gupta, Nottingham University


CrowdCamp Report: The Microtask Liberation Front: Worker-friendly Task Recommendation

As crowdsourced marketplaces like Amazon’s Mechanical Turk have grown, tool builders have focused the majority of their attention on requesters.  The research community has produced methods for improving result quality, weeding out low-quality work, and optimizing crowd-powered workflows, all geared toward helping requesters.  On the other hand, the community has done a decent job of studying crowd workers, but has not devoted much effort to building usable tools that improve the lives of workers.  At CrowdCamp, we worked on a browser plugin called MTLF that we hope will improve Turkers’ task-finding and work experiences.

A prototype of the MTLF browser plugin


After installing MTLF, a Turker logs into MTurk. Our prototype asks them to prioritize their preferences for income, task diversity, or fun. After completing a task, they are asked to provide a binary rating (hot/not) of the task, and then whether they want a new task or more of the same task. Instead of having the Turker wade through the existing difficult-to-grok list of available tasks, MTLF automatically pops up a new task on the Turker’s screen. As Turkers change their priorities and grade tasks, MTLF’s recommendation algorithm leverages the joint work histories of many workers to identify tasks that match individual workers’ interests and preferences. The goal of our tool is to improve worker satisfaction and reduce worker search time and frustration.
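Our recommendation backend is still taking shape, so the following is only a minimal sketch (not MTLF’s actual algorithm) of how joint work histories and hot/not ratings could be turned into per-worker task rankings; the worker IDs, task names, and ratings are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical data: ratings[worker][task] = 1 (hot) or 0 (not),
# gathered from the binary post-task ratings described above.
ratings = {
    "w1": {"label-images": 1, "transcribe-audio": 0, "survey": 1},
    "w2": {"label-images": 1, "categorize-products": 1},
    "w3": {"transcribe-audio": 1, "survey": 0},
}

def task_vectors(ratings):
    """Invert worker->task ratings into task->worker rating vectors."""
    vectors = defaultdict(dict)
    for worker, tasks in ratings.items():
        for task, score in tasks.items():
            vectors[task][worker] = score
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse task vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[w] * v[w] for w in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(worker, ratings, top_n=3):
    """Rank unseen tasks by similarity to tasks the worker rated 'hot'."""
    vectors = task_vectors(ratings)
    liked = [t for t, s in ratings.get(worker, {}).items() if s == 1]
    seen = set(ratings.get(worker, {}))
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(vectors[candidate], vectors[t]) for t in liked)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("w1", ratings))  # e.g. ['categorize-products']
```

A production recommender would also fold in the worker’s stated income/diversity/fun priorities and data from sources like Turkopticon, but the item-similarity idea above is the core of this style of recommendation.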

We’re not the first to take on the challenge of improving the lives of workers.  Turkopticon is a wonderful tool that lets Turkers share information on requesters.  Turkers themselves have identified a number of other tools to help them with their process.  None of these tools, however, optimizes for crowd workers’ preferences in quite the automated way that requester-oriented tools currently do.  As we build on our prototype, we hope to ingest information from sources like Turkopticon to inform our recommendation algorithms.

While our prototype has a working interface and backend to store user preferences, we’re working hard on more features for a usable first version.  Our next steps include exploring sources of data other than worker preferences, building an initial task recommender, and co-designing and iterating on our initial interface with the help of Turkers.  We’d love your help—our github repository has a list of open needs that you can help out with!

Jonathan Bragg, University of Washington
Lydia Chilton, University of Washington
Daniel Haas, UC Berkeley
Rafael Leano, University of Nebraska-Lincoln
Adam Marcus, Locu/GoDaddy
Jeff Rzeszotarski, Carnegie Mellon University


CrowdCamp Report: Waitsourcing, approaches to low-effort crowdsourcing

Crowdsourcing is often approached as a full-attention activity, but it can also be used for applications so small that people perform them almost effortlessly. What possibilities are afforded by pursuing low-effort crowdsourcing?

Low-effort crowdsourcing is possible through a mix of low-granularity tasks, unobtrusive input methods, and an appropriate setting. Exploring the possibilities of low-effort crowdsourcing, we designed and prototyped an eclectic mix of ideas.

Browser waiting
In our first prototype, we built a browser extension that allows you to complete tasks while waiting for a page to load.

Tab shown loading, while a browser popup shows a “choose the outlier image” task

A Chrome extension that allows users to perform simple tasks (e.g., odd image selection) while a page is loading

Getting tasks loaded and completed during the time it takes for a page to load is certainly feasible. A benefit of doing so is that the user’s flow is already disrupted by the page load.

Emotive voting
How passive can a crowdsourcing contribution be? Many sites implement low-effort ways to respond to the quality of online content, such as a star, a ‘like’, or a thumbs up. Our next prototype takes this form of quality judgment one step further: to no-effort feedback.

Using a camera and facial recognition, we observe a user’s face as they browse funny images.

Images being voted on with smiles and frowns

The emotive voting interface ‘likes’ an image if you smile while the image is on the screen, and ‘dislikes’ if you frown.

There are social and technical challenges to a system that uses facial recognition as an input. Some people do not express amusement outwardly, and privacy concerns would likely deter users.
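For the curious, here is a minimal sketch of the smile-to-like idea using OpenCV’s stock Haar cascades; it is not our prototype’s pipeline (which also handles frowns), and detecting frowns would need a richer facial-expression model.

```python
import cv2

# Smile-triggered 'like': OpenCV ships Haar cascades for faces and smiles.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def vote_from_frame(frame):
    """Return 'like' if a smile is detected in any face in the frame, else None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y:y + h, x:x + w]
        # Aggressive parameters reduce false positives on neutral faces.
        smiles = smile_cascade.detectMultiScale(roi, scaleFactor=1.7, minNeighbors=20)
        if len(smiles) > 0:
            return "like"
    return None

cap = cv2.VideoCapture(0)          # default webcam
ok, frame = cap.read()
if ok:
    print(vote_from_frame(frame))  # 'like' or None for this single frame
cap.release()
```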

Secret-agent feedback
Perhaps our oddest prototype lets a user complete low-effort tasks coded into other actions.

Our system listens to the affirmative grunts that a person gives when they are listening to somebody – or pretending to. Users are shown A versus B tasks, where an “uh-huh” selects one option while a “yeah” selects the other.

AwesomeR Interface

The awesomeR meme interface lets a user choose the better meme via an affirmative grunt (i.e. “yeah” or “uh huh”) while he/she is talking to someone else.

Imagine Bob on the phone, listening patiently to a customer service rep while also completing tasks. The idea is silly, but the method of spoken input quickly becomes natural and thoughtless.
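A rough sketch of the grunt-to-choice idea follows, using the off-the-shelf SpeechRecognition package rather than our prototype’s code; off-the-shelf recognizers handle non-lexical grunts like “uh-huh” unreliably, so treat this strictly as a toy.

```python
import speech_recognition as sr

# Map a short utterance to one of two options: "yeah" -> A, "uh-huh" -> B.
recognizer = sr.Recognizer()

def pick_option(option_a, option_b):
    with sr.Microphone() as source:
        audio = recognizer.listen(source, phrase_time_limit=2)
    try:
        heard = recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None                      # could not understand the grunt
    if "yeah" in heard:
        return option_a
    if "uh" in heard:                    # "uh-huh" often comes back as "uh huh"
        return option_b
    return None

print(pick_option("meme A", "meme B"))
```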

Binary tweeting

Can a person write with a low-bandwidth input? We provide a choice-based composer where users are offered a multiple choice interface for their next word.

Sentence generation with choice-based typing. The program prompts a user to choose one of two words that are likely to come after the previous words, allowing them to generate a whole sentence by low-effort interaction.

Because our prototype uses Twitter as its corpus, the phrases it constructs are realistically colloquial and current. There are endless sentiments that can be expressed on Twitter, but much of what we do say – about one-fifth – is nearly identical to past messages.
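For a flavor of how little machinery this needs, here is a toy version of choice-based composition built on a bigram model; the three-line corpus stands in for the much larger Twitter data our prototype actually draws on.

```python
from collections import Counter, defaultdict

# Toy "corpus of tweets" for illustration.
corpus = [
    "i love this song so much",
    "i love my dog so much",
    "i hate mondays so much",
]

# Count bigrams: which words tend to follow which.
bigrams = defaultdict(Counter)
for line in corpus:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def compose(start="i", max_words=5):
    """Build a sentence one binary choice at a time."""
    sentence = [start]
    for _ in range(max_words):
        options = [w for w, _ in bigrams[sentence[-1]].most_common(2)]
        if not options:
            break
        if len(options) == 1:
            sentence.append(options[0])
            continue
        pick = input(f"1) {options[0]}   2) {options[1]}  > ").strip()
        sentence.append(options[1] if pick == "2" else options[0])
    return " ".join(sentence)

if __name__ == "__main__":
    print(compose())
```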

As we continue to pursue low-effort crowdsourcing, we are thinking about how experiments such as those outlined here can be used to capture productivity in fleeting moments. Let us know your ideas in the comments.

Find the binary tweeter online, and find our other prototypes on GitHub.

Jeff Bigham, Carnegie Mellon University, USA
Kotaro Hara, University of Maryland, College Park, USA
Peter Organisciak, University of Illinois, Urbana-Champaign, USA
Rajan Vaish, University of California, Santa Cruz, USA
Haoqi Zhang, Northwestern University, USA

CrowdCamp Report: Reconstructing Memories with the Crowd

From cave paintings to diaries to digital videos, people have always created memory aids that allow them to recall information and share it with others. How can the collective memories of many people be combined to improve collective recall? For example, the layout of a community gathering place with sentimental or historical value could be recovered, or accidents and crimes might be explained using information that appeared trivial at first but actually has great importance.

Our CrowdCamp team set out to identify some of the challenges and potential methods for reconstructing places or things from the partial memories of many people.

Case Studies

We began by attempting to reconstruct common memories such as the layout of a Monopoly board. Figure 1 below shows our individual and collective attempts at this task. We found that facts recalled by one group member helped resurface related memories in other members. However, working together also introduced ‘groupthink’, where a false memory from one person corrupted the group’s final model. This is a known problem, and it is one reason why police prefer to interview witnesses separately.

Figure 1. Our reconstruction of a Monopoly board (left), compared to the true version (right).

The Effect of Meaningful Content on Memory

Next, we tried to see how information type changes the process. It’s well documented that people’s minds summarize information for better recollection. We tried 3 cases:

  • No meaning: Memorize a Sudoku puzzle (table of ordered numbers)
  • Some meaning: Memorize a set of about 30 random objects
  • Meaningful scene: Memorize a living room scene

For each, we first tried to memorize parts of the scene without coordination, then with predefined roles, e.g., different members were told to remember disjoint aspects or parts. In both cases we first wrote down what we remembered individually, then merged our results. Coordinated roles increased both recall and precision. Recall increased because the sets of items we remembered individually were more distinct, meaning we did not redundantly memorize the same things. Precision increased because the narrower task further focused our attention by removing extra distractors.
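Concretely, we mean recall and precision in the usual information-retrieval sense; the sketch below computes both for a merged set of recollections against a known ground truth (the item sets here are made up for illustration).

```python
# ground_truth is the set of items actually in the scene; each group
# member contributes the set of items they remembered.
def recall(remembered, ground_truth):
    return len(remembered & ground_truth) / len(ground_truth)

def precision(remembered, ground_truth):
    return len(remembered & ground_truth) / len(remembered) if remembered else 0.0

ground_truth = {"sofa", "lamp", "rug", "clock", "plant", "bookshelf"}
member_a = {"sofa", "lamp", "clock", "vase"}   # 'vase' is a false memory
member_b = {"rug", "plant", "lamp"}

merged = member_a | member_b
print(recall(merged, ground_truth), precision(merged, ground_truth))
# Merging raises recall (more of the scene covered); disjoint roles also
# help precision by keeping false memories from dominating the merge.
```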

Opportunities and Challenges

In some settings, prior domain knowledge allows people to organize for increased collective memory. One theme is that diversity aids in reconstruction. For example, one person may remember colors well while another may be color-blind but have a good spatial memory. Even outsiders who have no connection with the memory may be able to help.  For example, in Figure 2 below, a paid oDesk worker helps us remember our stressful first-day presentation at CrowdCamp by creating an illustration based on notes and images we provided.

An image depicting 4 presenters crying and one girl sitting at a desk in the background.

Figure 2. An oDesk worker’s rendition of our stressful CrowdCamp presentation, based on the notes and sketch we provided.

We identified three main challenges to reconstructing memories:

  • Groups, especially those containing members with strong personalities, are subject to groupthink, which can introduce errors.
  • Because some aspects of a scene are more salient, people’s memories often overlap significantly.
  • In unbounded settings, people’s accuracy decreases, likely due to an overwhelming amount of information.

One consistent pattern was that we recalled nearly all of the information we would eventually remember within the first few seconds or minutes, depending on the size of the task. After that, significant gains came only when one person’s idea jogged the memory of another.

Future Directions

We believe this work has great potential to introduce a more structured way to recreate memories using groups of people of all sizes, while avoiding problems encountered with naïve solutions. For example, approaches that mix individual recollection early on with later collaboration, while using parallel subsets of workers to minimize groupthink, could improve the way we recover knowledge in settings ranging from historical documentation to crime scenes.

What other ideas or references for reconstructing memories can you think of? Anything we missed? We’d love to hear about it!

Authors
Adam Kalai, Microsoft Research

Walter S. Lasecki, University of Rochester / Carnegie Mellon University
Greg Little, digital monk
Kyle I. Murray, MIT CSAIL

CrowdCamp Report: HelloCrowd, The “Hello World!” of human computation

The first program a new computer programmer writes in any new programming language is the “Hello world!” program – a single line of code that prints “Hello world!” to the screen.

We ask, by analogy, what should be the first “program” a new user of crowdsourcing or human computation writes?  “HelloCrowd!” is our answer.

Hello World task

The simplest possible “human computation program”

Crowdsourcing and human computation are becoming ever more popular tools for answering questions, collecting data, and providing human judgment.  At the same time, there is a disconnect between interest and ability, where potential new users of these powerful tools don’t know how to get started.  Not everyone wants to take a graduate course in crowdsourcing just to get their first results. To fix this, we set out to build an interactive tutorial that could teach the fundamentals of crowdsourcing.

After creating an account, HelloCrowd tutorial users will get their feet wet by posting three simple tasks to the crowd platform of their choice. In addition to the “Hello, World” task above, we chose two common crowdsourcing tasks: image labeling and information retrieval from the web. In the first task, workers provide a label for an image of a fruit, and in the second, workers must find the phone number for a restaurant. These tasks can be reused and posted to any crowd platform you like; we provide simple instructions for some common platforms. The interactive tutorial will auto-generate the task URLs for each tutorial user and for each platform.
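HelloCrowd generates the task URLs for you, but it may help to see roughly what the underlying call looks like. The sketch below posts a bare-bones “Hello Crowd!” HIT to the Mechanical Turk sandbox with boto3; it illustrates the concept and is not HelloCrowd’s own code.

```python
import boto3

# Post a minimal HIT to the MTurk *sandbox* (no real money changes hands).
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <!-- A real form must also post the assignmentId back to MTurk's
         external-submit endpoint; omitted here for brevity. -->
    <p>Please type: Hello Crowd!</p>
    <input type="text" name="greeting" />
  ]]></HTMLContent>
  <FrameHeight>200</FrameHeight>
</HTMLQuestion>
"""

hit = mturk.create_hit(
    Title="Hello Crowd!",
    Description="Type the phrase shown on screen.",
    Reward="0.05",                     # reward per assignment, in USD
    MaxAssignments=3,                  # redundancy: ask three workers
    LifetimeInSeconds=3600,
    AssignmentDurationInSeconds=300,
    Question=question_xml,
)
print("HIT id:", hit["HIT"]["HITId"])
```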

Mmm, crowdsourcing is delicious

More than just another tutorial on “how to post tasks to MTurk”, HelloCrowd aims to teach fundamental concepts. After posting tasks, new crowdsourcers will learn how to interpret their results (and get even better results next time). For example: what concepts might the new crowdsourcer learn from the results for the “hello world” task or for the business phone number task? Phone numbers are simple, right? What about “867-5309” vs “555.867.5309” vs “+1 (555) 867 5309”? Our goal is to get new users of these tools up to speed on how to get good results: form validation (or not), redundancy, task instructions, etc.
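To make the phone-number point concrete, a few lines of normalization show why seemingly redundant answers need cleaning before they can be compared or voted on; this is an illustrative sketch, not part of the HelloCrowd tutorial itself.

```python
import re

def normalize_phone(raw):
    """Reduce a phone-number answer to bare digits for comparison."""
    digits = re.sub(r"\D", "", raw)          # strip everything but digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # drop a leading US country code
    return digits

answers = ["867-5309", "555.867.5309", "+1 (555) 867 5309"]
print({a: normalize_phone(a) for a in answers})
# The last two collapse to '5558675309'; the first lacks an area code,
# which simple redundancy/majority voting would then surface.
```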

In addition to teaching new crowdsourcers how to crowdsource, our tutorial system will be collecting a longitudinal, cross-platform dataset of crowd responses.  Each person who completes the tutorial will have “their” set of worker responses to the standard tasks, and these are all added together into a public dataset that will be available for future research on timing, speed, accuracy and cost.

We’re very proud of HelloCrowd, and hope you’ll consider giving our tutorial a try.

Christian M. Adriano, Donald Bren School, University of California, Irvine
Juho Kim, MIT CSAIL
Anand Kulkarni, MobileWorks
Andy Schriner, University of Cincinnati
Paul Zachary, Department of Political Science, University of California, San Diego

CrowdCamp Report: Just-In-Time Emoji

Emoji provide an expressive platform for non-verbal online communication, and hundreds of emoji are now available for use on smartphones and chat clients.  But the set is by no means complete; many desired emoji are missing.

At CrowdCamp at HCOMP 2013, we asked, can we create these missing emoji just-in-time by drawing on the crowd? We explored three methods: creating rebuses of existing emoji, finding images on the web, and drawing a new emoji on demand.

Rebus Approach

The first approach asks Mechanical Turk workers to put together a rebus, a sequence of existing emoji and possibly other letters and punctuation.  We gave workers an editor displaying GitHub’s extended emoji set:

RebusEditor

MTurk workers produced some very interesting and creative results, including:

 

Prompt            Rebus Created by Mechanical Turk Workers   Work Time
taco              TackO                                      20s
sandwich          BreadChickenBread                          37s
unicorn           UniCorn                                    19s
The Dark Knight   PenDN2DN3DN4DN5                            162s

This approach demonstrates that crowd workers are capable of doing a defined just-in-time creative task that requires mixing and matching different components.

Image Search Approach

The second approach presents an image search of the desired word or phrase to a crowd worker, with all the images rescaled to emoji size, and asks the worker to pick three that best represent the word or phrase:

GoogleImageSearch
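As a rough illustration of the rescaling step (not our prototype’s code), candidate images can be shrunk to emoji size with Pillow before being shown to the worker; the file names and the 32x32 target size are placeholders.

```python
from PIL import Image  # Pillow

EMOJI_SIZE = (32, 32)  # assumed target size; real emoji sizes vary by platform

def to_emoji_thumbnail(path, out_path):
    """Resize a search-result image down to emoji size."""
    img = Image.open(path).convert("RGBA")
    img = img.resize(EMOJI_SIZE)
    img.save(out_path)

for i, candidate in enumerate(["result1.png", "result2.png", "result3.png"]):
    to_emoji_thumbnail(candidate, f"emoji_candidate_{i}.png")
```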

We prototyped this task using other HCOMP attendees, and collected the following results:

Prompt        Images Filtered by Workers   Mean Work Time
unicorn       UnicornResults               46s (includes reading instructions)
taco          TacoResults                  26s
blonde girl   BlondeGirlResults            31s

In general, the time to select images was very short, since little creativity was required. The results were strong for some of the simpler keywords (unicorn / taco) but more scattered for a phrase without a readily discernible emoji (blonde girl).

Drawing-on-Demand Approach

A third approach was to ask the worker to draw an original image using a web-based pixel editor.  Here are some of the results:

Prompt             Image Drawn by Worker   Elapsed Time
Unicorn            UnicornDrawn            30s
Taco               TacoDrawn               30s
Blonde girl        BlondeDrawn             47s
We Are the World   WeRWorldDrawn           8m 15s

Discussion

In general, for simple requests, all three approaches produced creative and comprehensible emoji in a short time. Each approach has its own advantages and disadvantages. For example, the rebus approach produces more interesting results, but it might fall short for a keyword like “Spock” – a character from Star Trek – where the image search approach might be more appropriate.

In general, as the requests got more abstract, such as movie and song names like “Shawshank Redemption” and “We Are the World”, the time to create an answer increased, and comprehensibility dropped.

 

Michael Weingert, University of Waterloo

Rob Miller, MIT

Jenn Thom, Amazon

Pao Siangliulue, Harvard University

Shih-Wen Huang, University of Washington


Personal Informatics to Encourage Diversity in News Reading

Today, people can choose from among more news sources than ever, some of which cater to particular ideological niches or highlight items popular in one’s social network. Scholars and pundits have expressed concern that this technology could reinforce people’s tendency to read predominantly agreeable news, both through their own choices and through the choices made by system designers. While this might increase short-term engagement and help people feel validated, it may not further either individual or societal goals. Reading broadly can further learning and out-of-the-box thinking. Individuals who are aware of other viewpoints can better communicate and empathize with those who disagree.

Despite theories that predict a preference for reading agreeable political news, many people appear to agree with the norm of reading diverse viewpoints, and at least some people actively prefer it. My colleagues and I were curious whether a personal informatics tool could help people identify when their behavior is inconsistent with this norm and help them take corrective action.

Balancer extension example

The Balancer extension gives readers feedback on the political lean of their online newsreading.

To test this, we built Balancer, an extension for the Chrome web browser. Balancer enables users to see patterns in their behavior and also reminds them of the norm of balance, in the form of a character on a tightrope. When the user’s newsreading is balanced, the character is happy; when it is not, the character is in peril of falling.
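The paper describes Balancer’s actual scoring; as a simplified sketch, feedback of this kind boils down to a weighted average of per-site lean scores over recent visits. The site scores and threshold below are made up for illustration.

```python
# Assume each news site has a lean score in [-1, 1]
# (negative = liberal, positive = conservative).
SITE_LEAN = {              # hypothetical per-site scores
    "foxnews.com": 0.8,
    "nytimes.com": -0.6,
    "reuters.com": 0.0,
}

def reading_lean(visits):
    """Weighted average lean of a week of page visits."""
    total = sum(visits.values())
    if total == 0:
        return 0.0
    return sum(SITE_LEAN.get(site, 0.0) * n for site, n in visits.items()) / total

week = {"nytimes.com": 12, "reuters.com": 3, "foxnews.com": 1}
lean = reading_lean(week)
print("balanced" if abs(lean) < 0.2 else "tipping", round(lean, 2))
# Prints: tipping -0.4  -> the tightrope character would be leaning over.
```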

In a one-month, open-enrollment controlled field experiment, this extension encouraged participants with unbalanced reading habits to make small but measurable changes in the balance of their newsreading. Compared to a control group, users receiving feedback from Balancer made 1-2 more weekly visits to a website with predominantly opposing views or 5-10 more weekly visits to a site with more neutral views.

We are working to improve Balancer’s capabilities, and others are also making progress in this space. There are now many browser extensions and other tools that give people feedback about the news they read and the sources they follow. ManyAngles recommends articles that cover different aspects of the topic a user is currently reading about. Slimformation reveals topical diversity in one’s online newsreading. Scoopinion gives users feedback on their top authors, sources, and genres. Follow Bias shows people the gender (im)balance of their Twitter network. This is an exciting time for tools that help readers reflect on the news they read!

For more, see our full paper, Encouraging Reading of Diverse Political Viewpoints with a Browser Widget.

Sean A. Munson, Human Centered Design & Engineering, DUB Group, University of Washington
Stephanie Y. Lee, Sociology, University of Washington
Paul Resnick, School of Information, University of Michigan