Enhancing Technical Q&A Forums With CiteHistory

CiteHistory is a browser plugin which helps technical forum participants track and share the research they do when asking and answering questions online.

When I program, I frequent Q&A sites like Stack Overflow or the MSDN Forums. I am not alone – past work has demonstrated that programmers make extensive use of these, and similar sites, when coding. If, like me, you frequent Stack Overflow or MSDN Forums, you might agree that the answers tend to be very technical and detailed.

Where on earth do these high-quality answers come from? After all, it seems unlikely that the authors have all these details memorized.

We set out to investigate this question, and what we found motivated us to build CiteHistory, a tool to help forum users share their online research – but I’m getting ahead of myself. First, let’s discuss how answerers research their solutions:

We found that, like programmers writing code, Stack Overflow and MSDN Forum participants make extensive use of online resources when answering programming questions.

We know this from surveying hundreds of MSDN Forum participants and from a log analysis of the sessions of 120 Stack Exchange users. In more detail:

  • About 50% of all answers involve online research
  • Research sessions last an average of 20 minutes, and consist of visiting an average of 4 relevant URLs

Critically, however, we found:

  • Fewer than 25% of all posts contain links or any other direct evidence of the extensive research conducted

This lack of information provenance is unfortunate because:

  • Such information can help readers assess an answer’s credibility
  • Survey respondents reported that links are often sufficient for answering questions (and posts with such links tend to be highly rated)

To ameliorate this situation, we developed a browser plugin called CiteHistory. CiteHistory automatically keeps track of the research that authors conduct while asking or answering questions on technical forums, and helps authors share this research material with the forum community. CiteHistory also associates research metadata with each post (e.g., time spent researching, number of pages visited) to help highlight the true research effort involved in asking or answering forum questions.
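
CiteHistory is a browser plugin, and this post does not describe its internals. As a rough illustration of the kind of metadata it attaches to a post, the Python sketch below computes time spent researching and the number of distinct pages visited from a browsing log. The `PageVisit` record, the `research_metadata` function, and the rule that a research session starts at the first logged visit are all hypothetical assumptions, not CiteHistory's actual design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PageVisit:
    """One page load recorded by a (hypothetical) history tracker."""
    url: str
    timestamp: datetime


def research_metadata(visits: list[PageVisit], post_time: datetime) -> dict:
    """Summarize the browsing that preceded a forum post.

    Assumption: the research session starts at the earliest logged visit
    and ends when the post is submitted.
    """
    # Only visits made before the post was submitted count as research.
    before = sorted(
        (v for v in visits if v.timestamp <= post_time),
        key=lambda v: v.timestamp,
    )
    if not before:
        return {"minutes_researching": 0.0, "pages_visited": 0, "urls": []}
    # Distinct URLs in first-visit order: candidate links to cite in the post.
    urls = list(dict.fromkeys(v.url for v in before))
    elapsed = post_time - before[0].timestamp
    return {
        "minutes_researching": elapsed.total_seconds() / 60,
        "pages_visited": len(urls),
        "urls": urls,
    }


# Usage: three visits (one URL repeated), post submitted 20 minutes after the first.
start = datetime(2013, 1, 1, 9, 0)
log = [
    PageVisit("https://example.com/docs", start),
    PageVisit("https://example.com/faq", start + timedelta(minutes=5)),
    PageVisit("https://example.com/docs", start + timedelta(minutes=12)),
]
meta = research_metadata(log, start + timedelta(minutes=20))
```

Deduplicating URLs while keeping first-visit order gives a short, readable list of candidate citations rather than a raw history dump.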

For a demonstration of CiteHistory in action, please watch our introductory video:

CiteHistory Video

We evaluated CiteHistory in a two-week deployment study within a large organization. We found that CiteHistory succeeded in encouraging users to cite reference material, and was praised for its dual role as a personal research logbook.

You can also try out CiteHistory by visiting: http://research.microsoft.com/en-us/um/redmond/projects/citehistory/

For more, see our full paper, Enhancing Technical Q&A Forums With CiteHistory.

Adam Fourney, University of Waterloo
Meredith Ringel Morris, Microsoft Research

Crowd-Powered Replies to Public Twitter Questions

“Friendsourcing” information seeking by posting questions to a social networking site, like Twitter, is becoming increasingly common. However, unlike conventional search engines, friendsourced information seeking does not always return an answer. Examining over 85,000 information-seeking tweets from the public Twitter firehose, we found that only about a third received replies at all.

We created a Twitter agent that monitors the public feed for information-seeking questions, generates an answer via crowdsourced labor, and tweets the reply back to the original asker. This image shows an example:

Our crowd-powered Twitter agent serendipitously offers advice about how to clean Sperrys (a type of shoe). The asker, pleased with the response, favorited the reply and retweeted it.

Raters judged the answers generated via our crowdsourcing project to be of the same high quality as naturally occurring “friendsourced” answers to questions on Twitter. Indeed, the people who received replies from our agent were generally quite pleased with the experience: over a third of them thanked our agent in some way, such as by favoriting the tweet or sending a reply message on Twitter.

Our work illustrates ways in which a “socially embedded search engine” can augment basic social network Q&A experiences.
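
The agent's three stages (monitor the public feed for information-seeking questions, obtain an answer from crowd workers, tweet it back to the asker) can be sketched as a simple loop. This is a minimal Python illustration under stated assumptions: the `Tweet` record, the keyword-based `looks_information_seeking` filter, and the `ask_crowd`/`post_reply` callbacks are hypothetical stand-ins, not the project's actual question classifier, crowdsourcing workflow, or Twitter API calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical cue words; the agent's real question detector is not described here.
QUESTION_WORDS = {"how", "what", "where", "which", "who", "why", "when",
                  "does", "should", "can", "anyone"}


@dataclass
class Tweet:
    tweet_id: int
    user: str
    text: str


def looks_information_seeking(tweet: Tweet) -> bool:
    """Crude placeholder filter: ends with '?' and starts with a question word."""
    text = tweet.text.strip().lower()
    return text.endswith("?") and text.split()[0] in QUESTION_WORDS


def run_agent(
    stream: Iterable[Tweet],
    ask_crowd: Callable[[str], Optional[str]],
    post_reply: Callable[[Tweet, str], None],
) -> int:
    """Monitor -> crowdsource -> reply loop; returns the number of replies sent."""
    sent = 0
    for tweet in stream:
        if not looks_information_seeking(tweet):
            continue
        answer = ask_crowd(tweet.text)  # e.g., post a paid crowd task and wait
        if answer:
            # Address the asker and respect the 140-character limit of the time.
            post_reply(tweet, f"@{tweet.user} {answer}"[:140])
            sent += 1
    return sent
```

Passing the crowd and reply steps in as callbacks keeps the loop testable without network access; a real deployment would plug in a crowdsourcing platform and the Twitter API there.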

For more, see our full paper, A Crowd-Powered Socially Embedded Search Engine. You can also view some sample Q&A exchanges our agent participated in at our project’s Twitter page.

Jin-Woo Jeong (Hanyang University), Meredith Ringel Morris, Jaime Teevan, & Dan Liebling (Microsoft Research)

Do ideas spread like viruses in the blogosphere?

The PageRank algorithm assesses the importance of a web page through the links pointing to it from important pages: pages with many in-links from important pages are likely to be high-quality. This self-consistent definition has proven to be practically relevant, serving as the core of Google’s search algorithm. Given the scale of the blogosphere, it is therefore tempting to look at links (citations) from one blog to another as a means of evaluating a blogger’s influence, and thus to identify opinion leaders.
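
PageRank's self-consistent definition can be computed by power iteration: every page repeatedly passes a damped share of its score along its out-links until the scores stop changing. The Python sketch below is the textbook formulation with the customary damping factor of 0.85, not Google's production system; the `links` dictionary layout is our own assumption, and treating blog-to-blog citations as the link graph is exactly the idea this post goes on to question.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank over a directed link (citation) graph.

    links maps each page/blog to the pages/blogs it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Pages with no out-links spread their rank uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Usage: blogs A and B both cite C; C cites A.
scores = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

Here C, cited by two blogs, ends up with the highest score, and A outranks B because its single in-link comes from the important blog C.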

But, how should we look for leadership in the structure of a sequence of citations? To answer this question, we need to know the actual meaning of a citation in the blogosphere, and the kind of influence it reveals. For example, do bloggers spread information through citation links the same way a virus contaminates individuals in a population?
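
To make the “virus” analogy concrete: in an epidemic model such as the independent cascade, each blog that picks up a piece of information gets one chance to pass that same information on to each blog that reads it. The Python sketch below is a generic textbook version of this model, not the authors' code; the `readers` mapping and the transmission probability `p` are illustrative assumptions.

```python
import random


def independent_cascade(readers: dict[str, list[str]], seeds: set[str],
                        p: float, rng: random.Random) -> set[str]:
    """Epidemic-style (independent cascade) spread along citation links.

    readers[b] lists the blogs that may cite (be "infected" by) blog b.
    Each newly infected blog gets one chance, with probability p, to pass
    the same piece of information to each of its readers.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for blog in frontier:
            for reader in readers.get(blog, []):
                if reader not in infected and rng.random() < p:
                    infected.add(reader)
                    next_frontier.append(reader)
        frontier = next_frontier
    return infected


# Usage: a three-blog chain; with p=1 the information reaches every blog.
chain = {"A": ["B"], "B": ["C"]}
reached = independent_cascade(chain, {"A"}, p=1.0, rng=random.Random(0))
```

Note that the model copies one fixed piece of information along every edge; topic mutations along a citation sequence are exactly what it cannot represent.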

In this work, we investigated these questions using qualitative and quantitative analyses of the citation patterns among around 3,200 of the top blogs of the French-speaking blogosphere.

Although appealing, the epidemic model proves misleading in this context: an inspection of the patterns shows that citations cannot be reduced to a mere duplication of information. In fact, even shared subjects are hard to define: topics mutate abruptly along the citation sequence, as illustrated in the example below.

Blue nodes deal with the outcome of the “No-Sarkozy day”, while the red ones concern another matter (a sexist advertisement). The mutation happens when post B cites post A.

Quantitatively, we showed that a random citation model which takes into account only “who cites whom” and “at which pace”, without considering the posts’ content, is sufficient to explain important structural features of the citation patterns. In other words, the information conveyed by the posts plays a rather secondary role in this process, which suggests that citations in this medium might serve other functions, such as rewarding a blogger.
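
This post does not spell out the random citation model. One way such a content-blind null model could look, under our own assumptions, is sketched below in Python: each synthetic citation reuses the empirical distribution of which blogs a blog cites (“who cites whom”) and of citation delays (“at which pace”), then links to the most recent matching post, never looking at content. The data layout and the `random_citations` function are hypothetical, not the paper's exact model.

```python
import bisect
import random
from collections import defaultdict


def random_citations(posts, observed, rng):
    """Generate content-blind citations.

    posts:    list of (post_id, blog, time) tuples.
    observed: list of (citing_blog, cited_blog, delay) from real citations.
    Returns a list of (citing_post, cited_post) pairs.
    """
    cited_blogs = defaultdict(list)  # who cites whom, with multiplicity
    for citing, cited, _ in observed:
        cited_blogs[citing].append(cited)
    delays = [d for _, _, d in observed]  # the empirical pace of citation

    by_blog = defaultdict(list)  # blog -> [(time, post_id)] sorted by time
    for post_id, blog, time in posts:
        by_blog[blog].append((time, post_id))
    for entries in by_blog.values():
        entries.sort(key=lambda e: e[0])
    times = {b: [t for t, _ in entries] for b, entries in by_blog.items()}

    pairs = []
    for post_id, blog, time in posts:
        if not cited_blogs.get(blog):
            continue  # this blog never cites anyone in the observed data
        target = rng.choice(cited_blogs[blog])
        latest = time - rng.choice(delays)
        # Position just past the last post by `target` published strictly before `latest`.
        i = bisect.bisect_left(times.get(target, []), latest)
        if i > 0:
            pairs.append((post_id, by_blog[target][i - 1][1]))
    return pairs


# Usage: blog B is observed citing blog A with a delay of 1 time unit.
posts = [("p1", "A", 0.0), ("p2", "B", 5.0)]
observed = [("B", "A", 1.0)]
pairs = random_citations(posts, observed, random.Random(0))
```

Comparing cascades built from such synthetic pairs with the real ones is one way to test whether content is needed to explain a structural feature.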

We could identify specific patterns that display high topic unity and possibly information spreading, but detecting such a subtle phenomenon calls for additional tools. Indeed, the informal nature of the blog medium gives authors much freedom to create complex content. Note that other online media, such as Twitter, are more rigidly formatted and may therefore be better suited to measuring information spreading.

For more, see our full paper, A data-driven analysis to question epidemic models for citation cascades on the blogosphere.

Abdelhamid Salah Brahim, LIAFA, Paris VII University
Lionel Tabourier, naXys, Namur University
Bénédicte Le Grand, CRI, Paris I University

The Crowd at HICSS 2013 Series – #5

The Rise and Fall of Crowdsourcing?

in Proceedings of the Hawai’i International Conference on System Sciences 2013

Henri Simula

Aalto University, School of Science

Crowdsourcing has been discussed in both academic and managerial articles in recent years. Despite some critical voices, the overall attitude towards crowdsourcing in the extant literature has been quite positive. In this paper we address potential drawbacks and issues that cast shadows over crowdsourcing. The overall purpose of this paper is to discuss why crowdsourcing initiatives may not always live up to the expectations placed upon them. Despite some seemingly successful case examples, not every crowdsourcing initiative has taken off. While some of the barriers are case- or industry-specific, there are also general reasons that keep crowdsourcing from becoming a de facto modus operandi, especially in the context of innovation creation. This paper is intentionally written through a critical lens and will hopefully provide a constructive counterbalance for those with an overly positive view of crowdsourcing.

http://www.slideshare.net/henrisimula/hicss-2013-presentationsimula

The Crowd at HICSS 2013 Series – #4

CROWDSOURCING CRITICAL THINKING

One of the challenges in using social media technologies such as Twitter for disaster response is that information that can help save lives is buried under a sea of other information and misinformation. This was the case in the aftermath of the 2011 Great East Japan Earthquake. For example, information on Twitter helped rescue children and teachers who were stranded at a school building. However, finding such information was extremely hard because many unverified tweets spread during the disaster response, even after other users had posted criticism tweets pointing out that they were false rumors.

Motivated by these observations, Tanaka, Sakamoto, and Matsuka examined whether the critical thinking of crowds could help reduce the spread of misinformation. Using false tweets and criticism tweets related to the Great East Japan Earthquake, they conducted an experiment with students at Japanese universities in which half of the participants saw criticism tweets before seeing the false tweets and the other half did not. They found that exposing participants to criticism tweets increased the rate of decisions not to share the false tweets by a factor of about 1.5, from 32% to 49%. When participants decided to share the false tweets even after seeing the criticism tweets, they perceived the false tweets as more accurate, more important, and more anxiety-provoking than when they decided not to share them after seeing the criticism tweets. Their work, which won the best paper award in the Collaboration Systems and Technologies track, demonstrated that exposing people to criticism tweets could change their perceptions of the associated false tweets and significantly reduce decisions to spread them.
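
The “about 1.5 times” figure follows directly from the two reported rates:

```latex
\frac{49\%}{32\%} \approx 1.53
```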

Given these findings, the group is examining how to promote credibility evaluation by crowds in order to reduce the spread of misinformation and extract useful information on social media during disasters, and whether it is possible to change how crowds perceive and feel about disaster-related information on social media so as to direct their sharing decisions. Changing the perspective of crowds was the focus of another HICSS 2013 paper, which received a best paper nomination. By following this link you can find more about their research on improving social media for disaster response.

The Crowd at HICSS 2013 Series – #3

Information Exchange in Prediction Markets: How Social Networks Promote Forecast Efficiency

in Proceedings of the Hawai’i International Conference on System Sciences 2013

Liangfei Qiu - Department of Economics - University of Texas at Austin

Huaxia Rui - Simon School of Business - University of Rochester

Andrew B. Whinston - Department of Information, Risk and Operations Management - University of Texas at Austin

This paper studies the effects of information transmission on the wisdom of the crowd. We provide a game-theoretic framework to address the question: do social networks promote forecast efficiency in prediction markets?

Our study shows that a social network is not a panacea for improving forecast accuracy. The use of social networks could be detrimental to forecast performance when the cost of information acquisition is high. We also study the effects of social networks on information acquisition in prediction markets. In the symmetric Bayes-Nash equilibrium, all participants use a threshold strategy, and equilibrium information acquisition is decreasing in the number of a participant’s friends and increasing in the network density. These results are robust to two commonly used prediction market mechanisms: a forecast-report mechanism and a security-trading mechanism.

In the paper, we compare the performance of non-networked prediction markets (NNPM) with the performance of social-network-embedded prediction markets (SEPM). In the simulation, we use two measures of prediction market performance: the forecast accuracy and the mean squared errors (MSE) of the prediction market.
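
This post does not define the two measures precisely. Assuming each event is binary (outcome 0 or 1) and each market reports a probability forecast, the mean squared error is the average squared gap between forecast and outcome; for forecast accuracy we assume, purely for illustration, the share of events on which the forecast falls on the correct side of 0.5. Both function names and the example numbers are hypothetical, not the paper's definitions or data.

```python
def mean_squared_error(forecasts: list[float], outcomes: list[int]) -> float:
    """Average of (forecast - outcome)^2; lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def hit_rate(forecasts: list[float], outcomes: list[int]) -> float:
    """Assumed accuracy measure: fraction of events called on the right side of 0.5."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    hits = sum((f >= 0.5) == (o == 1) for f, o in zip(forecasts, outcomes))
    return hits / len(forecasts)


# Usage with made-up numbers: the same three events forecast by two market designs.
outcomes = [1, 0, 0]
sepm = [0.8, 0.3, 0.6]
nnpm = [0.7, 0.4, 0.4]
```

Evaluating both measures on the same events for each market design, across a range of information-acquisition costs, is what allows the two designs to be compared.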

Figure #1 A & B – A Comparison between the Performances of the SEPM and the NNPM

Figure #1(a) – Forecast Accuracy

Figure 1(a) shows that when the cost of information acquisition is low, the SEPM outperforms the NNPM in terms of forecast accuracy, and when the cost is high, the NNPM outperforms the SEPM.

Figure #1(b) – MSE

Figure 1(b) shows that this result is robust to a different measure of prediction market performance: the MSE. When the cost of information acquisition is small, the MSE in the SEPM is lower than the MSE in the NNPM, which means that the SEPM outperforms the NNPM. As the cost increases, this advantage of the SEPM shrinks, and when the cost is large enough, the NNPM performs better than the SEPM.

There are two implications of this result. First, when the cost of information acquisition is low, a social network can enhance forecast accuracy in prediction markets. Second, a social network also has a negative effect on the forecast accuracy of a prediction market when the cost of information acquisition is high.

The paper at SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2047904

The Crowd at HICSS 2013 Series – #2

The Theory of Crowd Capital

in Proceedings of the Hawai’i International Conference on System Sciences 2013

John Prpić & Prashant Shukla

Beedie School of Business

Simon Fraser University

We are seeing more and more organizations undertaking activities to engage dispersed populations through IT. Using the knowledge-based view of the organization, this work conceptualizes the theory of Crowd Capital to explain this phenomenon. A diagram of our model is shown immediately below.

Theory of Crowd Capital: Model

Crowd Capital is a heterogeneous knowledge resource generated by an organization through its use of Crowd Capability. An organization’s Crowd Capability engages the Dispersed Knowledge (Hayek 1945) of individuals: the Crowd.

Crowd Capability includes three dimensions by which an organization engages Dispersed Knowledge: a structure (some form of IT), content (the knowledge that the organization desires), and a process (internal work which sorts, filters, and synthesizes the incoming information).

Crowd Capital is always IT-mediated. In other words, forms of IT (web pages, mobile apps, sensors, software etc.) are always employed by organizations to engage the antecedent condition of Dispersed Knowledge.

Organizations exist in an environment of Dispersed Knowledge; hence, Dispersed Knowledge is not only external to the organization, and it can be engaged internally, externally, or both simultaneously.

Crowd Capital can be generated through episodic or continuous forms of IT. Here we distinguish between forms of IT that require community and collaboration to function and those that do not. For example, we reason that Google’s reCAPTCHA and Citizen Science applications like Foldit do not require community and collaboration to generate Crowd Capital, whereas Innovation Communities (von Hippel 2005) and Peer Production (Benkler & Nissenbaum 2006) do.

If you’re interested, you can find a preprint copy of Prpić & Shukla (2013) here:

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115

We very much look forward to your comments!

Don’t Hide in the Crowd! Increasing Social Transparency Between Peer Workers Improves Crowdsourcing Outcomes

Although crowdsourcing is a useful social computing technique, its unreliability has greatly undermined its utility. In this study, we found that carefully manipulating social transparency and peer-dependent reward schemes can successfully motivate crowds to generate high-quality work.

Previous research has shown that social transparency can make people more accountable for their own actions in online collaborative work. Nevertheless, it is not easy to utilize social transparency in crowdsourcing since crowd workers usually work individually. Our previous study on peer consistency evaluation demonstrated that simply making the rewards of crowd workers depend on each other can create social effects between workers, motivating them to perform significantly better. However, not all peer-dependent reward schemes create positive social effects for collaborative work. The possible social effects of peer-dependent reward schemes from previous literature are summarized below:

  • Altruistic Motives: crowds may work harder to benefit their colleagues (Bandiera et al. Quarterly Journal of Economics ‘05)
  • Social Loafing: crowds may feel that they can hide in the crowds because personal effort is hard to evaluate (Karau et al. Journal of Personality and Social Psychology ‘93)
  • Social Facilitation: crowds may perform better because they think their work can be used as a point of reference for others in the group (Harkins, Journal of Experimental Social Psychology ‘87)

We conducted a 3×2 experiment crossing three peer-dependent reward schemes (individual, teamwork, competition) with two levels of social transparency (low: anonymous; high: demographic information revealed). The main findings of our experiment are as follows:

1. Social transparency successfully motivated crowds to generate high-quality outcomes when their rewards were codependent.

When the workers worked individually, there was no significant difference between the workers who were anonymous and those who shared their demographic information. However, when making the rewards of the workers codependent, the difference between the performances of the two groups became significant. This shows that connecting the crowds is the key for us to utilize social transparency to enhance the reliability of crowdsourcing outcomes.

2. Social loafing harms the performance of crowds when only the collective outcomes are evaluated.

In team environments, each worker’s reward was determined by the average of their own performance and their teammate’s. We found that when crowd workers were paired with a teammate who performed well, they performed significantly worse. This result shows that social loafing has a real negative effect on crowd work when personal effort is difficult to evaluate.

3. Social facilitation was effective only when there was social transparency between the crowds.

In competitions, the rewards of crowds were decided by the positive difference between their performances and their opponent’s. We found that, when workers shared their demographic information, the workers were motivated to outperform their opponents. However, this effect did not exist when the participants worked anonymously, which indicates that social transparency makes social facilitation more effective.
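
The three reward schemes, and the resulting 3×2 design, can be written down as small functions. This Python sketch assumes that performance is a single numeric score per worker and that rewards are expressed in the same units; the function and condition names are our own, and the conversion from scores to actual payments is left out.

```python
from itertools import product


def individual_reward(own: float, _other: float) -> float:
    """Individual condition: pay depends only on the worker's own score."""
    return own


def team_reward(own: float, teammate: float) -> float:
    """Teamwork condition: average of own and teammate's score."""
    return (own + teammate) / 2


def competition_reward(own: float, opponent: float) -> float:
    """Competition condition: positive part of the score difference."""
    return max(0.0, own - opponent)


SCHEMES = {
    "individual": individual_reward,
    "teamwork": team_reward,
    "competition": competition_reward,
}
TRANSPARENCY = ("anonymous", "demographics_revealed")

# The 3x2 design: every reward scheme crossed with every transparency level.
CONDITIONS = list(product(SCHEMES, TRANSPARENCY))
```

In the teamwork function, a worker paired with a strong teammate still collects half of that teammate's score regardless of their own effort, which is one way to see why social loafing can pay off under that scheme.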

For more, see our full paper, Don’t Hide in the Crowd! Increasing Social Transparency Between Peer Workers Improves Crowdsourcing Outcomes

Shih-Wen Huang, University of Illinois at Urbana-Champaign

Wai-Tat Fu, University of Illinois at Urbana-Champaign

The Crowd at HICSS 2013 Series – #1

Motivation and data quality in a citizen science game: A design science evaluation

in Proceedings of the Hawai’i International Conference on System Sciences 2013

Kevin Crowston & Nathan R. Prestopnik
School of Information Studies
Syracuse University

Citizen science is a form of social computation where members of the public are recruited to contribute to scientific investigations. Finding ways to attract participants (i.e., motivation) and to ensure the accuracy of the data they produce (i.e., data quality) are key issues in making such systems successful. In this paper we describe the design and preliminary evaluation of a simple game that addresses these two concerns for the task of species identification.

In the game, called Happy Match, players are presented with a set of photographs of some organism (e.g., moths, sharks, rays). The players categorize each specimen on a set of characters, e.g., Shape at Rest, Forewing Main Colour, Forewing Distinctive Colour and Forewing Pattern for moths. For each character, there is a set of possible states, e.g., Arrow, Tent, Parallel, etc. for Shape at Rest. Each round of the game is seeded with one or two already-classified photographs, from which a score for the round can be calculated.
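
Because each round is seeded with photographs whose classifications are already known, a round score can be computed by checking the player's answers on those photographs only. The Python sketch below assumes one point per matching character state; the actual scoring rule, the data layout, and the colour states used in the example are assumptions for illustration.

```python
# character -> chosen state, e.g. {"Shape at Rest": "Tent"}
Classification = dict[str, str]


def round_score(player: dict[str, Classification],
                gold: dict[str, Classification]) -> int:
    """Score a round using only the seeded, already-classified photos.

    player: photo_id -> the player's classification of that photo.
    gold:   photo_id -> known classification (the seeded photos).
    """
    score = 0
    for photo_id, known in gold.items():
        answers = player.get(photo_id, {})
        # One point for each character the player got right on a seeded photo.
        score += sum(answers.get(ch) == state for ch, state in known.items())
    return score


# Usage: photo "m1" is seeded; "m2" is unclassified and does not affect the score.
gold = {"m1": {"Shape at Rest": "Tent", "Forewing Main Colour": "Brown"}}
player = {
    "m1": {"Shape at Rest": "Tent", "Forewing Main Colour": "Grey"},
    "m2": {"Shape at Rest": "Arrow"},
}
```

Scoring only against seeded photos lets the game reward players immediately while their answers on unclassified photos become new data.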

To evaluate the game on data quality primarily and motivation secondarily, we paid 200 workers from Amazon Mechanical Turk US$0.50 each to play. To motivate performance, we offered a bonus of US$0.50 for achieving a good score on the game. After playing, the workers filled out a survey about their impressions of the game. For this evaluation, we used photographs of moths for which we had known classifications to be able to compute data quality.
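
Known classifications make data quality straightforward to measure: for each character, compare players' answers with the known states. A minimal Python sketch, assuming answers arrive as (photo, character, state) triples; the function name, the data layout, and the example states not named above are hypothetical.

```python
from collections import defaultdict


def accuracy_by_character(answers, known):
    """Per-character agreement with known classifications.

    answers: iterable of (photo_id, character, state) submitted by players.
    known:   photo_id -> {character: correct state}.
    Answers on photos or characters without a known state are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for photo_id, character, state in answers:
        truth = known.get(photo_id, {}).get(character)
        if truth is None:
            continue
        total[character] += 1
        correct[character] += state == truth
    return {ch: correct[ch] / total[ch] for ch in total}


# Usage with illustrative answers on two known photos.
known = {"m1": {"Shape at Rest": "Tent", "Forewing Pattern": "Striped"},
         "m2": {"Shape at Rest": "Parallel"}}
answers = [("m1", "Shape at Rest", "Tent"),
           ("m2", "Shape at Rest", "Arrow"),
           ("m1", "Forewing Pattern", "Spotted")]
quality = accuracy_by_character(answers, known)
```

Comparing each character's accuracy against a chosen threshold is one way to flag a character that needs redesign.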

The main finding is that data quality was at an acceptable level for 3 out of 4 characters (all except Forewing Pattern). The pattern of errors gave us some ideas for improving the remaining character. Since we paid the AMT workers to play, it is difficult to determine how intrinsically motivating the game is. However, we did find that about 1/3 of workers played more games than required to be paid or to earn the bonus, suggesting that the game was motivating for at least some people.

Web Tutorials as a Gathering Place for Community Contribution

Web-based tutorials play an important role in how people learn and use complex software. Unfortunately, tutorials aren’t always as helpful as they could be. The quality of the instruction may be poor (or just not matched to the user’s level of knowledge), and a tutorial may not exist for the exact task the user is trying to do, forcing them to try to adapt tutorials for similar tasks.

Our paper investigates community enhanced tutorials, a new kind of web tutorial system with the potential to enable tutorials that improve as they are used by a community of users. This is achieved by embedding a fully functioning application into the tutorial, turning it into a hub for both learning and actually performing the tutorial task. As the tutorial is used to complete tasks, it can record users’ efforts and generate alternate demonstrations of each tutorial step.
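
The core idea, a tutorial whose steps accumulate alternate demonstrations as people use it, maps onto a simple data structure. The Python sketch below is a hypothetical model, not the FollowUs implementation; the class names, the `record_attempt` method, and the example URLs are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Demonstration:
    author: str
    video_url: str


@dataclass
class TutorialStep:
    instructions: str
    # Grows as community members complete this step inside the tutorial.
    demonstrations: list[Demonstration] = field(default_factory=list)


@dataclass
class Tutorial:
    title: str
    steps: list[TutorialStep]

    def record_attempt(self, user: str, recordings: dict[int, str]) -> None:
        """Attach a user's recorded work on each step as an alternate demonstration."""
        for step_index, video_url in recordings.items():
            if not 0 <= step_index < len(self.steps):
                raise IndexError(f"no step {step_index}")
            self.steps[step_index].demonstrations.append(
                Demonstration(user, video_url))


# Usage: two users complete the tutorial inside the embedded application.
tutorial = Tutorial("Remove red-eye", [TutorialStep("Select the eyes"),
                                       TutorialStep("Apply the filter")])
tutorial.record_attempt("alice", {0: "https://example.com/a0.mp4",
                                  1: "https://example.com/a1.mp4"})
tutorial.record_attempt("bob", {1: "https://example.com/b1.mp4"})
```

Keying recordings by step index means a partial attempt still contributes demonstrations for the steps the user actually completed.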

System diagram for Community Enhanced Tutorials

Community Enhanced Tutorials improve over time as they are used by a community of users.

From a crowdsourcing perspective, community enhanced tutorials have two main advantages. First, they create a concrete gathering place for users interested in a particular task in the application. We looked at how this could be used to collect demonstrations from users, but this gathering of users could be mobilized in other ways as well. Small contributions could be solicited from these users to improve the tutorial content, or to assist other users directly.

Second, because community enhanced tutorials exist on the web, they are compatible with all the incentives that drive users to create traditional tutorials, such as earning recognition, ad revenue, or membership fees.

Our paper presents FollowUs, a prototype community enhanced tutorial system that we created to test these ideas. We included a range of features for browsing and working with multiple video demonstrations in a tutorial, which you can see in our video.

We also conducted a study to answer a key question that underlies this approach: Can providing additional demonstrations make a tutorial more robust? We found that users do make use of additional demonstrations when they are available, and our results suggest that multiple demonstrations can improve a tutorial’s quality and make it more widely applicable.

For more, see our full paper, Community Enhanced Tutorials: Improving Tutorials with Multiple Demonstrations.
Ben Lafreniere, University of Waterloo
Tovi Grossman, Autodesk Research
George Fitzmaurice, Autodesk Research