What Might Yelp’s “Fake Review Filter” be Doing?

[Image: Yelp’s message to users who submit fake reviews]

Image source: http://officialblog.yelp.com/2013/05/how-yelp-protects-consumers-from-fake-reviews.html

Fake Reviews and Yelp’s Take on Them

Fake reviews of products and businesses have become a major problem in recent years. As the largest review site, Yelp has taken on the task of filtering fake and suspicious reviews at commercial scale. However, its filtering algorithm is a trade secret.

Our work aims to understand what Yelp’s filter might be looking for, by exploring the linguistic and behavioral features of reviews and reviewers.

1. Detection based on linguistic features

Prior research [Ott et al., 2011; Feng et al., 2012] showed that classification using linguistic features (n-grams) can detect crowd-sourced fake reviews (collected via Amazon Mechanical Turk) with 90% accuracy.

Applying the same approach to Yelp’s real-life fake review dataset (treating filtered reviews as fake and unfiltered reviews as non-fake), however, yields only 68% detection accuracy; a sketch of this kind of n-gram classifier appears after the list below. We analyzed the fake and real reviews to understand this difference in accuracy and found that:

  • Turkers probably did not do a good job of faking!
  • Yelp spammers are smart, but they overdid the faking!
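
To make the linguistic approach concrete, here is a minimal sketch of an n-gram review classifier of the kind used in this line of work. This is not Yelp’s method or our exact pipeline: the scikit-learn setup, the tf-idf weighting, the file name, and the column names are all illustrative assumptions.

```python
# Minimal sketch of an n-gram review classifier (illustrative, not Yelp's filter).
# Assumes a CSV with columns "text" and "label" (1 = filtered/fake, 0 = unfiltered);
# the path and column names are placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = pd.read_csv("yelp_reviews.csv")           # placeholder path

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # unigram + bigram features
    LinearSVC(C=1.0),                               # linear SVM, a common choice for n-gram text
)

# 5-fold cross-validated accuracy; under a balanced class split this can be
# compared against the 50% random-guessing baseline discussed below.
scores = cross_val_score(clf, reviews["text"], reviews["label"], cv=5)
print("mean accuracy: %.3f" % scores.mean())
```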

2. Detection based on behavioral features

Prior work [Jindal and Liu, 2008; Mukherjee et al., 2012] showed that abnormal behavioral features of reviewers and their reviews are effective at detecting fake reviews. On the Yelp fake review dataset, these abnormal behavioral features yielded 83% accuracy.

Below we show the discriminative strength of several abnormal behaviors (MNR: maximum number of reviews per day; PR: ratio of positive reviews; RL: review length; RD: rating deviation; MCS: maximum content similarity); a sketch of how such features can be computed follows the figure.

[Figure: Cumulative distributions (CDFs) of the abnormal behavioral features listed above]
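
For readers who want to experiment with behavioral signals, below is a rough sketch of how per-reviewer versions of these features could be computed with pandas. The column names, the 4-star threshold for “positive”, and the use of a simple string-similarity measure in place of the cosine similarity usually used for content similarity are assumptions for illustration, not our exact definitions.

```python
# Sketch of per-reviewer behavioral features (MNR, PR, RL, RD, MCS).
# Column names ("reviewer_id", "date", "rating", "text", "business_avg") are placeholders.
import pandas as pd
from difflib import SequenceMatcher

def reviewer_features(df: pd.DataFrame) -> pd.Series:
    feats = {}
    # MNR: maximum number of reviews posted on any single day
    feats["MNR"] = df.groupby("date").size().max()
    # PR: ratio of positive (4-5 star) reviews
    feats["PR"] = (df["rating"] >= 4).mean()
    # RL: average review length in words
    feats["RL"] = df["text"].str.split().str.len().mean()
    # RD: average absolute deviation from the business's average rating
    feats["RD"] = (df["rating"] - df["business_avg"]).abs().mean()
    # MCS: maximum pairwise content similarity among the reviewer's reviews
    # (a simple string ratio here; cosine similarity over bags of words is more typical)
    texts = df["text"].tolist()
    sims = [SequenceMatcher(None, a, b).ratio()
            for i, a in enumerate(texts) for b in texts[i + 1:]]
    feats["MCS"] = max(sims) if sims else 0.0
    return pd.Series(feats)

# reviews = pd.read_csv("yelp_reviews.csv")  # placeholder path
# features = reviews.groupby("reviewer_id").apply(reviewer_features)
```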

Summary of Main Results

Yelp arguably does at least a reasonable job of filtering out fake reviews, based on four pieces of evidence:

  1. Classification under a balanced class distribution gives an accuracy of 67.8%, significantly higher than the 50% expected from random guessing, showing a linguistic difference between filtered and unfiltered reviews.
  2. Using abnormal behavioral features yields even higher accuracy, and a genuine reviewer is unlikely to exhibit these behaviors.
  3. Yelp has been doing industrial-scale filtering since 2005; it is unlikely that its algorithm is not effective.
  4. We are aware of cases where people who wrote fake reviews were caught by Yelp’s filter.

Although these pieces of evidence are not conclusive, they are strong enough to give us confidence that Yelp is at least doing a reasonable job of filtering.

How does Yelp filter? From our results, we can speculate that Yelp might be using a behavior-based approach to filtering.

Amazon Mechanical Turk (AMT) crowd-sourced fake reviews may not be representative of commercial fake reviews, since Turkers, unlike commercial fake reviewers, may have no genuine interest in making their fakes convincing.

For more, see our full paper, What Yelp Fake Review Filter Might Be Doing?

Arjun Mukherjee, University of Illinois at Chicago
Vivek Venkataraman, University of Illinois at Chicago
Bing Liu, University of Illinois at Chicago
Natalie Glance, Google

Butler Lies From Both Sides

We’re almost constantly connected today, which makes it very easy to coordinate social activity. But our constant connection also forces us to lie sometimes:

[Image: Screenshot of a text message conversation including a butler lie]

An example text message conversation with a butler lie.

This is an example of what we call a butler lie, a linguistic strategy used to manage your availability. We’ve documented butler lies in our prior work, but there are still open questions:

  • Do people consider these to be actual lies, or are they part of the normal course of business in modern social interaction?
  • How good are people at detecting lies in text messages?

To find out, we collected a total of 2,341 text messages exchanged by 82 pairs of friends, and asked both the sender and receiver to judge how deceptive each message was. We found that:

  • Senders lied more often in butler messages (messages about starting, stopping, or arranging social interactions): 21.7% of butler messages were intended to deceive, compared with only 6.2% of non-butler messages.
  • Receivers missed many of the lies. Only 10.4% of butler messages were perceived as deceptive, while 8.0% of non-butler messages were perceived that way. Evidently people expect others to lie to them, particularly about availability, but not as often as they actually are being deceived (the sketch after the figure shows how these rates can be tabulated).
  • When people tell lies, they feel worse about them than the receivers do. Deception is commonplace, and we need a more nuanced approach to deception in availability management.

[Figure: Actual and perceived lying rates for butler and other messages]

Comparison of actual and perceived rates of lying.
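
As a rough illustration, here is a sketch of how the actual (sender-rated) and perceived (receiver-rated) lying rates in the figure could be tabulated. The data frame layout and column names are placeholders, not our actual data format.

```python
# Sketch: tabulate actual (sender-rated) vs. perceived (receiver-rated) deception
# rates for butler vs. non-butler messages. Column names are placeholders.
import pandas as pd

def lying_rates(messages: pd.DataFrame) -> pd.DataFrame:
    # Expected boolean columns: "is_butler", "sender_deceptive", "receiver_deceptive"
    return (messages
            .groupby("is_butler")[["sender_deceptive", "receiver_deceptive"]]
            .mean()  # fraction of messages rated deceptive
            .rename(columns={"sender_deceptive": "actual_rate",
                             "receiver_deceptive": "perceived_rate"}))

# messages = pd.read_csv("rated_messages.csv")  # placeholder path
# print(lying_rates(messages))
```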

What are the implications?

  • Discovering a deception may not always threaten a relationship. People expect to be deceived sometimes within the context of a relationship.
  • Ambiguity about time and location lets us lie in socially useful ways. These results call into question recent moves toward greater transparency about when messages have been seen (in iMessage and Facebook) and about user location (in foursquare, Twist, Glympse).

Want to learn more? Check out our full paper (Butler Lies From Both Sides: Actions and Perceptions of Unavailability Management in Texting) and lab websites (Northwestern & Cornell).

Lindsay Reynolds, Cornell University
Madeline E. Smith, Northwestern University
Jeremy Birnholtz, Northwestern University
Jeff Hancock, Cornell University