Properties, Prediction, and Prevalence of Useful User-generated Comments for Descriptive Annotation of Social Media Objects

User-generated comments on online social media are gaining increasing attention as a viable source of general-purpose descriptive annotations for social media objects such as photos or videos shared online. However, their quality varies widely, from very useful to entirely useless; comments can even be abusive or off-topic.

The most common methods for estimating the usefulness of user-generated comments simply allow all users to vote on (and possibly moderate) the contributions of others, thereby avoiding an explicit definition of “useful”.

We investigate usefulness from the user’s perspective, defining a comment as USEFUL if it provides descriptive information about the media object beyond the usually very short title accompanying it.

With this definition in hand, we asked:

  • What are the PROPERTIES of useful comments?
  • How can we PREDICT useful comments?
  • How can we estimate the PREVALENCE of useful comments?

Using text-based, semantic, topical, and author features, we characterized crowdsourced, labeled comments on two classes of media objects (Flickr photos and YouTube videos) and trained prediction models. Furthermore, we adapted an existing Bayesian prevalence model that uses the learned prediction models to estimate the prevalence of useful comments across different platforms and topics.
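To make the text-based features more concrete, here is a minimal Python sketch of three cues that the findings below mention: references, self-references, and affective language. It is an illustration under stated assumptions, not the authors' feature set. Treating URLs as "references", the tiny word lists `SELF_REFERENCES` and `AFFECTIVE_WORDS`, and the names `CommentFeatures` and `extract_features` are all hypothetical. Named-entity counts and the semantic, topical, and author features are omitted because they require external tools or platform metadata.

```python
import re
from dataclasses import dataclass

# Hypothetical, tiny word lists for illustration only; these are NOT the
# lexicons used in the paper.
SELF_REFERENCES = {"i", "me", "my", "mine", "myself"}
AFFECTIVE_WORDS = {"love", "hate", "awesome", "awful", "wow", "lol", "amazing", "terrible"}

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z']+")


@dataclass
class CommentFeatures:
    num_references: int        # URLs, an assumed proxy for "references"
    num_self_references: int   # first-person singular pronouns
    num_affective_words: int   # hits in the toy affect lexicon
    num_tokens: int            # alphabetic word tokens (numbers are skipped)


def extract_features(comment: str) -> CommentFeatures:
    """Compute a few simple text-based features for one comment."""
    num_references = len(URL_PATTERN.findall(comment))
    # Remove URLs before tokenizing so they do not add spurious word tokens.
    text = URL_PATTERN.sub(" ", comment).lower()
    tokens = TOKEN_PATTERN.findall(text)
    return CommentFeatures(
        num_references=num_references,
        num_self_references=sum(t in SELF_REFERENCES for t in tokens),
        num_affective_words=sum(t in AFFECTIVE_WORDS for t in tokens),
        num_tokens=len(tokens),
    )
```

In a full pipeline, vectors like these (together with the other feature groups) would be fed to a classifier. The findings below suggest training one model per topic rather than a single topic-neutral model.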

We found that:

  • The properties of USEFUL comments vary slightly according to the platform’s commenting culture and the topic of the media object. Comments that contain more references, more named entities, fewer self-references, and less affective language are more likely to be inferred as USEFUL. Moreover, users express more emotion and may use more offensive language when commenting on topics related to people and events.
  • Prediction performance is better when the classifier is trained on comments of a single topic (type-specific) and worse when the topic is ignored (type-neutral). Thus, for more accurate prediction, a model should be trained that takes the topic of the media object into account.
  • The prevalence of USEFUL comments is influenced by:
    • The time period of the topic of the media object being commented on. The closer a topic’s time period is to the present, the lower the prevalence of useful comments.

    (Figure: Rate-Time, usefulness prevalence rate by time period of the topic)

    • The degree of polarization of topics among commenters.

    (Figure: Rate-Polarization, usefulness prevalence rate by degree of topic polarization)

    • The topic of the media object being commented on and the platform’s commenting culture.

    (Figure: Rate, usefulness prevalence rate by topic and platform)
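These prevalence estimates rest on classifier predictions rather than manual labels, so the raw fraction of comments predicted USEFUL is biased by the classifier's errors. The paper adapts a Bayesian prevalence model for this purpose. The Python sketch below swaps in a simpler, well-known alternative, the Rogan-Gladen correction, which rescales the observed positive rate using the classifier's sensitivity and specificity (assumed to have been measured on held-out labeled comments). The function name `corrected_prevalence` and all numbers in the example are hypothetical.

```python
def corrected_prevalence(predicted_useful: int, total: int,
                         sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of the true fraction of USEFUL comments.

    If pi is the true prevalence, the expected observed positive rate is
        observed = pi * sensitivity + (1 - pi) * (1 - specificity),
    which we solve for pi.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must beat chance (sensitivity + specificity > 1)")
    observed = predicted_useful / total
    estimate = (observed + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))
```

For example, if a classifier with sensitivity 0.8 and specificity 0.9 labels 300 of 1,000 comments in some topic as USEFUL, the corrected estimate is (0.3 + 0.9 - 1) / 0.7, roughly 0.286, rather than the raw 0.30. A Bayesian treatment additionally quantifies uncertainty around the estimate, which this point estimate does not.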

     

Want to learn more? See our full paper: Properties, Prediction, and Prevalence of Useful User-generated Comments for Descriptive Annotation of Social Media Objects.


Elaheh Momeni, University of Vienna
Claire Cardie, Cornell University
Myle Ott, Cornell University
