On the Interplay between Social and Topical Structure

Your friends and your topics of interest are intuitively related: people form friendships through mutual interests, and at the same time they discover new interests through friends. We are interested in exploring the ways in which social and topical structures can predict each other. We ask two basic questions:

  1. How well can a person’s topical interests predict who her friends are?
  2. How well can the social connections among the people interested in a topic predict the future popularity of that topic?

To answer these questions, we study 5 million Twitter users. We use their hashtag usage to identify topical interests, and their follower relationships and @-messages to identify two different kinds of social relationships.

To predict whether two users have a social relationship based on their hashtags, we use logistic regression models trained on a wide range of distance measures that capture topical similarity. Interestingly, one of the most predictive measures is also one of the simplest to compute: the size of the smallest hashtag shared by the two users, where the size of a hashtag is the number of users who have used it.
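As a rough illustration of the smallest-common-hashtag measure, here is a minimal sketch. It assumes hashtag usage is available as a mapping from user to the set of hashtags that user has used; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hashtag_sizes(user_hashtags):
    """Map each hashtag to its size: the number of distinct users who used it."""
    sizes = defaultdict(int)
    for tags in user_hashtags.values():
        for tag in set(tags):
            sizes[tag] += 1
    return dict(sizes)


def smallest_common_hashtag_size(u, v, user_hashtags, sizes):
    """Size of the smallest hashtag shared by u and v, or None if they share none."""
    shared = set(user_hashtags.get(u, ())) & set(user_hashtags.get(v, ()))
    if not shared:
        return None
    return min(sizes[tag] for tag in shared)


# Hypothetical toy data: user -> set of hashtags used.
user_hashtags = {
    "alice": {"#icwsm", "#tcot", "#news"},
    "bob": {"#icwsm", "#news"},
    "carol": {"#news"},
    "dave": {"#tlot"},
}
sizes = hashtag_sizes(user_hashtags)

for pair in [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]:
    print(pair, smallest_common_hashtag_size(*pair, user_hashtags, sizes))
```

A value like this could then serve as one input feature to a logistic regression classifier; pairs with no shared hashtag would need separate handling, which this sketch leaves open.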

Our full model has an accuracy of 77% when predicting follower relationships and 86% when predicting @-message relationships. We also find that predicting strong ties is much easier than predicting weak ties. Our model achieves an accuracy of up to 98% when predicting the strongest ties, that is, pairs of users who exchanged more than 20 @-messages.

Linkage probability as a function of the size of the smallest common hashtag. (a) The probability that a given user follows another user, and (b) the probability that a given user @-messages another user, each as a function of that size. Both panels are shown on a log-log scale.

We also predict the future popularity of a hashtag from the social relationships of its early adopters. In particular, we predict whether a hashtag will double in size, studying only the social connections among its early adopters. Our intuition is that when the early adopters of a hashtag are very well connected, the hashtag is exhibiting high virality: it is spreading quickly through the network and is destined to become popular. Notable examples are #tcot (top conservatives on Twitter) and #tlot (top libertarians on Twitter). On the other hand, if the early adopters are nearly all disconnected from one another, the hashtag is likely to be related to a popular topic exogenous to Twitter, and so is likely to become popular on Twitter as well; #michaeljackson is an example. We find evidence that hashtags whose early adopters are either very well connected or almost entirely disconnected are indeed more likely to become popular than those in between.

Probability that hashtags will exceed K adopters given the number of edges in the graph induced by the 1000 initial adopters, using a sliding window. From top to bottom, K = 1500, 1750, 2000, 2500, 3000, 3500, 4000.
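A curve of this kind could be estimated roughly as follows. This is a minimal sketch under assumptions that the post does not spell out: hashtags are sorted by the number of edges among their initial adopters, and the window covers a fixed number of consecutive hashtags in that ordering. The function name, the window definition, and the toy records are all hypothetical.

```python
def sliding_window_probability(records, k, window):
    """Estimate P(final adopters > k) as a function of edge count.

    records: list of (num_edges, final_adopters) pairs, one per hashtag.
    Returns a list of (mean_edges_in_window, fraction_exceeding_k) points,
    one per window of `window` consecutive hashtags sorted by edge count.
    """
    ordered = sorted(records)
    points = []
    for i in range(len(ordered) - window + 1):
        chunk = ordered[i:i + window]
        mean_edges = sum(edges for edges, _ in chunk) / window
        fraction = sum(1 for _, final in chunk if final > k) / window
        points.append((mean_edges, fraction))
    return points


# Hypothetical toy records: (edges among initial adopters, final adopter count).
records = [(0, 2500), (10, 1200), (20, 1100), (30, 1600), (40, 3000)]
points = sliding_window_probability(records, k=1500, window=2)
for mean_edges, fraction in points:
    print(mean_edges, fraction)
```

Repeating the computation for several values of K would give one curve per threshold.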

The full model, which includes features such as the number of edges, the number of connected components, and the number of singletons in the graph induced by the early adopters, achieves an accuracy of 67% in predicting whether a hashtag will double in size.
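The three structural features named above can be computed from the graph induced by the early adopters. The sketch below is illustrative only and makes some assumptions: it treats social ties as undirected (follower links are directed on Twitter), counts isolated users as components of their own, and uses hypothetical names and toy data.

```python
def early_adopter_features(adopters, edges):
    """Edge count, connected components, and singletons of the induced subgraph.

    adopters: iterable of user ids (the early adopters of one hashtag).
    edges: iterable of (u, v) pairs; only pairs with both ends in `adopters`
           are kept, and each pair is treated as one undirected edge.
    """
    adopters = set(adopters)
    parent = {u: u for u in adopters}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Keep only edges inside the adopter set, ignoring direction and self-loops.
    induced = {frozenset((a, b)) for a, b in edges
               if a in adopters and b in adopters and a != b}

    degree = {u: 0 for u in adopters}
    for edge in induced:
        a, b = tuple(edge)
        degree[a] += 1
        degree[b] += 1
        parent[find(a)] = find(b)  # merge the two components

    return {
        "edges": len(induced),
        "components": len({find(u) for u in adopters}),
        "singletons": sum(1 for d in degree.values() if d == 0),
    }


# Hypothetical toy example: "x" is not an early adopter, so ("d", "x") is dropped.
features = early_adopter_features(
    adopters=["a", "b", "c", "d", "e"],
    edges=[("a", "b"), ("b", "a"), ("b", "c"), ("d", "x")],
)
print(features)
```

In this toy example, a, b, and c form one component while d and e are singletons, so a well-connected adopter set yields many edges and few components, and a disconnected one yields the opposite.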

For more, see our full paper, On the Interplay between Social and Topical Structure.
Daniel M. Romero, Northwestern University
Chenhao Tan, Cornell University
Johan Ugander, Cornell University

This entry was posted in ICWSM 2013 by danielromero.

About danielromero

I am a postdoctoral fellow at the Northwestern University Institute on Complex Systems (NICO). My main research interest is the empirical and theoretical analysis of social and information networks. I am particularly interested in the study of network evolution and information diffusion.