Characterizing Happy Communities using Tweets

How can we better understand and measure the well-being of a community?

Polling organizations like Gallup ask questions such as “How satisfied are you with your life?” By aggregating surveys over large samples of the US population, they can measure community well-being and relate it to other community data such as demographics or socioeconomic status (average age, income, education, etc.).

However, surveys are relatively expensive, and relating community well-being to things like socioeconomic status only gives us limited insight into what makes a thriving community.

We think Twitter can help! Tweeting is pervasive across the US and, unlike responses to surveys, tweets are not constrained to pre-chosen questions.

In work we’re presenting at the International AAAI Conference on Weblogs and Social Media (ICWSM) next week, we analyzed the language on Twitter and found that tweets reveal a lot about community well-being.

Map of the US showing life satisfaction (LS) of counties as predicted using our model of socioeconomic factors and Twitter language. Green regions have higher satisfaction, while red regions have lower.


Using a dataset of millions of tweets with geo-location information, we built a model of language that predicts community well-being (as measured by Gallup polls). Our results?

  • Twitter language alone was significantly predictive of well-being. This was the first work to show this.
  • Socioeconomic information (as expected) was more predictive than language alone.
  • However, the combination of Twitter language and socioeconomic information was significantly more predictive than socioeconomic information alone.

Thus, tweets are capturing “something” above and beyond standard demographic and socioeconomic indicators.
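To make the three-way comparison above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the counties are synthetic, the planted weights (1.0 for the socioeconomic score, 0.5 for the language score) are invented, and ordinary least squares stands in for whatever regression the paper actually used. It only shows the shape of the comparison: fit on some counties, then correlate predictions with life satisfaction on held-out counties.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (intercept added)."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic counties: (socioeconomic score, language score, life satisfaction).
# The weights below are made up; they are NOT estimates from the paper.
rng = random.Random(0)

def make_county():
    ses = rng.gauss(0, 1)
    lang = rng.gauss(0, 1)
    ls = 1.0 * ses + 0.5 * lang + rng.gauss(0, 0.1)
    return ses, lang, ls

counties = [make_county() for _ in range(400)]
train, test = counties[:200], counties[200:]

def held_out_r(cols):
    """Fit on the training counties using the given feature columns,
    then correlate predictions with life satisfaction on the test counties."""
    X_train = [[c[i] for i in cols] for c in train]
    X_test = [[c[i] for i in cols] for c in test]
    w = fit_ols(X_train, [c[2] for c in train])
    return pearson(predict(w, X_test), [c[2] for c in test])

r_lang = held_out_r([1])     # language alone
r_ses = held_out_r([0])      # socioeconomic alone
r_both = held_out_r([0, 1])  # both together

print(f"language only:      r = {r_lang:.2f}")
print(f"socioeconomic only: r = {r_ses:.2f}")
print(f"combined:           r = {r_both:.2f}")
```

Because the synthetic language score carries signal that the socioeconomic score does not, the combined model scores highest on the held-out counties, which is the pattern the results above describe.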

What are tweets capturing about well-being? We didn’t want to just wind up with a happiness score, so the bulk of our work looked into this question by observing the actual words people use in regions with differing levels of life satisfaction. We used a technique known as Latent Dirichlet Allocation (LDA) to group related words into 2,000 “topics”. Then, we determined which topics most distinguished communities with high well-being from those with low well-being. The results are below.
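The second step, finding the topics that best separate high and low well-being communities, can be sketched roughly as follows. This assumes the topic model has already been run and each county has a usage proportion for each topic. The topic names, the usage numbers, and the life satisfaction scores are all made up for illustration, and Pearson correlation is used here as one simple way to rank topics, not necessarily the exact statistic from the paper.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical life satisfaction scores for five counties.
life_satisfaction = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical per-county usage of three topics (same county order as above).
topic_usage = {
    "outdoors": [0.10, 0.18, 0.22, 0.35, 0.40],
    "bored":    [0.40, 0.38, 0.25, 0.20, 0.12],
    "weather":  [0.30, 0.10, 0.30, 0.10, 0.30],
}

# Correlate each topic's usage with life satisfaction, then sort from the
# most positively correlated topic to the most negatively correlated one.
ranked = sorted(
    ((topic, pearson(usage, life_satisfaction))
     for topic, usage in topic_usage.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for topic, r in ranked:
    print(f"{topic:10s} r = {r:+.2f}")
```

With 2,000 topics instead of three, the top and bottom of such a ranking are the candidates for the word clouds; with that many comparisons, some correction for multiple testing would also be needed.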

Top ten topics most positively correlated with well-being (top), and top two topics negatively correlated with well-being (bottom). Word size corresponds to prevalence within the topics.


Here, we see the influence of socioeconomic factors on community well-being, and more. For example, the topic about “money” is about philanthropy (“donate”, “charity”, “support”), and the topic about “business” is about development (“learning”, “skills”, “development”, “education”). We refer to this as greater “behavioral and conceptual resolution”: these results don’t just suggest that money and business influence well-being, they suggest that it is the donation of money and the development of skills for business that affect well-being.

Furthermore, we see some non-socioeconomic topics related to well-being. For example, topics relating to outdoor activities, spiritual meaning, and exercise were predictive of happy communities while topics of disengagement (“bored” and “tired”) distinguished communities with low well-being. These results support previous research and hypotheses on individual-level well-being. We look forward to a future of community research which leverages social media to capture greater behavioral and conceptual resolution than previously feasible.

For more, see our full paper, Characterizing Geographic Variation in Well-Being using Tweets
H. Andrew Schwartz*, Johannes C. Eichstaedt*, Margaret L. Kern, Lukasz Dziurzynski, Megha Agrawal, Gregory J. Park, Shrinidhi K. Lakshmikanth, Sneha Jha, Martin E. P. Seligman, and Lyle Ungar, University of Pennsylvania
Richard E. Lucas, Michigan State University
*co-led this work

This work was done as part of the World Well-Being Project (

This entry was posted in ICWSM 2013 by hansens.

About hansens

Andy is a postdoctoral fellow of Computer & Information Science and lead research scientist for the World Well-Being Project at the University of Pennsylvania. The World Well-Being Project is pioneering techniques for measuring psychological and physical well-being in social media. As a collaboration between computer scientists, psychologists, and statisticians, The World Well-Being Project is a multi-disciplinary research group in the Positive Psychology Center at the University of Pennsylvania. Much of our work is part of an effort to develop an unobtrusive measurement of the psychological and physical well-being of large populations by analyzing their written expressions in social media such as Facebook and Twitter. In the process, we are shedding new light on health and psychosocial processes.

4 thoughts on “Characterizing Happy Communities using Tweets”

  1. Interesting. I think, in particular, your assumption that the tweets of users in a community can indicate what it’s like to live around them, influencing one’s own life satisfaction, is quite interesting and promising as a research topic.

    Non-content data might provide another view of well-being that the study of language expression cannot uncover. Do you have any sense of the relationship between life satisfaction (your measurement) and user activities, including indicators of interaction or network properties?

    • Thanks Minsu. Yeah, we think it’s interesting that the sample of people on Twitter we get language data from is likely to be different from the random people called for the life satisfaction survey, yet we get such a strong predictor and face-valid correlations from the topics. Essentially, we hypothesize that a community has a psychological state which can be captured by sampling it (through calling random people for surveys or through tweets). Did you have something in mind? It would be interesting with longitudinal data to try to track how a community influences an individual.

      We’ve started looking a little into the non-content features. I would hypothesize that those with higher life satisfaction would be interacting more with non-celebrities (@replies), since strong links have been found between life satisfaction and having strong personal relationships. People have found links between network statistics and personality (Bachrach Y, Kosinski M, Graepel T, Kohli P, Stillwell D., 2012). We started with language features as they are self-descriptive and can provide unforeseen new “insights” (what individual words or topics are most linked with community life satisfaction).

  2. Yeah, it would be really interesting to analyze longitudinal data for tracking how a community influences an individual. I just thought that this idea and your hypothesis that a community has a psychological state makes me think about the concept “general will”. Sorry, I’m joking. (But your idea or perspective is extremely intriguing. It raises questions that I would like to think about.)

    Some measures of the inter-community and intra-community interactions might show meaningful aspects about life satisfaction as well. For example, it should be interesting to explore an issue like whether individuals in a community with lower life satisfaction would be more likely to interact with those with higher life satisfaction or not. Language features would help to understand important motivations or to gain insights under this setting as well.

  3. Pingback: Twitter Can Tell Whether Your Community Is Happy or Not - The Atlantic Cities - TWITTEROO.NET
