Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language

Twitter has always been associated with brevity and immediacy, and it is little wonder that the combination of these two has led to a preconceived notion that linguistic style on Twitter is closer to the “lingo” used on informal mediums such as SMS and IM. However, the purpose of Twitter as a conduit for discussion and news dissemination also raises the possibility that its language may really be a length-restricted version of the language found on more formal mediums such as magazines and newspapers.

To address this debate, we collected large corpora of data from Twitter and other mediums such as email, SMS, newspapers, and magazines, and tried to answer the question:

Is the language of Twitter closer to informal media such as SMS and IM, or does it share similarities with more curated media like newspapers and magazines?

In the first part of our analysis, we considered the following common grammatical elements to quantify linguistic styles of a language:

  • Word frequency and usage (WF)
  • Lexical density (LD)
  • Personal pronoun usage (first, second, third person)
  • Use of intensifiers (that was so cool!)
  • Temporal references (I am going to be there)
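As a rough illustration of two of these measures, the sketch below computes lexical density (the share of content words among all words) and per-person pronoun rates for a piece of text. It is only a minimal sketch: the tiny `FUNCTION_WORDS` and `PRONOUNS` lists, the regex tokenizer, and the function names are assumptions made for this example, not the word lists or tooling used in the paper, which would more plausibly rely on part-of-speech tagging.

```python
import re

# Illustrative, deliberately small function-word list (an assumption for
# this sketch; a real analysis would use a POS tagger or a fuller list).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "is", "am", "are", "was", "were", "be",
    "been", "do", "does", "did", "have", "has", "had", "will", "would",
    "can", "could", "not", "that", "this", "it", "i", "me", "my", "we", "us",
    "our", "you", "your", "he", "him", "his", "she", "her", "they", "them",
    "their", "so",
}

# Hypothetical pronoun lists, grouped by grammatical person.
PRONOUNS = {
    "first": {"i", "me", "my", "mine", "we", "us", "our", "ours"},
    "second": {"you", "your", "yours"},
    "third": {"he", "him", "his", "she", "her", "hers", "it", "its",
              "they", "them", "their", "theirs"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text):
    """Fraction of tokens that are not in the function-word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def pronoun_rates(text):
    """Fraction of tokens that are first-, second-, or third-person pronouns."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        person: sum(t in words for t in tokens) / total
        for person, words in PRONOUNS.items()
    }
```

Computing these values over samples from each medium would give one number per measure per medium that can then be compared, which is the general shape of the comparison described here.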

In the second part, we devised a novel flexible factorization framework to understand the cognitive and affective aspects of language as it is used in various media by analyzing counts of the words used in each, grouped by LIWC (Linguistic Inquiry and Word Count) categories. Affect and emotion were analyzed using words related to concepts such as positivity, negativity, anxiety, anger, and sadness, while cognitive aspects were measured by words related to insight, discrepancy, tentativity, certainty, etc.
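The word-counting step behind this kind of analysis can be sketched as follows. This is only an illustration of category-based counting: the `CATEGORIES` lexicon is a hypothetical toy stand-in (the actual LIWC dictionary is far larger and is not reproduced here), and the sketch does not attempt the factorization framework itself.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC-style categories.
CATEGORIES = {
    "positive": {"cool", "great", "happy", "love"},
    "negative": {"bad", "awful", "hate"},
    "anxiety": {"worried", "nervous", "afraid"},
    "tentative": {"maybe", "perhaps", "guess"},
    "certainty": {"always", "never", "definitely"},
}


def category_rates(text):
    """Return, per category, the fraction of tokens that fall in it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    return {
        name: (counts[name] / total if total else 0.0)
        for name in CATEGORIES
    }
```

Running `category_rates` over text from each medium yields one rate per category per medium, which is the kind of count table a later modeling step could take as input.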

Summary of Results for Twitter; WF: Word Frequency; LD: Lexical Density; PP: Personal Pronouns; INT: Intensifiers; TR: Temporal References; AA: Affective Aspects; CA: Cognitive Aspects


The results of these various analyses were at once surprising and affirming — surprising because they overturned many a piece of conventional wisdom with respect to tweets; and affirming because they showcased the reasons for some of the behavior that the data exhibit.

  • The language of Twitter is more conservative, less informal, and much less conversational than SMS and IM; however, it shares the brevity and interactivity of SMS and IM.
  • Twitter users are developing unique styles that set its language apart from other mediums – for example, in the usage of temporal references. The use of temporal references on Twitter is much closer to SMS and IM — thus reaffirming its real-time nature.
  • Twitter has much less variation of affect than traditional media like newspapers, magazines and emails; and it tends to display more positive moods and affect than these other media.

For more, see our full paper, Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language.

Yuheng Hu, Arizona State University
Kartik Talamadupula, Arizona State University
Subbarao Kambhampati, Arizona State University

4 thoughts on “Dude, srsly?: The Surprisingly Formal Nature of Twitter’s Language”

  1. Can you explain a bit more fully what ‘formal’ means in this context?

  2. Thanks for your comment, Michael.

    The definition of “formal” in our paper is that traditional media like newspapers etc. use language that is closer to the written standard, whereas mediums like IM and SMS display more instances of spoken language use — and this is what we mean by formal v. informal respectively.

    Many people think Twitter might be a very informal medium, just like IM and SMS, since they share similar brevity and immediacy. In the old days, people even tweeted through SMS on their phones. However, our study here showed that this might not be the case.

  3. I think I’d add to what Yuheng was saying about the nature of “formal” in the context of this work – there are ways to measure this, quite apart from amorphous definitions like “used in newspapers” or “prevalent in SMS”.

    The register and genre of text can be measured using established metrics such as the lexical density, and a clear divide is seen between different kinds of medium (thus given an unlabeled sample of text, it should be possible for example to predict what medium that segment came from with a fair degree of confidence).

    To us, this was an interesting way to measure the language of Twitter because the conventional wisdom (not always supported by analysis or data) has been to lump it with other short-form mediums; however, as we show, this isn’t quite the case.

    There are also psycholinguistic measures of the ‘formalness’ of a piece of text or a document, and we consulted with language experts here at Arizona State while performing that part of our analysis to determine if using those measures made sense.

    • Interesting. I wonder if this is driven more by public vs. private communication than by cell phone/txt vs. laptop.

      Could you create a ranked order of how grammatically strict each medium tends to be? I’d be curious, for example, how close blogs are to newspapers, or whether tweets are more strict than Facebook updates.
