UVM Mathematicians Lead Study That Shows Language Is Happy, Mostly
It may seem as though social media sites like Twitter are full of griping and negativity, but it turns out this may not be entirely true.
A group of researchers led by Peter Dodds and Chris Danforth, mathematicians at UVM's Computational Story Lab, collected words from 24 types of sources, including books, newspaper articles, social media, television subtitles and music lyrics. From the billions of words they collected, the researchers then asked people to rate the 10,000 most frequently used words on a nine-point scale from negative to positive.
The outcome? Their paper “Human Language Reveals a Universal Positivity Bias” shows that people tend to use more happy words than negative words.
They started out by trying to measure happiness or sadness from large texts. “Whether it be Twitter, or the news, or any kind of large text,” says Dodds. “And so that’s what we’ve been working on for 9 years ... Along the way, we figured out that we would need to measure individual words for their happiness levels.”
The study contains 10 different languages, including Arabic, Chinese, Korean, English, Spanish, Brazilian Portuguese and Russian. “As all of those languages came back from our surveys, we saw the same pattern. We saw that there were more happy words in all of them,” Dodds says.
“People, for various reasons, may tend to talk about more positive things, may remember things that are more positive just because it’s good,” says Dodds. “You want to keep existing and have a life that keeps going – so we have a bit of a bias in that way and what we’ve seen now is that it’s embedded in our language.”
But Dodds says what is surprising is the positive trend despite the negative things going on in the world. “The news is full of bad things and there are terrible threads in social media here and there, but if you step back from that and look at language as a whole, [you see the positive trend].”
In examining language in Twitter, literature, movies or the news, the mathematicians say they found the positive bias over and over again.
All of the data collected during the study is online, including a daily updated map of the United States showing the happiest states in the country, along with the happiness of words in the scripts of 1,000 movies, in 10,000 books and in the New York Times by section.
So who decides whether a word is positive or not? “We asked people on the web through a service that Amazon provides. They saw a scale from one to nine, a sad face to a smiley face, and they were supposed to click the face that represented how they felt when they saw the word,” Danforth explains. He says by the end of the study, they received 5 million evaluations of words worldwide.
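To make the rating step concrete, here is a minimal Python sketch of how many one-to-nine evaluations could be combined into a single score per word. It assumes a simple average, which the article does not confirm as the researchers' method, and the function name and the ratings are made up for illustration.

```python
from statistics import mean

def average_rating(ratings):
    """Average a list of 1-to-9 ratings for one word (hypothetical helper)."""
    for r in ratings:
        if not 1 <= r <= 9:
            raise ValueError(f"rating {r} is outside the 1-9 scale")
    return mean(ratings)

# Invented ratings for illustration only; not data from the study.
ratings_by_word = {
    "laughter": [9, 8, 9, 8],
    "traffic": [3, 2, 4, 3],
}

scores = {word: average_rating(r) for word, r in ratings_by_word.items()}
```

Under this assumption, a word whose score lands above the midpoint of five would count as a happy word, and one below it as a negative word.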
While words like sunshine and laughter received universally positive scores, others had mixed results. “Church is one of those words,” says Danforth.
“Profanities, alcohol, vices, cigarettes and so on,” says Dodds, citing examples of words that depend heavily on context. “We tend to take those words out.”
Dodds compares the way their instrument collects word information to measuring temperature. “We wouldn’t want to measure the temperature in a tiny little patch of air, we want to measure it for Burlington,” he says. The instrument measures entire books at one time, collecting all of the words, which Dodds says helps account for things like sarcasm. “As far as sarcasm goes, it’s incredibly difficult to get at but it’s hard to have a whole book that’s sarcastic … So what we’ve seen is when we look at phrases that words are in, it works pretty well,” he says.
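One way to picture an instrument like this is as an average over every scored word in a text, with context-dependent words left out. The Python sketch below is an assumption about how such a measurement might work, not the lab's actual code; the function name, the word scores and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter

def text_happiness(text, word_scores, excluded=frozenset()):
    """Frequency-weighted average score of the scored words in a text.

    Returns None when the text contains no scored words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Count only words that have a score and are not deliberately excluded.
    counts = Counter(w for w in words if w in word_scores and w not in excluded)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(word_scores[w] * n for w, n in counts.items()) / total

# Invented scores on the 1-to-9 scale; not values from the study.
word_scores = {"laughter": 8.5, "sunshine": 8.0, "rain": 4.0}
sample = "Sunshine and laughter, then rain, then more laughter."

overall = text_happiness(sample, word_scores)
without_rain = text_happiness(sample, word_scores, excluded={"rain"})
```

Because every scored word contributes in proportion to how often it appears, a single sarcastic or negative phrase moves the average only slightly in a long text, which matches the temperature comparison Dodds draws.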
The happiest words they found universally were “laughter” and “happy.” Danforth says they collect about 50 million messages from Twitter per day. “That’s about a trillion words in the past five to six years we’ve been doing this … Various versions of ‘hahaha’ end up in the top 5,000 most frequently used words on Twitter. In every language,” he says.
“The least biased word set that we looked at was Chinese literature,” Dodds says. “And it was seven [happy words] to three [negative words]. So even in the least biased one, it’s still pretty strong.”
“Russian Twitter was kind of in the middle of the pack,” Dodds says. “Russian literature was near the bottom. So it was spread out. We found that the Spanish ones, rated by people in Mexico, they were all at the top in terms of their median level of happiness.”
Vermont ranks high, it turns out: it has been the happiest state in the country over the past 30 days. “People are using the words, ‘amazing,’ ‘family’ and ‘snow’ a lot,” says Danforth. “There’s a lot less profanity coming out of Vermont.”
“So a bit happier, but not as expressive. So I think that’s not a bad New England kind of story,” adds Dodds.