Research shows how statistics can help fight misinformation

An American University (AU) math professor and his team created a statistical model that can detect misinformation in social posts. The model also avoids the “black box” problem that arises in machine learning.

With the use of algorithms and computer models, machine learning is playing a growing role in stopping the spread of misinformation, but a major challenge for scientists is the “black box” of unknowability: researchers don’t understand how a machine arrives at the same decisions as its human trainers.

Using a Twitter dataset of COVID-19 misinformation tweets, Zois Boukouvalas, assistant professor in the AU Department of Mathematics and Statistics, College of Arts and Sciences, shows how statistical models can detect misinformation on social media during events such as a pandemic or a natural disaster. In recently published research, Boukouvalas and his colleagues, including AU student Caitlin Moroney and computer science professor Nathalie Japkowicz, also show how the model’s decisions align with those made by humans.

“We would like to know what a machine thinks when it makes decisions, and how and why it agrees with the humans who shaped it,” Boukouvalas said. “We don’t want to block someone’s social media account because the model is making a biased decision.”

Boukouvalas’s method is a type of machine learning based on statistics. It is not as popular a field of study as deep learning, the complex, multi-layered type of machine learning and artificial intelligence. Statistical models are effective and offer another, largely untapped, way to combat misinformation, Boukouvalas said.

For a test set of 112 real and misinformation tweets, the model achieved high prediction performance, classifying them correctly with nearly 90 percent accuracy. (Using such a compact dataset was an effective way to verify how the method detected the misinformation tweets.)
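For context, accuracy on a labeled test set is simply the fraction of predictions that match the human labels. The sketch below illustrates the calculation with hypothetical toy labels; none of these values are the study’s actual predictions.

```python
# Illustrative only: accuracy is the fraction of predictions that
# match the human-assigned labels. These toy lists are hypothetical
# stand-ins, not the study's data or predictions.

def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0]   # 1 = misinformation, 0 = real
y_pred = [1, 0, 1, 0, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.0%}")  # 80% on this toy set

# At the study's scale, ~90% accuracy on 112 tweets means roughly
# 101 of the 112 were classified correctly.
```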

“What’s important about this finding is that our model achieved accuracy while also offering transparency about how it detected the tweets that were misinformation,” Boukouvalas added. “Deep learning methods cannot achieve this kind of accuracy with transparency.”

Before testing the model on the dataset, the researchers first prepared to train it. Models are only as good as the information humans supply; when human biases go unchecked (one of the reasons behind the biases in facial recognition technology), black boxes are created.

The researchers carefully labeled the tweets as either misinformation or real, and they used a set of predefined rules about the language of misinformation to guide their choices. They also accounted for nuances of human language and for linguistic characteristics associated with misinformation, such as a post that makes greater use of proper nouns, punctuation, and special characters. Sociolinguist Christine Mallinson of the University of Maryland, Baltimore County identified the tweets for writing styles associated with misinformation, bias, and less reliable sources in the media. Then it was time to train the model.
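The release does not publish the team’s exact feature definitions, but the stylistic cues it names (proper nouns, punctuation, special characters) can be approximated with simple counts. The sketch below is a rough illustration under that assumption, not the study’s actual feature extractor.

```python
import re

def stylistic_features(tweet: str) -> dict:
    """Rough counts of the stylistic cues named above.

    These heuristics are illustrative guesses, not the study's actual
    feature definitions; e.g., capitalized words after the first token
    stand in for proper-noun detection.
    """
    words = tweet.split()
    return {
        "proper_nouns": sum(1 for w in words[1:] if w[:1].isupper()),
        "punctuation": len(re.findall(r"[.,;:!?]", tweet)),
        "special_chars": len(re.findall(r"[#@$%&*~^]", tweet)),
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
    }

print(stylistic_features("BREAKING!!! #covid came from BAT SOUP in Wuhan???"))
```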

“Once we add these inputs into the model, it tries to understand the underlying factors that lead to the separation of good and bad information,” Japkowicz said. “It’s about learning the context and how the words interact.”
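The release does not name the team’s exact statistical model, so the sketch below uses ordinary logistic regression as a stand-in to convey the general idea: a linear model trained on hand-crafted features whose learned weights can be read off directly, unlike the opaque parameters of a deep network. All feature names and numbers here are hypothetical.

```python
# A stand-in for the interpretable statistical approach described above:
# logistic regression over hand-crafted linguistic features. The study's
# actual model is not specified in this release; everything here is a
# hypothetical illustration.
from sklearn.linear_model import LogisticRegression

feature_names = ["proper_nouns", "punctuation", "special_chars",
                 "hyperbole_score", "emotion_score"]

# Toy feature vectors and human labels (1 = misinformation, 0 = real).
X = [[4, 9, 3, 0.8, 0.9],
     [1, 2, 0, 0.1, 0.2],
     [5, 7, 4, 0.7, 0.8],
     [0, 1, 0, 0.0, 0.1]]
y = [1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Unlike a deep network, the fitted weights can be inspected directly:
# a positive weight pushes a tweet toward the "misinformation" label.
for name, weight in zip(feature_names, model.coef_[0]):
    print(f"{name:>15}: {weight:+.3f}")
```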

For example, two tweets in the dataset contain “bat soup” and “covid” together. The researchers labeled the tweets as misinformation, and the model identified them as such. The model flagged the tweets as containing hate speech, hyperbolic language, and strongly emotional language, all of which are associated with misinformation. This suggests that the model recognized, in each of these tweets, the human decision behind the labeling, and that it followed the researchers’ rules.
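One way to check that a linear model’s decision tracks the human rationale is to decompose its score into per-feature contributions (weight × feature value). The sketch below does this with assumed weights and feature values; none of the numbers come from the study.

```python
import numpy as np

# Hypothetical weights and one tweet's feature values; in a linear
# model the decision score decomposes exactly into weight * value terms.
feature_names = ["proper_nouns", "punctuation", "special_chars",
                 "hyperbole_score", "emotion_score"]
weights = np.array([0.2, 0.1, 0.3, 1.5, 1.6])   # assumed, not the study's
values  = np.array([5,   7,   4,   0.9, 0.9])   # assumed feature values

contributions = weights * values
for name, c in sorted(zip(feature_names, contributions),
                      key=lambda pair: -abs(pair[1])):
    print(f"{name:>15}: {c:+.3f}")

# The largest positive terms show which cues drove the flag (here,
# emotional language at +1.440 and hyperbole at +1.350), making the
# model's reasoning auditable against the annotators' rules.
```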

The next steps are to improve the model’s user interface and to extend the model so that it can detect misinformation in social posts that include images or other media. The statistical model will need to learn how a variety of elements in a social post interact to create misinformation. In its current form, the model is best suited to social scientists and others who are researching ways to detect misinformation.

Despite advances in machine learning to help fight misinformation, Boukouvalas and Japkowicz agreed that human intelligence and information literacy remain the first line of defense in stopping its spread.

“Through our work, we are designing machine learning-based tools to alert and educate the public in order to eliminate misinformation, but we strongly believe that humans need to play an active role in not spreading misinformation in the first place,” Boukouvalas said.

– This press release was originally posted on the American University website
