The spread of misinformation on social media platforms threatens democratic processes, contributes to massive economic losses, and endangers public health. Many efforts to address misinformation focus on a knowledge deficit model and propose interventions for improving users’ critical thinking through access to facts. Such efforts are often hampered by challenges with scalability and by platform users’ confirmation bias. The emergence of generative AI presents promising opportunities for countering misinformation at scale across ideological barriers. In this paper, we present (1) an experiment with a simulated social media environment to measure the effectiveness of misinformation interventions generated by large language models (LLMs), (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users, with the goal of alleviating confirmation bias, and (3) an analysis of potential harms posed by personalized generative AI when exploited for the automated creation of disinformation. Our findings confirm that LLM-based interventions are highly effective at correcting user behavior (improving overall user accuracy at reliability labeling by up to 47.6 percentage points). Furthermore, we find that users favor more personalized interventions when making decisions about news reliability.
In the last decade, there has been growing concern about the proliferation of misinformation on social media platforms. For example, between 2006 and 2017, sensational ‘fake news’ articles spread rapidly on Twitter, diffusing farther and faster than truthful or reputable content (Vosoughi et al. 2018). These sharing trends have been amplified by ‘filter bubble’ algorithms that create ideological echo chambers, which reinforce existing viewpoints and further facilitate the spread of misinformation (Levy 2021; Acemoglu, Ozdaglar, and Siderius 2021).
Many proposed interventions for combating misinformation focus on tagging unreliable content (Clayton et al. 2020; Pennycook et al. 2019) or encouraging critical thinking by users (Lutzke et al. 2019; Pennycook et al. 2021b). However, two major bottlenecks in many such interventions are scalability and confirmation bias. Tagging unreliable or suspicious content requires careful inspection, which is currently performed by professional fact-checking organizations such as Snopes. These organizations are constrained both in their financial resources and in the number of qualified fact-checkers they can employ. Furthermore, these interventions rely on the assumption that users are rational agents who will agree upon a common ‘ground truth’ once exposed to enough information. This assumption is often violated in the real world due to confirmation bias: users do not process new information neutrally, and are more critical of counter-partisan news while accepting pro-partisan news at face value (Lord et al. 1979; Nickerson 1998; Tappin et al. 2020).
Breakneck advances in LLMs offer a promising avenue for large-scale fact-checking, as they provide tools for fast information processing and can detect patterns associated with misleading content (Chen and Shu 2023b). Early evidence suggests that LLM-based explanations of veracity can significantly reduce social media users’ tendency to accept false claims (Hsu et al. 2023). LLMs also present a potential path to understanding and countering confirmation bias; recent work (Andreas 2022; McIlroy-Young et al. 2022; Gabriel et al. 2022) argues that LLMs are capable of very simple forms of world and cognitive modeling. This opens up the possibility of tailored approaches to countering misinformation that target users across diverse backgrounds (e.g., varying education levels or ideological leanings).
Our agenda is to develop powerful, automated tools via LLMs that produce tailored, personalized interventions explaining a ground-truth veracity label to users (ref?). To do this, we have three study phases focused on examining the effects of interventions with and without personalization. We first conduct an A/B testing experiment using a simulated social media environment that features both true (reliable) content and false content (misinformation and disinformation). The findings show that interventions that explain the veracity label improve over label-only indicators (increasing accuracy by at least 34.9 percentage points, vs. 33.3 for the label-only condition). Explanations generated by GPT-4 (OpenAI et al. 2023) perform best at encouraging users to flag misinformation (users correctly flag in 5.4% of cases before the intervention vs. 32.9% after) while discouraging sharing of false content. This indicates the promise of LLM-based explanations for future intervention strategies, corroborating findings from recent and concurrent work (Gabriel et al. 2022; Hsu et al. 2023).
Our second experiment explores how personalization of explanations can further improve their effectiveness. We measure how the degree of personalization affects user-reported helpfulness, and find that explanations that are highly aligned with users’ attributes (e.g., education, political ideology, gender) are deemed more helpful by users than explanations without personalization (average helpfulness score of 2.98 vs. 2.71, out of 4).
Finally, we study a contrasting use case in which GPT-4 can be used with malicious intent to generate personalized disinformation in a more efficient and appealing manner than ever before. Our early findings indicate that personalized disinformation generated by GPT-4 becomes harder to identify as false when viewed by users who are well aligned with its intended audience. This suggests a potential danger of LLMs: their ability to generate harmful content that targets specific groups or even individuals, and the resulting need for safeguards to prevent such uses.
To the best of our knowledge, this is the first study to consider tailored LLM-based interventions based on specific attributes of social media users. We view it as a first step in this agenda, since general advances in foundation models will increase these tools’ capabilities for combating misinformation. We envisage better fine-tuning of explanations as additional information becomes available about users (without violating their privacy). Code and anonymized data from this study will be made publicly available at https://github.com/skgabriel/genai_misinfo.
In this section, we provide background definitions for misinformation or misleading content. We then place our work in the context of prior literature on mitigating misinformation. We also provide background on LLMs.
By misinformation, we refer to any content that is objectively false or misleading according to fact-checking sources (e.g., Snopes and Poynter). All reliability labels in this work were sourced from Pennycook et al. (2021a). In contrast to disinformation (which is known by the author to be false), misinformation may consist of either intentionally or unintentionally false content.
Our work aligns with a growing body of literature, mostly from political science, that examines the effectiveness of mitigating misinformation through user-facing interventions. Assuming a known ground-truth label determined by human or AI fact-checkers, this literature aims to reduce user consumption of and interaction with false content by designing effective ways to present this information or to otherwise nudge users to consider it.
Fact-checking labels that are specifically attributed to AI have been shown to be effective as interventions in reducing user consumption of misinformation (Kyza et al. 2021), though earlier studies found they are often less effective than labels attributed to other sources such as professional fact-checkers (Seo et al. 2019; Yaqub et al. 2020; Liu 2021; Zhang et al. 2021). There is evidence that explaining the mechanics behind how the fact-checking label is generated improves their effectiveness (Epstein et al. 2022). Concurrently with our work, it has been found that GPT-based explanations of content veracity can significantly reduce social media users’ reported tendency to accept false claims (Hsu et al. 2023), though they can be equally effective when used with malicious intent to generate deceptive explanations (Danry et al. 2022). There have also been some early works that explore the use of personalization in AI fact-checking systems, such as Jahanbakhsh et al. (2023), which examines the effects of a personalized AI prediction tool based on the user’s own assessments, and Jhaver et al. (2023), focusing on toxicity in personalized content moderation tools. Our work departs from these as we consider the generation of arguments and justifications given a label, rather than predicting the veracity.
We use autoregressive generative LLMs like GPT-4 (OpenAI et al. 2023) to design our user-facing interventions. Given a sequence of tokens representing a piece of text, these models output a probability distribution over the next token. LLMs are first pretrained for next-token prediction using large corpora of web data. During this pretraining stage, LLMs learn latent concepts that allow them to generalize to previously unseen tasks from textual prompting alone, a phenomenon known as “in-context learning” (Xie et al. 2021).
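To make the next-token view concrete, the following is a minimal sketch of querying an autoregressive language model for its next-token distribution. We use the open-weight GPT-2 model via the Hugging Face transformers library as a stand-in, since GPT-4 is only accessible through an API; the prompt is purely illustrative.

```python
# A minimal sketch of autoregressive next-token prediction, using GPT-2 as an
# open-weight stand-in (GPT-4 itself is only reachable through an API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "This claim is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Print the five most probable next tokens and their probabilities.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
```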
We recruit human participants to interact with a simulated news feed interface that mimics real-world social media platforms such as Facebook and X (formerly Twitter). The news feed consists of news headlines, or claims, and an intervention (“Find out more”) that the user may voluntarily click on. Upon doing so, they may see a prompt with a veracity label of the headline (true or false) and an explanation of the label. Users may react to the news item as they normally would on social media, and provide feedback on the prompt. By varying the types of interventions presented to users and comparing their subsequent behavior, we can analyze the impacts of various interventions. Details of the interface are given in section 3.1. Phase I (section 4) compares the effectiveness of five nonpersonalized explanations, while phase II (section 5) directly compares GPT-4 generated explanations with and without personalization. In both phases, the full experiment consists of four components: (1) a consent form, (2) task instructions, (3) a questionnaire on user demographics and opinions, and (4) a simulated newsfeed. We present data on the study participants and demographic results from the questionnaire in section 3.4.
Each participant receives five news items, randomly sampled from a dataset of 461 political news headlines collected by Pennycook et al. (2021a). Our experiment uses 188 true articles and 185 false articles. Each news item consists of the headline (which we also call a claim), the accompanying image, and the source of the news article. Users can interact with the posts by liking, sharing, or flagging them (shown in the bottom-left of Figure 2). Each user is instructed to perform at least one of these interactions for at least three out of five news items. Users also have the option to click on a “Find out more” button, which displays a pop-up that we call an intervention. Except for the control setting, the intervention consists of two pieces of information: a label indicating whether the claim is true or false and an explanation either supporting or refuting the claim based on the label. In the pop-up, users can rate the perceived helpfulness of this information on a 4-point Likert scale (very helpful, somewhat helpful, somewhat unhelpful, very unhelpful). They can also indicate whether they believe the claim is true or false, or are uncertain.
In phase I, we consider five types of previously proposed interventions for misinformation mitigation in this experiment. Each participant is randomly assigned to one of the five types of interventions with equal probability and only shown that intervention. In phase II, we introduce a sixth intervention type: personalized GPT-4 interventions. Table 1 lists the six types of interventions, with examples based on the false claim from (ref?). We explain how user attributes are inferred in the next section.
Table 1. Types of intervention methods used in this experiment

| Intervention | Description | Example |
| --- | --- | --- |
| Label only | A simple ground-truth label indicator. | This claim is true/false. |
| Methodology (AI) | Following Epstein et al. (2022), we show users a generic explanation stating that an AI model is the source of the claim veracity label. | This claim was verified/refuted by an AI model trained on a large-scale corpus of web data. |
| Methodology (human) | Same as above, except the source is stated to be fact-checkers. | This claim was verified/refuted by non-partisan fact-checkers. |
| Reaction frame explanation | Following Gabriel et al. (2022), we show users a templated explanation constructed from GPT-2 predictions (Radford et al. 2019) of the author’s intent behind the claim and the potential actions a reader may take in response to it. That work proposed pragmatic frames for cognitive modeling of reader reactions to news headlines. | This claim is true/false. This headline is trying to persuade/manipulate readers by implying that [writer intent][the government is corrupt]. It is compelling readers to [reader action][want to find out more]. |
| Zero-shot GPT-4 explanation | We instruct GPT-4 to generate an explanation for why a claim is true/false without user-specific information. The prompt given to GPT-4 is “Write a short explanation for why the headline ‘[claim]’ is ‘[label].’ Do not mention that you are AI. The explanation must be less than 100 words.” | The headline is false because Dr. Anthony Fauci has not been arrested by any Special Forces. As the director of the National Institute of Allergy and Infectious Diseases, any significant legal action against him would be prominent in the national media . . . |
| Personalized GPT-4 explanation | We do the same as above, but with the following prompt that incorporates inferred user attributes: “Write a short explanation for why the headline ‘[claim][Special Forces Arrest Deep State Dr. Anthony Fauci]’ is ‘[label][false]’ that will appeal to an [education][uneducated], [gender][male], [race][white], [age][18-29] year old reader with [politics][conservative] political beliefs. Do not mention that you are AI. Do not mention the type of reader. The explanation must be less than 100 words.” | This headline is false because there’s no evidence. Terms like ‘Deep State’ often get thrown around without proof. Remember, just because someone disagrees with Dr. Fauci’s advice, doesn’t mean he’s part of a conspiracy. Don’t believe everything you read on the internet, mate. Always consider the source. |
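For reference, the following is a minimal sketch of how the zero-shot and personalized prompt templates from Table 1 could be filled in and sent to a chat-completion endpoint. It assumes the OpenAI Python client; the model name, helper function, and attribute keys are illustrative rather than the exact study code.

```python
# Sketch of generating label explanations with the Table 1 prompt templates.
# Assumes the OpenAI Python client; names and defaults are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ZERO_SHOT = (
    "Write a short explanation for why the headline '{claim}' is '{label}.' "
    "Do not mention that you are AI. The explanation must be less than 100 words."
)
PERSONALIZED = (
    "Write a short explanation for why the headline '{claim}' is '{label}' that "
    "will appeal to an {education}, {gender}, {race}, {age} year old reader with "
    "{politics} political beliefs. Do not mention that you are AI. Do not mention "
    "the type of reader. The explanation must be less than 100 words."
)

def generate_explanation(claim, label, attributes=None, model="gpt-4"):
    """Return an explanation of the veracity label, personalized if attributes are given.

    attributes: optional dict with keys education, gender, race, age, politics.
    """
    if attributes is None:
        prompt = ZERO_SHOT.format(claim=claim, label=label)
    else:
        prompt = PERSONALIZED.format(claim=claim, label=label, **attributes)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```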
In phase II, we compare nonpersonalized and personalized GPT-4 interventions (the last two rows of Table 1). Personalized interventions are tailored to a specific demographic group defined by a set of attributes (gender, race, age, education level, political affiliation). We infer user attributes by using the questionnaire in component (3) of the experiment to ask each user a list of Pew Research American Trends Panel survey questions from Santurkar et al. (2023) on social and political issues in the United States. We then compute the conditional probability that a person with a given set of demographic attribute values would give the same answers as the user, and choose the demographic group with the highest probability. The questionnaire in component (3) also asks for the actual demographic attributes of the user for validation. Since these may not match the inferred attribute values used to generate the explanation, we compute a personalization alignment score for each user: the fraction of inferred attribute values that match the user’s self-reported attributes.
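The sketch below illustrates one way the inference and alignment computation could be implemented under the description above: the demographic group is chosen by (naively) maximizing the likelihood of the user’s survey answers, and the alignment score is the fraction of matching attributes. The response-probability table and all names are placeholders, not the exact study code.

```python
# Sketch of inferring a user's demographic group from survey answers and
# computing the personalization alignment score. The probability table
# (estimated from the Pew American Trends Panel responses) is a placeholder.
import math

def infer_group(user_answers, groups, answer_probs):
    """Pick the demographic group most likely to give the user's answers.

    user_answers: {question_id: answer}
    groups: list of candidate attribute tuples, e.g. ("male", "white", "18-29", ...)
    answer_probs: {(group, question_id, answer): probability from survey data}
    """
    def log_likelihood(group):
        # Treat questions as independent; smooth unseen (group, answer) pairs.
        return sum(
            math.log(answer_probs.get((group, q, a), 1e-6))
            for q, a in user_answers.items()
        )
    return max(groups, key=log_likelihood)

def alignment_score(inferred, actual):
    """Fraction of inferred attributes that match the user's self-reported ones."""
    matches = sum(i == a for i, a in zip(inferred, actual))
    return matches / len(inferred)
```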
In this section, we explain our methodology for user recruitment and the qualification tasks we require users to complete to ensure quality of results (e.g., filtering out spammers).
We used the Amazon Mechanical Turk crowdsourcing platform to recruit 4,173 workers from the United States with at least a 98% Human Intelligence Task approval rating as potential study participants. To filter out spamming workers, we ask two “attention check” questions that require them to write out the minimum number of posts they must interact with (three) and the number of posts in the newsfeed (five). Any worker who fails either of these attention checks is disqualified from participating in the rest of the study. We also disqualify workers who fail to follow the instructions by interacting with fewer than three posts. In total, 1,362 workers passed the qualification tests.
In component (3) of the study, we survey qualified workers on age, education level, gender, religion, political affiliation on a left-center-right US political spectrum, and race. Figure 3 shows the distribution of these responses.
In this section, we describe the results of our first experiment, which subjects participants to five different interventions without personalization. We first look at user behavior in terms of liking, sharing, and flagging, and consider how these behaviors differ by the veracity of the news headline and by political agreement between the news and the user. We also compare the effectiveness of interventions in changing beliefs and behavior, measured by user accuracy at discerning false content and by sharing and flagging of misinformation, as well as the user-reported helpfulness score.
We first analyze how users interact with the news items (liking, sharing, and flagging) prior to interventions. We compare their behaviors when viewing accurate news versus misinformation. Moreover, we compare behaviors when viewing pro-partisan versus counter-partisan news, which measures the effects of confirmation bias. To measure ideological agreement between the user and the news, we predict the political leanings of news headlines using the political bias classifier from Baly et al. (2020), and compare them to the political affiliations of users from their self-reported responses. Overall, the model predicts 22 left-leaning headlines, 58 moderate headlines, and 70 right-leaning ones.
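As an illustration, headline leanings can be predicted with a fine-tuned text classifier along the lines of Baly et al. (2020); the checkpoint path and label mapping below are placeholders, not the exact model used in the study.

```python
# Sketch of labeling headline political leaning with a fine-tuned classifier.
# The checkpoint path and label mapping are placeholders for the Baly et al. (2020) model.
from transformers import pipeline

bias_classifier = pipeline(
    "text-classification",
    model="path/to/political-bias-classifier",  # placeholder checkpoint
)

LABELS = {"LABEL_0": "left", "LABEL_1": "center", "LABEL_2": "right"}  # assumed mapping

def headline_leaning(headline):
    prediction = bias_classifier(headline)[0]
    return LABELS.get(prediction["label"], prediction["label"])

def agrees_with_user(headline, user_affiliation):
    """True if the headline's predicted leaning matches the user's self-reported side."""
    return headline_leaning(headline) == user_affiliation
```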
Consistently, we see that agreement between the headline’s political leaning and the user’s affiliation has significant effects on all three types of user behavior (sharing, liking, and flagging), which indicates confirmation bias (Figure 4). When true claims disagree with users’ affiliations, users are more likely to incorrectly flag them and less likely to like or share them. When viewing false claims, confirmation bias similarly occurs for liking and sharing, but not for flagging (claims that agree with the user’s affiliation are flagged more). Such behaviors differ across users with different political affiliations. Conservative users consistently exhibit signs of confirmation bias: they are more likely to share and like ideologically confirming headlines regardless of veracity (e.g., sharing right-leaning false content in 14.29% of observations vs. only 2.9% of observations with non-right-leaning false content). However, their flagging appears less affected by this bias (e.g., flagging right-leaning false content in 2.6% of observations vs. 2.9% for non-right-leaning false content). This supports prior findings that confirmation bias is an influential factor in user behavior under polarization.
We measure the effectiveness of all five (nonpersonalized) interventions in mitigating misinformation by comparing users’ pre- and post-intervention behavior. In particular, we compare the following metrics: (1) the accuracy of users’ veracity predictions for each headline; (2) interaction with false headlines, such as sharing and flagging; and (3) user-reported helpfulness of the label and explanation. Table 2 shows results for all nonpersonalized intervention variations. We find that without interventions, users consistently struggle to judge the veracity of news. All intervention types significantly improve users’ overall accuracy (by up to 47.6 percentage points). Also, all explanation-based interventions have a greater effect on accuracy than the label-only intervention.
Table 2. Perceived veracity accuracy, interactions, and helpfulness results for all intervention types, before and after interventions

| Intervention | Accuracy before (% correct) | Accuracy after (% correct) | False content sharing before (%) | False content sharing after (%) | False content flagging before (%) | False content flagging after (%) | Helpfulness (% helpful or very helpful) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Label only | 55.15 | 88.48 | 6.41 | 16.22 | 4.23 | 31.08 | 74.49 |
| Reaction frame | 54.99 | 89.93 | 1.00 | 0 | 1.50 | 0 | 83.01 |
| GPT-4 | 51.73 | 98.19 | 4.90 | 0.54 | 5.46 | 32.97 | 96.42 |
| Methodology (AI) | 51.87 | 99.47 | 11.10 | 8.92 | 4.04 | 20.00 | 97.91 |
| Methodology (human) | 52.14 | 94.56 | 2.40 | 4.05 | 2.52 | 11.89 | 81.18 |
Interestingly, we find that effects on interaction behavior vary considerably across the tested intervention types. In particular, the “label only” and “methodology (human)” interventions actually increase sharing of false news. One hypothesis is that users want to fact-check claims with others they trust, since we also see an increase in false content flagging. The GPT-4 explanation often includes a self-contained fact-check, which may reduce users’ interest in reaching out to trusted networks.
Overall, two interventions seem the most effective: nonpersonalized GPT-4 explanations and, surprisingly, a simple methodological explanation stating that the veracity label was generated using an AI model, without mentioning specifics of the claim. Both yield similar improvements in users’ judgments of headline veracity and similar self-reported helpfulness scores. GPT-4 explanations also have the most positive impact on user interactions with false content, increasing accurate flagging and reducing sharing. These results indicate the promise of explanations for reducing misinformation spread. However, it should be noted that users’ trust in machine-generated explanations is only beneficial if the model is accurate at label prediction.
We now analyze the effects of personalized explanations in our second experiment, phase II, where we directly compare such explanations with nonpersonalized ones generated by GPT-4. First, we compare user-reported scores of how helpful the interventions are with and without personalization, and relate them to the accuracy of inferring user identities. We also analyze the effects of personalization on the length, readability, and formality of GPT-4 responses.
We measure the effectiveness of personalized interventions using user-reported helpfulness scores for personalized explanations, and compare them to scores for nonpersonalized GPT-4 explanations. We also consider the relationship between helpfulness scores and the personalization alignment score (section 3.3).
Figure 5 shows mean helpfulness scores based on 6,520 observations of GPT-4 explanations without personalization and 3,000 observations with personalization. We consider an explanation aligned with the user if its personalization alignment score (section 3.3) meets a minimum threshold.
Overall, users find explanations of veracity more helpful when they appeal to their own demographic group. As seen in Figure 5, personalized interventions that are also aligned receive a higher mean helpfulness score (2.98 out of 4) than nonpersonalized explanations (2.71).
In this section, we compare the average length, readability, and formality of personalized explanations of the ground-truth label across six groups with different demographic attributes. In addition, we compare them to GPT-4 generated explanations with no personalization, denoted gcontrol. The first group we consider, denoted g1, has the following demographic attributes: political affiliation = conservative, race = white, education = uneducated, gender = male, age = 30-49. We then consider personalized explanations for additional groups that differ from g1 by exactly one attribute. Specifically, for g2 political affiliation = liberal, for g3 race = black, for g4 education = educated, for g5 gender = female, and for g6 age = 65+. For each of these demographics, we generate personalized explanations for the ground-truth label using the prompts described in section 3.3. We then measure differences between explanations across groups using length, formality prediction (Pavlick and Tetreault 2016), and reading difficulty based on the Flesch reading-ease metric (Flesch 1948).
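These surface metrics can be computed with off-the-shelf tools. The sketch below uses the textstat package for the Flesch reading-ease score; the formality scorer stands in for a trained Pavlick and Tetreault (2016)-style model and is a placeholder.

```python
# Sketch of the surface metrics used to compare explanations. Readability uses
# textstat's Flesch reading-ease implementation; the formality model is a
# placeholder for a Pavlick and Tetreault (2016)-style regression scorer.
import textstat

def surface_metrics(explanation, formality_model=None):
    metrics = {
        "length_words": len(explanation.split()),
        "readability": textstat.flesch_reading_ease(explanation),  # higher = easier
    }
    if formality_model is not None:
        # Placeholder: any model scoring formality on a 0-100 scale.
        metrics["formality"] = formality_model.predict([explanation])[0]
    return metrics
```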
From Table 3, we can see that the lengths of explanations are relatively consistent across personalization settings. Political affiliation has the least effect of all attributes, while readability and formality are significantly impacted by race, age, education, and gender. In particular, specifying that the user is “educated” greatly reduces readability, indicating use of more challenging language, and increases formality by 18.46 points. Specifying that the user is “Black” leads to the least formal language use.
Table 3. Comparison of generic GPT-4 and personalized explanations across various demographic groups using automatic metrics

| Group | Varied attribute | Average length (words) | Average readability ↑ | Average formality ↑ |
| --- | --- | --- | --- | --- |
| gcontrol | No personalization | 52.59* | 40.67* | 92.63* |
| g1 | Default prompt (before variation) | 58.42 | 55.95 | 78.02 |
| g2 | Liberal | 58.45 | 55.99 | 77.84 |
| g3 | Black | 58.34 | 59.25* | 71.42* |
| g4 | Educated | 63.23* | 38.37* | 96.48* |
| g5 | Female | 58.62 | 51.56* | 87.81* |
| g6 | Age 65+ | 55.98* | 55.04 | 81.67* |

Higher scores indicate greater readability or formality, respectively. Statistically significant differences between g1 and gk are marked by *.
Sections 4 and 5 show that GPT-4 is effective at combating misinformation by generating personalized explanations of the veracity of news articles. Conversely, we now examine a more concerning potential application of GPT-4, in which it may be used with malicious intent to generate personalized disinformation. We conduct another experiment in which we instruct GPT-4 to create news headlines that promote common conspiracy theories, and then survey the ability of human participants to identify them as false. In particular, we study the effects of personalizing such disinformation to target specific demographic groups. Our experiment contributes to a growing literature on misinformation generation using deep learning models (Zellers et al. 2019; Zhou et al. 2023; Chen and Shu 2023a). To the best of our knowledge, we are the first to examine personalization in such processes.
We instruct GPT-4 to generate disinformation headlines around the well-known and commonly spread conspiracy theories listed in Table 4.
Table 4. Conspiracy theories used in the GPT-4 personalized disinformation study

| Conspiracy | Description |
| --- | --- |
| “White Genocide” | A white supremacist conspiracy theory alleging a plot to systematically oppress white people. |
| “Flat Earth” | A conspiracy theory alleging the Earth is actually flat, with followers known as “flat-Earthers.” |
| “Manmade HIV” or “Manmade Covid-19” | Discredited claims about the viruses being fabricated as part of a government plot. |
| “Vaccination and Autism Link” | Perpetuates discredited claims that there is a link between vaccination and the development of autism in children. |
| “False Flags” | Claims that recent mass shootings (e.g., Sandy Hook) were staged. |
The generation process is personalized and aims to appeal to groups with specific demographic attributes, using a prompt similar to that of the explanation intervention in section 3.3. To present these disinformation headlines to the user in a format similar to Figure 2, we pair each headline with an image from a real-world news article. We do so by using the CLIP model (Radford et al. 2021) to find the most relevant image for each headline in a set of news images from the GoodNews dataset (Tan et al. 2020).
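A minimal sketch of this retrieval step, using the publicly released CLIP checkpoint via the Hugging Face transformers library (the image paths are illustrative):

```python
# Sketch of pairing each generated headline with the most relevant news image
# via CLIP text-image similarity. Image paths are illustrative placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_image_for(headline, image_paths):
    images = [Image.open(path).convert("RGB") for path in image_paths]
    inputs = processor(text=[headline], images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text: similarity of the headline to every candidate image.
    scores = outputs.logits_per_text[0]
    return image_paths[int(scores.argmax())]
```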
When we present personalized GPT-4 disinformation to Amazon Mechanical Turk workers alongside real news and human-written misinformation, we find no significant difference between its deceptiveness and that of other false content. Workers achieve 82.32% accuracy at labeling real news, 32.30% accuracy at labeling human-written misinformation, and 35.84% accuracy at labeling the GPT-4 disinformation. However, we do find that workers’ accuracy at labeling personalized disinformation is negatively correlated (Pearson’s r) with their alignment scores: workers who more closely match the targeted demographic are less likely to identify the content as false.
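For concreteness, this correlation check can be run with SciPy; the record format below is illustrative and not the exact analysis script.

```python
# Sketch of checking whether alignment with the targeted demographic predicts
# lower accuracy at labeling personalized disinformation. One record per
# (worker, headline) observation; field names are illustrative.
from scipy.stats import pearsonr

def accuracy_alignment_correlation(observations):
    """observations: list of dicts with 'correct' (0/1) and 'alignment' (0-1)."""
    correct = [obs["correct"] for obs in observations]
    alignment = [obs["alignment"] for obs in observations]
    r, p_value = pearsonr(alignment, correct)
    return r, p_value  # a negative r means aligned workers are fooled more often
```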
Our findings show a promising direction for social media platforms and policy makers to combat misinformation by improving the presentation of content to users. With their ability to efficiently generate explanations that support veracity judgments with user personalization, LLMs have the potential to serve as key components in designing scalable and powerful interventions. However, it is important to highlight that their success depends on model accuracy at predicting label veracity. This requires further improvements in automated prediction tools, greater coordination with human fact-checkers, or both. Also, our preliminary observations in phase III raise concerns about future uses of GPT-4 to create targeted fake news campaigns against certain groups or even individuals. The personalization ability of LLMs is a double-edged sword, and collaboration between policy makers, researchers, and engineers is needed to ensure these models are used for ethical and desirable purposes.
We thank David Rand for providing the data used in the study and Daniel Huttenlocher for thought-provoking discussions. We also thank Julian Michael, Claudia Shi, Hannah Rose Kirk, Betty Hou, Jason Phang, and Salsabila Mahdi for providing feedback on an early version of the study design, as well as the Dartmouth data analysis team (specifically Rong Guo) for help with summarizing experimental results. Thank you to the MIT Generative AI Award committee for reviewing the proposal.