Labeling is a commonly proposed strategy for reducing the risks of generative artificial intelligence (AI). This approach involves applying visible content warnings to alert users to the presence of AI-generated media online (e.g., on social media, news sites, or search engines). Although there is little direct evidence regarding the effectiveness of labeling AI-generated media, a large volume of academic work suggests that warning labels can substantially reduce people’s belief in and sharing of content debunked by professional fact-checkers. Thus, there is reason to believe that labeling could help inform members of the public about AI-generated media. In this paper, we provide a framework for helping policymakers, platforms, and practitioners weigh various factors related to the labeling of AI-generated content online. First, we argue that, before developing labeling programs and policies related to generative AI, stakeholders must establish the objective(s) that labeling is intended to accomplish. Here, we distinguish two such goals: (1) communicating to viewers the process by which a given piece of content was created or edited (i.e., with or without using generative AI tools) versus (2) diminishing the likelihood that content misleads or deceives its viewers (a result that does not necessarily depend on whether the content was created using AI). Next, we summarize results from two large-scale experiments demonstrating that labeling can, under certain conditions, meaningfully decrease individuals’ likelihood of believing and engaging with misleading, AI-generated images. Finally, we highlight several important issues and challenges that must be considered when designing, evaluating, and implementing labeling policies and programs, including the need to (1) determine what types of content to label and how to reliably identify this content at scale, (2) consider the inferences viewers will draw about both labeled and unlabeled content, and (3) evaluate the efficacy of labeling approaches across contexts.
Rapid improvements in the sophistication and accessibility of generative artificial intelligence (AI) have made it easier than ever for creators to produce realistic videos, images, and audio of almost anything imaginable—including events that never took place. Although generative AI has myriad applications, its potential to amplify the production and dissemination of audiovisual misinformation has become a source of widespread concern (Ghaffary 2023; Goldstein et al. 2023; Helmus 2022). Members of the American public by and large report limited experience with generative AI tools (Harris Poll 2023; Jackson et al. 2023), and many say that altered videos and images sow confusion about the facts of current events and issues (Gottfried 2019). Some academic research likewise finds that people often struggle to discern whether AI-generated media are real or fabricated (Farid 2022; Köbis, Doležalová, and Soraperra 2021; Mai et al. 2023), and these issues will undoubtedly worsen as technology continues to improve and evolve (Thompson and Hsu 2023; Vynck 2023).
One proposed response to this challenge involves visibly and transparently labeling AI-generated media (e.g., with text or graphics superimposed on, or displayed next to, the media) to inform users that the content was created using generative AI. This approach has already attracted legislative attention in both the United States and abroad (Bennet 2023; Goujard 2023; Ryan-Mosley and Heikkilä 2023; Torres 2023); for example, the bipartisan AI Labeling Act of 2023 (Schatz 2023) would require developers of generative AI systems to include “clear and conspicuous” disclosures indicating that content was produced using AI.2 At a White House summit in July 2023 (The White House 2023), seven leading technology companies also pledged to develop “robust technical mechanisms” (Bartz and Hu 2023) for communicating to users when content is AI-generated. And early attempts at such labeling are already appearing across platforms (Ghaffary 2023); in recent months, companies including Google (Dunton 2023), Meta (Clegg 2024), and TikTok (TikTok 2023) have begun to explore strategies for alerting users to content that has been created or modified using AI.
This emphasis on labeling AI-generated content should be considered in light of academic work demonstrating the utility of warning labels in addressing misleading and deceptive content online. In particular, past studies find that the addition of fact-checking labels to posts identified as misinformation can reduce people’s likelihood of believing (Clayton et al. 2020; Koch, Frischlich, and Lermer 2023; Martel and Rand 2023; Pennycook et al. 2020; Porter and Wood 2022; Shen, Kasra, and O’Brien 2021) and sharing (Epstein et al. 2022; Martel and Rand 2023; Mena 2020; Pennycook et al. 2020; Yaqub et al. 2020) false and unsubstantiated claims. Labeling AI-generated media online may likewise be able to temper the negative effects of this content, but additional research is needed to understand both when and how such labels should be applied.
In this paper, we therefore lay out a framework aimed at helping policymakers and platforms navigate key questions associated with labeling AI-generated media online. In the first section, we underscore the role of institutional goals and strategy in shaping the design and implementation of a labeling policy or program. In the second section, we then summarize results from two large-scale experiments aimed at evaluating the effect of different types of labels on individuals’ belief in and engagement with AI-generated misinformation. In the final section, we outline several considerations and challenges related to labeling, including (1) identifying the correct subset of content to label, (2) mitigating potential unintended consequences of labeling for broader media trust, and (3) ensuring the efficacy and generalizability of labels across contexts. Throughout, we discuss strategies for labeling audiovisual media (including videos, audio, images, and graphics), as other work has examined the distinct, but related, domain of AI-generated text (Goldstein et al. 2023). Nevertheless, we take a broad view of the field—focusing on a wide variety of settings in which labeling could occur (e.g., on social media, news platforms, or search engines). We thereby highlight ways that labeling may be able to contribute to core policy objectives while also emphasizing opportunities for additional research and analysis on this emerging topic.
Despite widespread interest in the use of labeling to mitigate the influence and spread of AI-generated media online, it remains unclear what term(s) should be applied to this content. AI-generated media are far from monolithic; different types of content vary in their manner and degree of algorithmic intervention as well as their likely consequences for society at large. Moreover, members of the public possess varying levels of familiarity with and knowledge about AI and related technologies (Funk, Tyson, and Kennedy 2023; Harris Poll 2023), raising questions about whether, and under what conditions, specific labels will be comprehensible to a general audience. How policymakers and technology companies resolve these issues should depend greatly on their overarching goals and organizational strategies. In short, the design, evaluation, and implementation of any labeling policy or program must reflect the core objective(s) that this labeling is intended to accomplish.
To this end, we outline two goals that could motivate the labeling of AI-generated media online. The first is a ‘process-based’ goal that focuses exclusively on the technical processes through which a given piece of content was created and/or modified. From this perspective, the primary function of an AI labeling program is to communicate to users how a particular piece of content was produced while remaining agnostic about its potential consequences for viewers or society more generally. To achieve this goal, a labeling process might seek to identify and flag any content that was made or edited using generative AI technology, regardless of its format (e.g., video, audio, or image), domain (e.g., politics, art, or science), or conceivable impact on viewers’ beliefs and behavior.3
The second, in contrast, is an ‘impact-based’ goal centered on the content’s potential harm. Much of the prevailing discourse surrounding the labeling and detection of AI-generated media is grounded in fears that such content could mislead or otherwise deceive members of the public. For example, recent calls to regulate the use of AI in political advertising (Chapman 2023; Klepper 2023) stem from concerns that ‘deepfakes’ could sway voting behavior, alter election outcomes, or incite political violence (Helmus 2022; Klepper and Swenson 2023). These concerns about public deception are not unique to politics; indeed, the proliferation of generative AI may have even more widespread consequences in other areas, including the perpetration of fraud and scams (Flitter and Cowley 2023; Verma 2023) and the creation of nonconsensual sexual imagery (Hunter 2023). Labeling efforts could therefore focus on the extent to which content is likely to mislead—for instance, by integrating tools for identifying AI-generated media with existing systems aimed at detecting misinformation (in which many technology companies have already made substantial investments, e.g., professional fact-checkers, crowd raters, and machine learning algorithms; see Meta 2021; Silverman 2019; X, n.d.).
Table 1. Examples of online content that vary in both their production process (i.e., AI-generated or not) and their potential risks (i.e., misleading or deceiving viewers)

| Potential to Mislead | Production Process: AI-Generated | Production Process: Not AI-Generated |
|---|---|---|
| Misleading | AI-generated ‘deepfakes’ depicting events that never occurred | ‘Cheapfakes’ that present real media out of context or use conventional editing to misconstrue events |
| Not Misleading | AI-generated digital art | Unaltered photos or videos of real events |
Although these two goals are not mutually exclusive, they are clearly distinct. As summarized in Table 1, not all AI-generated media are inherently misleading (e.g., digital art; see Epstein, Hertzmann, et al. 2023), nor are all forms of misleading content produced using AI (Weikmann and Lecheler 2022). For example, so-called ‘cheapfakes’ that present audio or visuals out of context or use conventional editing techniques to misconstrue events may be just as damaging to the information environment as entirely synthetic media constructed from scratch using generative AI (Paris and Donovan 2019; Schick 2020). Moreover, recent work finds that labeling text-based news headlines as AI-generated diminishes viewers’ belief in both true and false headlines (Altay and Gilardi 2023; Longoni et al. 2022). These results suggest that different types of labeling goals may be in tension. Whereas process-based labels, by design, tend to be transparent about media’s provenance, they may be less informative about veracity—and therefore have the potential to create confusion about whether presented information is true or false.
A labeling program’s objectives also directly inform its design and implementation. First, different goals demand fundamentally different framing and conceptualization. Process-based labels that report methods of content creation would likely need to adopt a neutral stance, as a large portion of AI-generated media is not intrinsically malicious or deceptive in nature. Conversely, impact-based labels denoting misleading (or inaccurate/disputed) content might instead carry a distinctly negative connotation given their end goal of reducing belief in and engagement with the labeled content. Calibrating the tone of labels is crucial to ensuring they are interpreted as intended. Following this logic, different goals may also shape the consequences of labeling for a post’s subsequent distribution. On the one hand, process-based labels should likely not affect the algorithmic ranking of content on social media as these labels could be applied to harmless or even beneficial posts (Altay and Gilardi 2023). On the other hand, consistent with existing practice (Meta, n.d.), AI-generated content labeled as false or misleading should perhaps also be downranked so as to minimize its likelihood of reaching a wide audience.
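To make this distinction concrete, the sketch below encodes the two label families as a hypothetical policy table. It is purely illustrative: the field names, label wording, and ranking treatments are our own assumptions rather than any platform’s actual rules.

```python
# Hypothetical policy table contrasting process-based and impact-based labels.
# All values here are illustrative assumptions, not any platform's real policy.
from dataclasses import dataclass

@dataclass
class LabelPolicy:
    label_text: str        # wording shown to users
    tone: str              # neutral disclosure vs. warning framing
    affects_ranking: bool  # whether labeled posts are also downranked

POLICIES = {
    "process_based": LabelPolicy(
        label_text="AI-generated",
        tone="neutral",          # provenance disclosure, no judgment of intent
        affects_ranking=False,   # harmless or beneficial AI content keeps normal reach
    ),
    "impact_based": LabelPolicy(
        label_text="Misleading",
        tone="warning",          # signals potential to deceive
        affects_ranking=True,    # misleading content may also be downranked
    ),
}
```

Separating tone from ranking treatment in this way makes explicit that a process-based disclosure need not carry the reach penalties typically attached to misinformation labels.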
At a more basic level, the wording used to label content should also be precise about what types of content are—and, just as importantly, are not—covered by a specific label. In surveys conducted by several members of our research team (Epstein, Fang, et al. 2023), we asked a large sample of people in the United States, Brazil, India, Mexico, and China to indicate how well nine common labeling terms described twenty types of content that varied both in their production process and their potential to mislead (for examples, see Table 1). Overall, we found that different terms satisfy different aims. On the one hand, members of the public most consistently associated the terms “AI-generated,” “Generated with an AI tool,” and “AI-manipulated” with content that was constructed using AI technology—regardless of whether this content was misleading. As such, if labeling policies and programs seek to transparently communicate the processes by which AI-generated media were created, terms that explicitly reference AI may be most appropriate. On the other hand, terms like “Manipulated” and “Deepfake” were most consistently associated with misleading content, regardless of the methods through which this content was generated, making these terms better suited to labeling strategies aimed at identifying deceptive media.4
Importantly, however, participants in this study reported that both types of labels—that is, both process- and impact-based labels—would make them less confident that the events shown in the presented media took place. Thus, if labels are applied to AI-generated content regardless of how misleading it is, this could reduce belief in content that is accurate but AI-generated—a possibility that is supported by other experiments assessing the causal effect of process-based labels on the perceived accuracy of text-based news headlines (Altay and Gilardi 2023; Longoni et al. 2022). Together, these results highlight the importance of well-defined objectives in shaping how and where generative AI disclosures are implemented.
Altogether, existing research suggests that warning labels can reduce individuals’ likelihood of believing and engaging with false and misleading information—but important questions remain about how such labels might operate in the domain of AI-generated content. To address these questions, we conducted two online experiments (Wittenberg et al. 2024) that together surveyed a diverse national sample of more than 7,500 Americans. In both studies, participants were shown a simulated social media post containing an AI-generated image that journalists or fact-checkers had previously identified as misleading. Across the two studies, we examined a set of 29 images spanning both political and nonpolitical topics and covering a wide range of subjects (e.g., nature, technology, religion, and pop culture).
In the first study, conducted in October 2023, participants were randomly assigned to one of five labeling conditions. Participants in the first condition viewed a version of their assigned social media post without any label applied (the control group). Participants in the remaining conditions, by contrast, saw a version of this post that contained one of four warning labels, adapted both from prior research (Epstein, Fang, et al. 2023) and from existing policies at social media companies: (1) an “AI-generated” label, indicating that the post contained media that was generated using AI; (2) an “Artificial” label, indicating that the post contained artificial content that was edited or digitally altered; (3) a “Manipulated” label, indicating that the post contained media that was altered, manipulated, or fabricated; and (4) a “False” label, indicating that the post contained false information that had been reviewed by independent fact-checkers.
Overall, we found that labeling significantly reduced individuals’ belief in the posts’ core claims and their stated likelihood of sharing the posts online, providing initial evidence that labeling can meaningfully shape perceptions of and responses to AI-generated content. However, we observed considerable variation in the efficacy of different labels. In particular, the “AI-generated” label, which focused solely on the process by which the content was created, tended to have a smaller impact, particularly on engagement intentions, compared to labels that more overtly referenced the content’s potential to mislead (e.g., the “Manipulated” or “False” labels). Moreover, we found that different labels conveyed different types of information; whereas the “AI-generated” label better helped participants understand how the content was made, it offered less insight into whether this content was true or false. Together, these results highlight the importance of tailoring labeling designs to one’s underlying goals. Process-centered labels may be more appropriate for sweeping policies covering all forms of AI-generated media, as these labels inform users about the content’s provenance but have more muted effects on beliefs and behavior. However, if the goal of labeling is to specifically target AI-generated misinformation, such process labels may have diminished utility.
To further investigate these dynamics, we therefore ran a second experiment in December 2023 in which we more systematically varied the structure of the warning labels. Specifically, we manipulated both whether the label contained a process cue—namely, whether it communicated that the content was altered or AI-generated—and whether it contained a veracity cue—namely, whether it conveyed that the content could mislead people. This approach resulted in six experimental conditions: (1) a control group who viewed an unlabeled version of the post; (2) a group who viewed an “AI-generated” label, indicating that the post contained an image that was generated using AI; (3) a group who viewed an “Altered” label, indicating that the post contained an image that was edited or digitally altered; (4) a group who viewed a “Misleading” label, indicating that the post contained an image that could mislead people; (5) a group who viewed a “Misleading – AI-generated” label that combined the AI-generated and misleading labels; and (6) a group who viewed a “Misleading – Altered” label that merged the altered and misleading labels.
In line with our first study, we found in the second study that labeling decreased people’s likelihood of believing and sharing AI-generated misinformation. Furthermore, these patterns seemed to be particularly apparent for labels that relayed information about both how the image was created and the image’s potential to deceive, suggesting that both types of cues may provide signals to viewers about a message’s credibility.
In summary, our research provides preliminary evidence that labeling can shape individuals’ belief in and engagement with AI-generated content online. However, even if labeling policies and programs are developed with a clear set of goals in mind—and language to match—several key challenges remain.
First and foremost is the question of what content to label. Existing efforts to identify misleading content on social media typically rely on professional fact-checkers (Meta 2021) and/or the “wisdom of crowds” (Martel et al. 2023), along with machine learning classifiers. However, as attention has shifted toward generative AI, questions remain about how best to determine whether content has been created or manipulated using these specific technologies. As platforms start to roll out AI labeling programs, many have relied on creators to voluntarily report their use of AI tools (Ghaffary 2023; TikTok 2023). Such self-disclosures are easy to implement and require minimal top–down enforcement by platforms, but they are unlikely to be adopted by the very actors whose use (or abuse) of generative AI tools is expected to be most harmful. Thus, reliance on voluntary self-disclosure will be insufficient for addressing the potential risks associated with AI-generated content.
Computational approaches to detecting AI-generated media (e.g., using machine learning and forensic analysis; see Farid 2022) may be able to circumvent some of these impediments to scalability. These methods identify statistical patterns and artifacts in AI-generated media, allowing for post-hoc detection of media manipulation. Nevertheless, such systems can fail to uncover new forms of AI-generated media designed to avoid detection (Epstein, Hertzmann, et al. 2023) and typically produce probabilistic, rather than definitive, estimates of whether a given piece of content was made using generative AI. These dynamics are potentially problematic, as misclassification of AI-generated media as authentic (or vice versa) could undermine users’ confidence in the legitimacy of labeling efforts and erode trust in the media ecosystem more generally (Freeze et al. 2021). Hybrid strategies that blend machine learning, forensic techniques, and crowdsourcing (Groh et al. 2022) may help improve the robustness of AI detection systems but are still vulnerable to the emergence of new forms of AI-generated content. These automated approaches may therefore require continual fine-tuning and adaptation to ensure they are able to keep pace with advancements in AI technology.
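To illustrate why probabilistic outputs complicate labeling decisions, consider the minimal, hypothetical sketch below: the function, thresholds, and actions are not drawn from any real detection system, but they show how a detector’s estimated probability might be mapped to a labeling action, and why some misclassification risk remains wherever the thresholds are set.

```python
# Illustrative sketch (not any platform's actual system): converting a
# probabilistic AI-detection score into a labeling decision. The scores,
# thresholds, and label text are hypothetical.

def decide_label(ai_probability: float,
                 apply_threshold: float = 0.90,
                 review_threshold: float = 0.60) -> str:
    """Map a detector's estimated probability that media is AI-generated
    to one of three actions. Because the estimate is probabilistic, any
    threshold trades off false positives (authentic media labeled as
    AI-generated) against false negatives (AI media left unlabeled)."""
    if ai_probability >= apply_threshold:
        return "apply 'AI-generated' label"
    if ai_probability >= review_threshold:
        return "route to human review"
    return "no label"

# Example: a borderline score is escalated rather than labeled outright.
for score in (0.97, 0.72, 0.31):
    print(f"score={score:.2f} -> {decide_label(score)}")
```

Raising the labeling threshold reduces the risk of flagging authentic media but lets more AI-generated content pass unlabeled, which is precisely the trade-off described above.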
A final set of strategies, in contrast, focuses on more direct disclosure methods. These techniques embed signals about whether content is authentic or AI-generated into the content itself.5 For instance, digital signature-based approaches encode information about the origins of a piece of content, or its ‘provenance,’ via a cryptographically secure chain-of-custody (e.g., following the Coalition for Content Provenance and Authenticity, or C2PA, protocol). Because these digitally signed statements can be generated at the point of creation, they are less prone to manipulation and evasion and can overcome previously mentioned obstacles related to misclassification and self-disclosure (Epstein, Hertzmann, et al. 2023). While this approach comes with coordination challenges related to the development of industry-wide standards and the need for widespread adoption across companies, governments are well-positioned to address these challenges through regulatory mandates.
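The sketch below illustrates the basic signing-and-verification idea behind such provenance approaches. It is a deliberately simplified stand-in rather than the actual C2PA manifest format: the manifest fields, tool name, and helper functions are hypothetical, and it assumes the third-party Python cryptography package for Ed25519 signatures.

```python
# Simplified sketch of signature-based provenance (not the real C2PA format).
# A creation tool hashes the media, attaches a small manifest describing how
# the content was made, and signs it; anyone with the tool's public key can
# later verify that neither the media nor the disclosure has been altered.
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_manifest(media_bytes: bytes, generated_with_ai: bool, key: Ed25519PrivateKey):
    manifest = {
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "generated_with_ai": generated_with_ai,  # disclosure recorded at creation
        "tool": "example-image-generator",       # hypothetical tool name
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    return manifest, key.sign(payload)

def verify_manifest(media_bytes: bytes, manifest: dict, signature: bytes, public_key) -> bool:
    # Check that the signed manifest is intact and that the media still matches its hash.
    payload = json.dumps(manifest, sort_keys=True).encode()
    try:
        public_key.verify(signature, payload)
    except InvalidSignature:
        return False
    return manifest["content_sha256"] == hashlib.sha256(media_bytes).hexdigest()

key = Ed25519PrivateKey.generate()
media = b"...image bytes..."
manifest, sig = sign_manifest(media, generated_with_ai=True, key=key)
print(verify_manifest(media, manifest, sig, key.public_key()))        # True
print(verify_manifest(b"tampered", manifest, sig, key.public_key()))  # False
```

Because the signature is bound to a hash of the media and created at the point of generation, later tampering with either the content or its disclosure causes verification to fail, mirroring the tamper-resistance property described above.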
Second, the effects of labels may extend far beyond the individual pieces of content to which they are applied. Most notably, wide-ranging efforts to draw attention to AI-generated content could predispose the public to question the veracity of authentic content (Citron and Chesney 2019; Hameleers and Marquart 2023). Recent research on AI disclosures finds that people tend to be less trusting of content tagged as AI-generated, regardless of its underlying veracity or provenance (Altay and Gilardi 2023; Longoni et al. 2022). In this same vein, some research suggests that general warnings (as opposed to labels attached to specific pieces of content, e.g., interventions that educate users about ‘deepfakes;’ see Ternovski, Kalla, and Aronow 2022) can make people skeptical of all media they encounter, leading them to erroneously discount real information (Clayton et al. 2020; Hoes et al. 2023). In addition, past research has uncovered an “implied truth effect” (Pennycook et al. 2020), where the application of fact-checking warnings increases the perceived credibility of unlabeled content, even in instances when this content is, in fact, untrustworthy. It is possible that a similar ‘implied authenticity effect’ might occur when labeling AI-generated media, particularly in the absence of an analogous system for identifying and validating content created without AI. When assessing the impacts of a labeling program, it is critical to bear in mind not just how labels influence individuals’ responses to tagged content but also how they affect inferences about unlabeled posts and about the media environment more generally.
Labeling may also have consequences for users’ beliefs and expectations. In particular, conspicuous efforts to apply labels may convey to users that AI-generated media (and/or misinformation) are widespread, thereby inadvertently normalizing the dissemination of this content. In addition, the introduction of a novel warning system may initially capture viewers’ attention and interest, but users may gradually become inured to these labels over time, diminishing their long-term potency (a form of ‘banner blindness;’ see Benway 1998; Burke et al. 2005). Additional research is needed to better understand whether the immediate, direct benefits of labeling in mitigating harm outweigh the long-term, indirect effects of labeling on broader attitudes and behavior.
Finally, different contexts may necessitate different labeling approaches. Although we focus here on the domain of audiovisual media (including video, audio, images, and graphics), AI-generated text is an important—and closely related—phenomenon with its own unique features and challenges (Goldstein et al. 2023; Wittenberg et al. 2021). Furthermore, not all users may interpret labels in the same way. As just one example, in our cross-national study examining the comprehensibility of various labeling phrases (Epstein, Fang, et al. 2023), participants from China interpreted the term ‘artificial’ very differently than participants in other countries, reflecting linguistic differences in the types of behavior this word connotes outside the domain of artificial intelligence.6 Implementing a labeling program at scale requires close attention to these cultural and semantic distinctions, especially in light of the global reach of generative AI and the international user base of many online platforms.
At a time when generative AI systems are increasingly capable of fabricating high-quality media, visible and transparent labeling of AI-generated content offers one potential safeguard against deception and confusion. As policymakers, technology companies, academics, and other actors debate strategies for AI disclosures, it is vital that they be clear about the objectives of such disclosures—which may include a desire to convey the processes through which content was created, a desire to identify misleading content, or some combination of these and/or other goals. These objectives can provide a foundation for determining (1) what types of content to label and (2) how to design labels that are both accurate and credible to a wide audience. When establishing policy guidelines and programmatic strategies, stakeholders should also remain attuned to the consequences of disclosures not just for tagged content but also for untagged content, given the risk that a fragmented or unreliable labeling system could engender mistrust and further blur the lines between reality and fiction. As artificial intelligence systems continue to evolve at a whirlwind pace, it is imperative for policymakers and platforms to carefully weigh these considerations when regulating, designing, evaluating, and implementing labels for generative AI.