This paper provides a roadmap for leveraging Generative AI to address known challenges to research integrity, focusing on potential innovations and interventions in peer review, open data sharing, accessibility, and inclusion.
Generative AI (GenAI) is disrupting the traditional ways we maintain and signal trustworthiness and integrity of science. In this paper, we review the emerging and potential roles of GenAI in science policy and as part of the scientific information infrastructure. We then identify a core set of research questions to enable the use of GenAI with scientific integrity to advance open, equitable, and trustworthy scholarship.
Keywords: generative AI; open scholarship; trust; equity; inclusion
Conflict of Interest
The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.
Author Contributions
We describe contributions to the paper using a standard taxonomy (Allen et al. 2014). All authors take equal responsibility for the article in its current form. CB was primarily responsible for funding acquisition and supervision of the writing. CB, SK, HS, and ES were responsible for funding acquisition and initial conceptualization. CB and MA were primarily responsible for the organization and first draft of the manuscript, and ES and MA led revision. All authors contributed to review and revision. All authors contributed to the conception of the article (including core ideas, analytical framework, and statement of research questions). All authors contributed to the project administration and to the writing process through direct writing, critical review, and commentary.
Funding
We thank MIT for funding the writing of this paper as part of its “Impact Papers on Generative AI” initiative and the Institute for Museum and Library Services for funding research contributing to conceptualization.
Generative AI (GenAI) is affecting scholarly research in nearly every field. It has the potential to revolutionize the discovery of new knowledge and rapidly advance integrity-driven research, as well as the potential to cause harm. It has shown promise as a tool for creating vaccines or drugs for newly emerged pathogens (Yim et al. 2023) as well as the potential to facilitate the production of bioweapons (Rubinic et al. 2023). Further, GenAI is already disrupting the traditional ways we maintain and signal the trustworthiness and integrity of research.
A recent Nature survey of 1,600 researchers indicates that scholars see both potential benefits and risks arising from the use of AI in science (Van Noorden and Perkel 2023). Much of the attention in academia thus far has been on mitigating the risks that GenAI poses for research, teaching, and learning (see, for example, Gao et al. 2022; Gaumann and Veale 2023; Gravel, D'Amours-Gravel, and Osmanlliu 2023). Scholarly organizations and funders have issued a range of policy documents and guidelines constraining its use in scholarly publication processes (see, for example, Creators' Rights Alliance 2023; Australian Research Council 2023; Miao and Holmes 2023; National Institutes of Health 2023; Partnership on AI 2023; Flanagin, Kendall-Taylor, and Bibbins-Domingo 2023; Lo 2023). In addition, many leaders in academic libraries are focused on providing guidance to students and faculty on the use and citation of GenAI1 and/or on identifying areas of library work that might be made more accurate and efficient via the deployment of AI tools (see, for example, Hosseini and Holmes 2023; Khan et al. 2023).
If focused on ethics, information integrity, and the value of science to society, GenAI has enormous potential as a tool for increasing the overall integrity of research. Currently, however, the organizations driving GenAI, the techniques for implementing it, and the outputs generated by it do not provide the integrity2 needed for science and scholarship (see section 5) and thus are not a firm foundation for trust amongst academia, government leaders, or the general public (Mitre 2023; Gillespie et al. 2023). This presents a central obstacle to realizing the potential for GenAI to improve science.
We focus on the potential of GenAI to address known problems for the alignment of science practice and its underlying core values. As institutions culturally charged with the curation and preservation of the world’s knowledge and cultural heritage, libraries are deeply invested in promoting a durable, trustworthy, and sustainable scholarly knowledge commons. With public trust in academia and in research waning (Kennedy, Tyson, and Funk 2022) and in the face of recent high-profile instances of research misconduct (Oransky and Marcus 2023), the scholarly community must act swiftly to develop policies, frameworks, and tools for leveraging the power of GenAI in ways that enhance, rather than erode, the trustworthiness of scientific communications, the breadth of scientific impact, and the public’s trust in science, academia, and research.
In this impact paper, we characterize specific applications of GenAI that have potential for improving science. We then identify key research questions necessary to successfully apply GenAI in these areas that promote scientific and societal values. By highlighting several current challenges3 to research integrity, defined broadly, in which GenAI has the largest potential for positive impact, we aim to advance a strategic agenda for research, policy, guidance, and norms for leveraging GenAI to address the trustworthiness of science.
The core of the academic enterprise is the process of building on research and insights of past scientists, applying honest and transparent methods to evidence, and accurately reporting findings. The value of scientific communication to both scientists and society relies inherently on the processes used to produce such communications. For example, the practices of scholarly citation serve a critical role in the development of theory and methods over time, in establishing a chain of evidence for scientific claims, and in providing scholars appropriate credit for their work and influence in their field.
To make timely and evidence-based decisions, scientists and nonscientists alike need to understand how an emerging scientific claim has been vetted. For more than fifty years, the institution of voluntary peer review has played a central role in supporting the transparency and reliability of published claims (Rennie 1990; Moxham and Fyfe 2016) by interjecting a check by disinterested experts between authors and readers of scholarly research.
Mounting evidence from meta-scientific research on the practice of peer review itself demonstrates the limitations of current approaches, which exhibit significant gaps in reliability across reviewers and biases toward accepted ideas, established senior authors, and writing in English (Lee et al. 2013).
The functioning of peer review as an institution depends on a specialized resource: the voluntary effort of qualified reviewers. There is a growing sense in the academic community that peer review is in crisis because of increasing pressures to publish, the increasing volume of scholarly publications, and the precarity of the role of the academic expert (Flaherty 2022). The operation of peer review generally lacks transparency. While systematic evidence about participation in, and the conduct, quality, cost, and burden of, peer review is scant, the largest extant study documents a drop in the response rate to reviewer requests from approximately 52% to 47% during the period of 2013–2017 (Clarivate 2018). A decline in the supply of peer review labor threatens the quality of established outlets, the timeliness of scientific communication, and the ability of new scholarly initiatives to launch. The absence of transparency and the demands on reviewers may also contribute to the vulnerability of the system to manipulation towards the selection of ‘friendly’ reviewers (see, for example, Ferguson, Marcus, and Oransky 2014; Kincaid 2023; Kulkarni 2016).
GenAI has shown promise in summarizing and evaluating documents in general (Liu and Lapata 2019), and the incorporation of AI into peer review is already underway. Tools such as Scite (Nicholson et al. 2021) and Semantic Scholar (Fricke 2018) are widely used in the academy to conduct literature searches and have shown promise in improving the validity of citations (Petroni et al. 2023). It is reasonable to assume that reviewers employ them. Likewise, over the last several years, a number of large journals have developed and deployed artificial intelligence systems to identify potential reviewers or to screen articles (Basuki and Tsuchiya 2022). The use of GenAI tools in peer review is less studied, but anecdotal reports of academic use of GenAI have led a number of funding agencies to ban their use for reasons of confidentiality (National Institutes of Health 2023).
We believe that targeted research in GenAI applications could be used to restructure the practice and institutions of peer review. Applications of GenAI to strengthen the system of peer review could include the following:
assisting editors in identifying potential peer reviewers through analysis of the relevant published literature;
providing reviewers with a summary of related literature;
providing reviewers with an abstract of the submission’s literature review, methodology, and results;
assisting editors, reviewers, and authors by identifying potential gaps and biases in the article bibliography;
assisting editors in verifying citations; validating protocol checklists, replication requirements, or preregistration requirements; and expanding, refining, and formatting reviewer comments;
assisting review editors in distilling multiple independent reviews (see the sketch following this list); and
assisting editors by directly generating entire automated reviews as a replacement or supplement for human expert reviews.
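As a concrete illustration of the review-distillation item above, the following minimal sketch (in Python) shows how several independent reviews might be composed into a single prompt for synthesis. The generate() function is a hypothetical placeholder for any large language model completion call, not a specific vendor API, and an editor would still verify the output.

```python
# Minimal sketch: distilling multiple independent peer reviews into a
# structured synthesis for an editor. `generate` is a placeholder for any
# large language model completion call; it is an assumption, not a specific
# vendor API.
from typing import Callable, List

def distill_reviews(reviews: List[str], generate: Callable[[str], str]) -> str:
    """Combine independent reviews into one prompt and request a synthesis."""
    numbered = "\n\n".join(
        f"Review {i + 1}:\n{text}" for i, text in enumerate(reviews)
    )
    prompt = (
        "You are assisting a journal editor. Summarize the points of agreement, "
        "points of disagreement, and concrete requested revisions across the "
        "following independent peer reviews. Do not add claims that are not "
        "present in the reviews.\n\n" + numbered
    )
    return generate(prompt)

# Example usage with a stub generator (replace with a real model call):
if __name__ == "__main__":
    stub = lambda prompt: "[model output would appear here]"
    print(distill_reviews(["Review text A ...", "Review text B ..."], stub))
```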
The successful application of GenAI to these problems—if implemented with integrity (see section 5)—has the potential to substantially reduce reviewer burden, better inform reviewers, make the review process more consistent and reliable across reviewers and outlets, accelerate the publication process, and provide a framework for open documentation, measurement, and evaluation of the peer review system. Further, if GenAI is used to improve the peer review system in these ways, both individual reviews and the system as a whole are likely to yield higher-quality, less biased outcomes.
Meaningful open access to research data—data that is “findable, accessible, interoperable, and reusable” (FAIR)—is key to the replicability of research (NASEM 2019; Wilkinson et al. 2016). Nevertheless, meeting funder and journal open data policies can be a time-consuming component of the research process, particularly given the rapid growth in the size, complexity, and scale of datasets generated during computational research. As a result of the effort and complexity required, weak incentives for data sharing, and inconsistent auditing and enforcement of journal and funder policies on data sharing, most research data remains inaccessible (Dewald, Thursby, and Anderson 1986; Iqbal et al. 2016; Miyakawa 2020; Stodden, Seiler, and Ma 2018; Van Panhuis et al. 2014).
The advancement of AI research and tools depends on the availability of scientific data. The 2019 MIT Open Access Task Force (OATF) Final Report begins with the sentence, “Open access to the products of teaching and research promises to speed the accumulation of knowledge and insight and enhance opportunities for collaboration” (Ad Hoc Task Force on Open Access to MIT's Research 2019). Recently, the White House issued an executive order (Biden 2023) that emphasized the need for reliable data to fuel AI models and announced the formation of a National AI Research Resource to explore the “infrastructure, governance mechanisms, and user interfaces” needed to make “distributed computational, data, model, and training resources” available to the AI research community.
GenAI models are data intensive, requiring large corpora to function effectively. As we see large language models develop, researchers are seeking readily available data for training their algorithms, and the open availability of that training data is an important factor driving the choice of which datasets to use (Competition and Market Authority 2023). Current major implementations of GenAI provide limited (if any) transparency into their data collection processes (Bommasani et al. 2023). Available evidence suggests that much of the training data is scraped from the open web and that the data used by these systems varies substantially in quality and integrity (Competition and Market Authority 2023; Longpre et al. 2023).
There is already wide recognition in the field of AI and machine learning that the quality of results is highly dependent on the use of appropriate input data for training. Failure to consider representation bias and historical biases during data collection and training can easily lead to the embedding of those biases into the model itself (Mehrabi et al. 2022; Suresh and Guttag 2021). There is increasing evidence that GenAI models suffer degradation, or even collapse, when trained on their own outputs (Alemohammad et al. 2023). The future of GenAI may rely on curating a sustainable stream of identifiably uncontaminated inputs.
GenAI currently lacks the key grounding in scientific research and teaching highlighted by the OATF and National Academies of Sciences, Engineering, and Medicine reports. The outputs of GenAI are often not well aligned with the evidence base of teaching and research. AI initiatives are increasingly turning to proprietary sources (Competition and Market Authority 2023).
At its current stage of development, applications of GenAI to annotate and validate data are emerging (Alexander et al. 2023; Feuer et al. 2023). While the transparency of data used to train GenAI models remains a substantial concern (see section 5), the application of GenAI to data also has the potential to aid discovery and reuse.
We believe that targeted research in GenAI applications could be used to substantially expand the availability of open scholarly data by enabling better implementation and enforcement of existing data-sharing policies and improving the discovery and documentation of existing data. Applications of GenAI to this area could include the following:
automating the process of documenting data for sharing and reuse;
enhancing existing datasets with automatically generated documentation and metadata in standardized, interoperable formats (see the sketch following this list);
automating the process of checking submissions against journal data replication policies to ensure that publications are compliant with standards and transparency requirements; and
improving the interfaces to data-discovery systems and the relevance of the results that they produce.
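As a minimal sketch of the metadata-enhancement item above, the following Python fragment drafts schema.org “Dataset” JSON-LD from a tabular file’s header; the field choices are illustrative assumptions, and any model-drafted variable descriptions would be treated as suggestions for the depositor to review.

```python
# Minimal sketch: drafting standardized dataset metadata (here, schema.org
# "Dataset" JSON-LD) from a tabular file. Field choices are illustrative
# assumptions; a human depositor would review the output before publication.
import csv
import json

def draft_dataset_metadata(csv_path: str, title: str, license_url: str) -> str:
    with open(csv_path, newline="", encoding="utf-8") as f:
        columns = next(csv.reader(f))  # header row only
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "license": license_url,
        "variableMeasured": [
            {"@type": "PropertyValue", "name": col, "description": ""}
            for col in columns
        ],
    }
    # A GenAI call could draft each empty description from the column name
    # and sample values; those drafts would be treated as suggestions only.
    return json.dumps(record, indent=2)

# Example:
# draft_dataset_metadata("survey.csv", "2023 Survey",
#                        "https://creativecommons.org/licenses/by/4.0/")
```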
The successful application of GenAI to solve problems of open data availability and accessibility—if governed by policies that promote the health and integrity of the scientific evidence base—has the potential to improve the reproducibility and reliability of scientific results and to lead to the generation of new discoveries through promoting data reuse. In turn, a greater availability of open data can benefit AI development and applications by supporting new research and preventing model degradation.
Over the last two decades, the movement towards open access publication has substantially increased access to scientific outputs (Piwowar et al. 2018), and the analysis of large-scale commercial data across broader communities has contributed substantially to the methods, evidence base, pace, and impact of many disciplines in the social sciences (Lazer et al. 2009). However, most scientific outputs are produced and controlled by a small and unrepresentative proportion of the world’s population and are rarely accessible to everyone (Graham et al. 2014).
There has been increasingly wide recognition of the need to make the practice of science and engineering more inclusive of diverse communities and its impacts more equitable (Altman and Cohen 2021). A previous scientific consensus report (in which some of the current authors participated) concluded that advancing knowledge will increasingly depend on broadening access to and participation in science and identified expanding participation in the research community as a ‘grand challenge’ problem with the potential for extensive impact (Altman et al. 2018).
However, more than 95% of published scientific papers are written in English (Liu 2017),4 which creates a daunting barrier for scientists who are not proficient in English and must function in an English-centered scientific world. Likewise, the pace of scientific research and the introduction of insights from non-English-speaking scientists into the scholarly record both suffer as a result of this singularly focused system.
GenAI has already shown a broad potential for assisting humans in reading and writing. It has further potential to enable people to express their ideas through pictures, essays, and software—and even objects, using additive manufacturing technologies—without having the specialized skills that would have been previously required. At its current stage of development, GenAI has demonstrated a capacity for summarization, annotation, and authoring of non-technical documents—and is increasingly being applied to summarizing technical documents (see, for example, Callaway 2023).
We believe that targeted research in GenAI applications could be used to substantially increase meaningful access to scientific publications. Applications of GenAI to broaden the accessibility of publications could include the following:
translating English language publications into the languages used in countries with economies in transition and developing economies;
augmenting publications with structured annotations to communicate article organization more systematically and at finer granularity;
describing specialized content such as figures, tables, and equations for readers with visual disabilities;
improving the understandability of text-to-speech synthesis for scholarly content; and
generating plain-language summaries of scientific findings for non-technical audiences (see the sketch immediately following).
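As a minimal sketch of the plain-language-summary item above, the following Python fragment pairs a hypothetical generate() model call with a rough readability heuristic (average sentence length) so that outputs exceeding a threshold are flagged for human revision rather than accepted automatically.

```python
# Minimal sketch: generating a plain-language summary of an abstract and
# applying a rough readability check (average sentence length) before an
# author or editor reviews it. `generate` is a placeholder for any language
# model call, not a specific product API.
import re
from typing import Callable

def plain_language_summary(abstract: str, generate: Callable[[str], str],
                           max_avg_sentence_words: float = 20.0) -> str:
    prompt = (
        "Rewrite the following scientific abstract for a general audience, "
        "avoiding jargon and keeping all factual claims unchanged:\n\n" + abstract
    )
    summary = generate(prompt)
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    if avg_len > max_avg_sentence_words:
        # Flag for human revision rather than silently accepting the output.
        summary += "\n\n[NOTE: sentences may still be too long for lay readers]"
    return summary
```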
We believe that targeted research in GenAI applications could also be used to increase the trustworthiness and integrity of science by reducing barriers to participation for scientists from countries with economies in transition and developing economies, as well as for scientists from underrepresented populations and institutions in advanced economies. Applications of GenAI to broaden participation in scholarly publishing could include the following:
accurate and timely translation of manuscripts into English for initial review;
adapting AI authoring tools to the needs of authors not proficient in English;
adapting AI authoring tools for scientific writing; and
developing AI tools to facilitate the peer review process for English language–learning writers.
The successful application of GenAI to facilitate the standards of scientific integrity and enable a broadening of participation in scientific communication has the potential to accelerate the global impacts of science and to increase diversity in scientific fields.
To achieve the benefits described above requires that GenAI can be trusted to be consistent with scientific values and requirements. Currently, AI’s value for scholarship is limited to areas in which expert users are capable of independently verifying the factuality and accuracy of the outputs and independently assuring compliance with privacy, copyright, citation, attribution, and other requirements (see Azaria, Azoulay, and Reches 2023 for a review of applications).
The present generation of AI tools can readily produce content that appears scientific at first glance: for example, GenAI can be prompted to produce peer review evaluation questions about a provided article, summaries of the scientific literature, or translations of a scientific article to another language—as well as annotating a data table, graphing the data table, and then describing what a graph looks like for readers with visual disabilities. While these results appear plausible, and can even pass peer review (Cotton, Cotton, and Shipway 2023), the underlying systems are not designed to be compatible with scientific values, and the outputs often lack scientific integrity. Current tools are readily capable of inventing citations (Gravel, D'Amours-Gravel, and Osmanlliu 2023; Orduña-Malea and Cabezas-Clavijo 2023), fabricating abstracts (Gao et al. 2023), manufacturing data to support scientific claims (Taloni, Scorci, and Giannaccare 2023), or even inventing defamatory claims about other scholars (Cohen 2023).
The cumulative positive impact of science on society, and the direct contribution provided by individual scientific outputs, derive from their alignment with a set of underlying basic societal and design values. Society funds scientific research, trusting that the scientific enterprise promotes the discovery of systematic, reliable, and generalizable knowledge about the world—and that this knowledge makes human lives better. Readers of scientific articles assume the claims are accurate, trusting that the peer review and editorial processes mitigate against exaggerating the evidence for a conclusion or biasing analysis toward reporting a desired outcome. Both forms of trust are justified only when scientific processes and the scientific institutions are aligned with a set of core values.5 It is the alignment of key values, scientific processes, and institutions that fundamentally constitutes scientific integrity. With the rise of GenAI in scientific research, we need to ensure the results support the full range of values that underpin the scientific enterprise:6
Faced with growing evidence that the products of GenAI are persistently misaligned with these (and other) values, owners of the current generation of major systems have responded, for the most part, by adding ‘guardrails’ (for a recent example, see Peters 2023). Guardrails are developed only after failures have already occurred, and publicly reported guardrails target a specific sensitive topic or pattern of use (e.g., a prompt on the topic of bioweapons) rather than addressing a broad requirement or principle. Further, the development of guardrails, and mandates for them, often rely on heuristic analysis rather than rigorous theory and design. Protective mechanisms are most often deployed only after an output has been computed rather than incorporated into earlier stages of design and planning.
In practice, the guardrails approach has been largely unsuccessful. Evidence of new failures and failure modes continues to surface with increasing frequency (Gupta et al. 2023; Maus et al. 2023; Qi et al. 2023; Zou et al. 2023). The capabilities of GenAI models remain difficult to theorize, predict, or assess. The behavior of these models is poorly understood, even by their creators. For example, researchers at Google found in post hoc testing that the accuracy of existing GenAI models’ answers to math problems was often improved by preceding the specific problem with the phrase, “Take a deep breath and work on this step by step” (Yang et al. 2023). No one predicted such results. Nor, given the current state of algorithmic theory, would such a prediction have been credible.
Further, it is clear that state-of-the-art research and evaluation are not sufficient to assess the full capabilities of a trained GenAI model (Chang et al. 2023) through inspection or to reliably align that model’s output with specific values, principles, or rules (Liu et al. 2023). There is even evidence that GenAI models can develop the capacity to evade testing protocols by detecting that they are in a test environment and changing their behavior (Berglund 2023).
As discussed below (sections 5.1 and 5.2), a regime of post-model testing and guardrails alone is generally, even in theory, incapable of meeting important integrity requirements. As research in the fields of computer science, statistics, and information science has advanced, it has become increasingly clear that effective informational regulation requires explicit design.
Research, engineering, and design in AI alignment with scientific and ethical principles is a critical foundational requirement for GenAI in scholarly communication. A wide range of research questions will need to be addressed in order to achieve the integrity needed for responsible large-scale integration of GenAI into scholarly communication.
Factuality and honest uncertainty. Current GenAI systems are prone to hallucinations, overconfidence, and illusions of certainty, even when trained on correct and accurate inputs. More rarely, outputs could violate laws related to defamation when false (Volokh v. James) and privacy (see section 5.2 for a discussion of the latter). Designing foundation models so they are reliably correct, verifiable as to their sources, and transparent as to their level of uncertainty is a fundamental research challenge.
Homogeneity. While there is emerging evidence that GenAI can produce novel solutions to interesting problems (see section 1), there is also emerging research suggesting that GenAI models trained on their own outputs degrade in unanticipated ways (see section 3) and that GenAI can lead to more homogeneous solutions in some contexts (Dell'Acqua et al. 2023). Whether GenAI leads to homogenization may depend on the characteristics of the information infrastructure and ecosystem (see section 5.3 for a general discussion). Homogenization is potentially detrimental to science, where its effects apply not only to the presentation of information but also to the ideas reflected in solutions and hypotheses. Understanding the conditions under which GenAI models, and systems incorporating them, increase the homogeneity of solutions is an open question.
Quality. Generally, training learning algorithms to produce quality results is achieved only when the quality of the output can be measured and used to inform the training. Quality requirements vary in kind and degree across scholarly use cases. For some uses, quality requirements are not stringent—both false negatives and false positives are routinely tolerated in discovery systems, and metadata annotations may be useful even if incomplete and sometimes incorrect. In other cases, quality requirements are important but can be incorporated directly into model training and evaluation. For example, the prospective error rates of automated translation can be estimated using known corpora, which enables a decision-maker to determine whether automated translation is good enough for the intended use. A research opportunity is to develop standards and test methods, corpora, and auxiliary tools that researchers and the public could use to evaluate the quality of algorithmic outputs in various contexts and use cases.
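For example, a minimal sketch of such a test method, assuming the sacrebleu package and a held-out reference corpus, estimates a corpus-level BLEU score that a decision-maker could compare against a use-specific threshold; translate() stands in for whatever translation system is being evaluated.

```python
# Minimal sketch: estimating the prospective quality of automated translation
# on a known, held-out reference corpus. Assumes the `sacrebleu` package and a
# translate() placeholder for the system under evaluation.
import sacrebleu

def estimate_translation_quality(source_sentences, reference_translations, translate):
    """Return a corpus-level BLEU score for the candidate translation system."""
    hypotheses = [translate(s) for s in source_sentences]
    bleu = sacrebleu.corpus_bleu(hypotheses, [reference_translations])
    return bleu.score  # compare against a threshold chosen for the intended use
```

BLEU is only one proxy for translation quality; the same pattern applies to other automatic or human-anchored metrics.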
A more difficult challenge arises when the quality of the scholarly output cannot readily and reliably be assessed. For example, in expert peer review, independent reviewers often disagree in their evaluations, and evaluations exhibit systematic bias (Lee et al. 2013). Research suggests that the presence of entirely fake but plausible-appearing reviews can influence human evaluators (Bartoli et al. 2016). The need for principled methods to approach, align, and synthesize opinion among reviewers has been a subject of concern in research funding for decades; funders often employ panel review to manage reviewer divergence. More recently, deliberative peer review methods have also been adopted by some mega journals (Pain 2013). Notwithstanding these advances, evidence of the effectiveness of current methods for addressing divergence in a principled way is scant (Guthrie, Ghiga, and Wooding 2018). There are important research opportunities for the use of GenAI in summarizing individual peer reviews and in supporting active deliberation among reviewers, editors, and authors.
The quality of peer review judgments and processes is rarely the subject of systematic and rigorous evaluation. Although standards for describing the conduct of the peer review process are now available (NISO 2023), there are no standard measures of, or data collection about, peer reviews and the peer review process. In general, the absence of systematic quality evaluation is the rule, not the exception, for scholarly communications processes. The absence of such quality measurements is a barrier both to tuning AI models to be used in scholarly communications and to evaluating interventions using GenAI. The design of appropriate outcome measures for scholarly communications interventions, and of observational and experimental methods for evaluating those interventions, is a critical open research question both for enabling trustworthy scholarship with GenAI and for systematically improving scholarship generally (Altman 2022; Altman, Cohen, and Polka 2023; Azoulay 2012; Hardwicke et al. 2020).
Identified information. GenAI models do not inherently protect the anonymity of individual data subjects. The outputs produced by these models can directly reveal inputs (see the discussion of memorization below) or, more subtly, enable inferential disclosure (Wood et al. 2018)—in which the receiver learns, with high probability, some private information about individuals described in the input. This issue is most recognizable when GenAI uses private data collected from interaction with or measurement of people (e.g., health records) to train the foundation model or to tune those models at later stages (e.g., incorporating user-supplied prompts into downstream tuning). Privacy threats can also emerge from training on public data or on data that has been ‘anonymized’ under a local jurisdiction’s legal requirements, for two reasons. First, legal standards for anonymization vary widely, and data considered ‘anonymized’ in one legal context may not be considered anonymous in other contexts. Second, unless strong cryptographic methods of privacy protection are employed for anonymization (which remains uncommon in practice), anonymized records can be reidentified with surprising frequency (Ohm 2010) or combined with other records in unexpected ways (Fluitt et al. 2019) to disclose private individual information. Effective anonymization in GenAI can be achieved using known cryptographic approaches only when that protection is incorporated by design into the training stage of model production (Boulemtafes, Derhab, and Challal 2020; Liu et al. 2023; Wood et al. 2021). Efficient approaches to privacy-preserving training of GenAI are a significant area of research.
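As one illustration of privacy protection incorporated by design into training rather than added afterward, the following sketch shows a single differentially private (DP-SGD-style) update step, in which per-example gradients are clipped and calibrated Gaussian noise is added before aggregation. The clipping norm and noise scale are illustrative placeholders, and a real system must also track the cumulative privacy loss across all training steps.

```python
# Minimal sketch of one differentially private (DP-SGD-style) update step:
# per-example gradients are clipped and Gaussian noise is added before the
# aggregate is applied. Parameter values are illustrative only; a real
# implementation must also account for the total privacy budget over training.
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray, clip_norm: float = 1.0,
                noise_multiplier: float = 1.1) -> np.ndarray:
    # Clip each example's gradient to bound its individual influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add noise calibrated to the clipping bound, then average over the batch.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                             size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(clipped)
```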
Rights to personal information. GenAI models do not inherently ensure alignment with laws and regulations that govern data about individuals—including restrictions on publishing identifiable information, rights of correction and deletion, and limitations on the purposes for which data can be used (Congressional Research Service 2023). Open research questions include systematic theorization of the right to deletion (or to be forgotten) in algorithmic systems (Nguyen et al. 2022); implementation of capabilities supporting individuals’ rights to know, correct, and delete data about themselves (South, Mahari, and Pentland 2023); and compliance with different restrictions on how input data can be used in downstream activities (see Wang et al. 2022 for an approach outside the machine learning context). Approaches to addressing personal information in the training of GenAI that are simultaneously efficient and effective remain an area of research.
Attribution and copyright. Current GenAI systems challenge traditional norms and expectations around attribution and the use of copyrighted material, including potential violations of law and license terms (Congressional Research Service 2023; Franceschelli and Musolesi 2022; Kuhn 2022). Achieving compliance with license attribution requirements by construction requires preserving provenance relationships during training—which is not supported in current foundation models. Current foundation models appear susceptible to both memorization and disclosure of training data (Nasr et al. 2023), and research in theoretical computer science suggests that unless explicitly controlled in design, large machine learning models will, with high probability, memorize some inputs (Brown et al. 2021; Feldman 2020). Further research suggests that explicit algorithm design is also required for compliance with copyright law, for example, to prevent the use of too large a portion of an input or to prevent the creation of outputs that are too similar to existing protected works (see, for a review of issues, Elkin-Koren et al. 2023). As of the time of writing in December 2023, at least 12 lawsuits have been filed alleging copyright infringement by large language models, and it will be years before the outcomes of all of these cases are known. Mechanisms to limit memorization, track provenance, support attribution, and align machine learning outputs with the specific requirements of copyright and licenses are an open area of theory and application, and will continue to be as AI, and the regulatory framework that applies to it, continue to evolve.
The changes caused by information technology usually start in the marketplace and ripple out into other domains. GenAI has already substantially changed the (fixed and marginal) costs of some information production processes. This raises questions about how GenAI will contribute to broader changes in the market, to changes in the relative advantages of capital and labor, and to the distribution and concentration of market power across stakeholders, as well as, in the longer term, how these changes could (or should) affect the culture, norms, institutions, and regulation of the scholarly knowledge ecosystem.7
Without governance, neither information markets nor the scholarly commons function well; both are particularly prone to monopolization, privatization, and underinvestment (Altman and Avery 2015). While some advocate self-regulation, industry incentives are misaligned (Ryan et al. 2022).
GenAI holds great potential to further disrupt existing markets, including markets for scholarly communication, in ways that are difficult to predict (Competition and Market Authority 2023). On the one hand, GenAI may dislodge incumbents with market power by reducing the costs of services, accelerating speed to market, or creating new market opportunities. On the other hand, the capital intensity of GenAI, its dependence on highly skilled labor for development, high entry costs, dependencies on large corpora of (potentially) protected data, data hungriness (returns to scale on input data size), and the specialized skills required are barriers to market entry. These factors also pose substantial barriers to academic research, both because of the direct costs of training foundation models and because of the financial incentives for highly skilled researchers to exit academia for industry (see Gofman and Jin 2019 for an exemplar analysis of the latter trend). There is also a risk of monopolization when models exhibit ‘network effects’ or other increasing returns to scale. For example, foundation models sometimes become much more useful as their corpora grow and as additional data is collected from user interactions with the model.
Understanding the general functioning of the scholarly ecosystem as it evolves will require both basic and applied legal, information science, economic, policy, and social science research (Altman et al. 2018). Understanding and governing the scholarly ecosystem will also require systematic measurement, collection, and sharing of data that measures the behavior and performance of the scholarly ecosystem and the results of interventions in it (Altman, Cohen, and Polka 2023). While these challenges are not likely to be addressed directly by GenAI applications, GenAI-enhanced tools could make it easier to effectively collect and share data about the scholarly ecosystem if these tools are designed to be open and auditable.
The potential for widespread adoption of GenAI processes in scholarly knowledge production raises a range of specific research questions about how GenAI is, could, and ought to affect the health and operation of the scholarly knowledge ecosystem: How does GenAI affect the durability and sustainability of the ecosystem? How does GenAI affect the norms and incentives for participating in science? How does GenAI affect who participates in science and how the burdens of participation are distributed? How does GenAI affect who benefits from these changes?
Science relies on an open, durable, and sustainable record. As discussed previously, the capital and data intensity of GenAI make it ill-suited to market-based mechanisms, and these features also put extraordinary pressure on current systems and approaches to nonmarket governance. More specifically, GenAI use could contribute to the shrinkage and/or privatization of research knowledge in a number of ways. The vast majority of current models are trained on publicly available information but are themselves proprietary, raising questions about how market structure, regulations, intellectual property regimes, organizational structures, and governance systems need to be designed and implemented to prevent GenAI use from eroding or privatizing the shared knowledge commons (Chan, Bradley, and Raikumar 2023; Huang and Siddarth 2023; Seddon 2022). The copyright status of content produced by models, including the ability of model owners to assert restrictive rights to generated content, is uncertain. For example, it is not difficult to imagine that large commercial publishers, which currently control the largest databases of volunteer-generated peer review, could use this corpus to train a peer review service that would then be commercialized. In the absence of design and governance, poorly resourced researchers will increasingly generate content that serves primarily as input for synthesis, while well-resourced institutions that control the corpora and analysis infrastructure will lead and own the resulting major discoveries. Substantial research is needed to design institutions and approaches to governing transformative capital- and data-intensive infrastructures in order to yield a healthy knowledge commons.
Current major implementations of GenAI systems are unusually energy intensive (Patterson et al. 2021; Strubell, Ganesh, and McCallum 2020). Wide-scale adoption of these tools in science and research has the potential to increase the climate impact of the research enterprise and raises novel questions about aligning the conduct and infrastructure of research with the value of environmental sustainability.
GenAI raises questions about the durability of the scholarly record. Digital publications and data are at substantial risk of loss as services shut down and as the software used to process different information formats changes (Altman et al. 2020). As AI tools become increasingly integrated into the dissemination and interpretation of the scholarly record, new methods and institutions of digital preservation will need to be developed.
By changing the costs and effort required in different scholarly activities, GenAI may have spillover effects on cultural norms and practices within academia. For example, the valuation of external peer review would be challenged if peer review were to become capital intensive. The use of GenAI may have asymmetric and discontinuous effects on attribution and ownership, which could exacerbate achievement gaps (Porsdam Mann et al. 2023) or enable human actors to shift responsibility for bad actions away from themselves (Köbis, Bonnefon, and Rahwan 2021). Research is needed into designing norms and practices for excellence in hybrid human–AI scholarship (Dwivedi et al. 2023).
We previously discussed (in sections 2 and 4) how current systems of publication and peer review exhibit bias against, and create barriers to, broad participation in science, and suggested some ways in which GenAI might be used to mitigate these problems. The broad use of GenAI will put pressure on the costs, incentives, and norms related to scholarly activities and has the potential to change participation and inclusion in ways that are difficult to fully anticipate. Approaches to theorizing, measuring, and engineering participation in science are an active area of research (James and Singer 2016). Research is needed into how interventions using GenAI affect participation directly and indirectly.
GenAI will substantially affect and rapidly advance the way researchers explore, discover, evaluate, and create human knowledge. If centered in ethics, information integrity, and the value of science to society, the use of GenAI has enormous potential to reduce barriers to participation in science and advance open, equitable, and trustworthy scholarship. Realizing this potential depends on developing and providing communal access to GenAI tools that can deliver results that respect the tenets of scientific integrity.
Current GenAI systems are capital intensive, energy intensive, and data hungry. They have the potential to ‘fence in’ the commons of information by transmuting public information into proprietary commercial AI models and possibly imposing licensing on the resulting outputs; to shift competitive advantage away from expert labor to the (currently primarily corporate) owners of knowledge infrastructure; and to increase homogeneity in scientific outputs. Without effective regulation, GenAI has the potential to promote monopolies and increase the concentration of economic and cultural power.
Values such as privacy, explanation, and fairness must be achieved by carefully designing these capabilities into foundational AI models and by enacting meaningful governance of AI ecosystems. Current research and past experience show that these problems cannot be solved simply by bolting guardrails onto increasingly complex and adaptive systems. Ensuring that these technologies enhance human agency and the public knowledge commons requires innovative research and thoughtful regulation of AI markets and systems.
Despite the real and much-discussed risks, we believe that GenAI can offer many opportunities to address known failures of integrity in the current system by restructuring and streamlining peer review; by facilitating open data sharing, documentation, and discovery; by making scientific outputs accessible to a broader set of communities; and by reducing barriers to participation in scientific authorship. With this paper, we aim to establish a roadmap for a values-driven exploitation of these opportunities.
We thank Mohamed El Ouirdi and Amy Nurnberger for their commentary on the early stages of this project and Eva Campo, Philip Cohen, and David Weinberger for insightful review comments.