End-to-end security and privacy are increasingly urgent as generative AI is deployed in organizations and integrated into our daily lives. Many partial solutions exist across the research and practice landscape, yet none provide an end-to-end solution for generative AI. These technologies address different threats or attackers and aim to satisfy varied security guarantees. Complex supply chains and computation pipelines used in training and deploying foundation models require combining a plurality of technical solutions to address security and privacy concerns. This roadmap provides a framework for clarifying the different security challenges encountered in generative AI and organizing the existing defense technologies to address them. First, we define the security goals we want these systems to achieve, contrasting privacy (which implies that no sensitive information, such as private training data, proprietary model weights, or sensitive user inputs, is leaked) and verifiability (which makes it possible to confirm the integrity of the computational steps in the generative AI pipeline). We examine how these goals apply to two types of attackers: an internal attacker who can interfere along the generative AI pipeline and an external attacker who can only see outputs. We show how different existing technologies for verifiability and privacy sit in this framework, where they can be used, and how they can be swapped or combined to achieve end-to-end security. No single technology will solve the security woes of AI. A modular and hybrid solution that combines existing and new innovations in cryptography and privacy can pave the way for a secure future for generative AI.
There is an increasing need for technologies to be end-to-end secure and handle user data privately in the twenty-first century; generative AI is no exception. The new capabilities of generative AI hold immense promise, yet unbridled, they present privacy and security risks to user inputs, training data contributions, model weights, and usage patterns. This has led to a status quo of models trained on undisclosed collections of scraped public content hosted by model providers that see all user behavior, retain the right to train future models on that user data, and provide no real security guarantees.
Deployed AI models face significant privacy challenges. If users include sensitive information in their prompts (the input to a model), that data can leak via data breaches or when the model’s usage data is used to train future AI systems. When model developers choose to ignore content privacy risks during training, they risk causing substantial downstream harm to individuals whose private information may be inadvertently exposed or misused by the AI model.
These risks and limitations ultimately hold back the deployment of this technology. End users are concerned about how their data will be used, given the history of large tech companies exploiting their data; enterprises are wary of the risks created by uploading intellectual property or commercially sensitive data; and society at large is worried about how these risks interact with vulnerable populations such as children.
Legislation such as the General Data Protection Regulation, the California Consumer Privacy Act, and the European Union AI Act has identified these concerns and created rules around the use of consumer data, explicitly calling out high-risk contexts in which privacy is critical.1 If generative AI is to have widespread adoption, especially in high-risk contexts such as education and healthcare, new solutions for privacy and security in generative AI will be needed.
Building this private toolkit is not just a matter of complying with emerging regulations but an opportunity to look forward to how we want private user data to be handled in a world where more and more of society is mediated through the lens of AI. Privacy can slow progress in the short term, yet it is critical to the technology’s long-term growth. Building user trust, both from the content provider and end-user sides, is a necessary step in the full-scale rollout of generative AI technologies. Such a toolkit will not be static; new research questions will be asked and answered as AI models and the ways we use them evolve rapidly. This work provides a brief survey of existing solutions, a framework of how these tools fit into the broader picture of privacy and security for generative AI, and a call to action for researchers to fill in the missing gaps.
To understand where security matters for generative AI, we need to look at the computational pipeline and supply chain it relies on. For simplicity, let’s boil down the pipeline of AI into five (not strictly ordered) steps. Generative AI starts with a ‘large-scale data collection’ of web-scale pretraining data, domain-specific pretraining data, and other control datasets such as instruction finetuning data. This data is then used in the ‘pretraining of large foundation models’ to learn general-purpose and emergent capabilities. These large models then undergo ‘finetuning’ to become useful for a specific application. This can include instruction finetuning, reinforcement learning from human feedback, learning to take actions, or task-specific finetuning. This step requires separate training data, but the lessons and requirements from pretraining data collection still apply.
Once the model is fully trained, it needs to be provided to users (in which users could be enterprises using AI agents, consumers using chatbots, or any other AI application). While there are many instances in which models are deployed on a local machine, the large scale of these models and their onerous compute hardware requirements result in many AI models being deployed in the cloud. In such an instance, inference refers to the model receiving a prompt, repeatedly generating output tokens (words), and sending the response back to a user. In addition to this inference, there is often a second step in which the AI model will request, or need to be provided, an additional piece of input. This paradigm of ‘retrieval augmented generation’ (RAG) accesses databases of information external to the AI model and pulls that information into the context window (the content the AI can see).2
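To make the retrieval step concrete, the sketch below shows the basic RAG loop in Python. The retriever is a toy token-overlap scorer standing in for a real embedding index, and generate() is a hypothetical call to whatever hosted model performs inference; both are assumptions for illustration, not a prescribed implementation.

```python
# A minimal RAG sketch: retrieve the most relevant documents for a query and
# place them in the prompt's context window before calling the model.

def score(query: str, doc: str) -> float:
    # Toy relevance score: fraction of query tokens that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    context = "\n".join(retrieve(query, corpus))
    # Retrieved passages sit alongside the user's question in the context window.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = ["Aspirin can reduce fever.", "Paris is the capital of France."]
prompt = build_prompt("What reduces fever?", corpus)
# response = generate(prompt)   # hypothetical call to the deployed model
```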
These steps towards the use of AI are performed by many different participants, each of which may have sensitive data to be kept private or computations that need to be trusted, and each is an opportunity for added security. Consider the following participants in the end-to-end supply chain of AI and where their concerns might lie:
data contributors, who generate and provide training data for foundation models for use in pretraining or finetuning
model creators, who use such data to create and train AI models
model deployers, who take pretrained models and host their inference and general use for users
end users, who receive the outputs of generative AI and use the models
external databases, which may be accessed by the model during runtime for RAG (this includes web browsing or other document retrievals)3
These distinctions are amorphous. There are many instances in which two parties are the same, such as the common cases in which model creators are also model deployers (see OpenAI), model providers also provide the external database (such as Google Gemini using Google Search), or data contributors are also end users. Further, many of these categories can encompass multiple parties. An AI model can be trained by one company and finetuned by another (such as through reinforcement learning from human feedback as a service), in which both parties are lumped into model creators. Similarly, a large model may have been hosted for inference by a cloud provider but accessed through a third-party app that handles user inputs and model outputs (e.g., any software wrapping around a generative AI API); not only are both deployers here with respect to the personal end user but the third-party app is also an end user with respect to the cloud model inference host. This web of privacy and security dependencies can become complex quickly but is underpinned by a set of relationships that can be individually addressed via specific security and privacy solutions between each step.
The concept of security is multifaceted, encompassing an extremely broad range of security properties for a system. To organize the different security objectives in generative AI that we discuss in this work, we condense the ‘threats’ against AI systems into two principal attacker models, each calling for different defense strategies. We then define privacy and verifiability, the two specific security properties we want our generative AI systems to satisfy.
An ‘attacker’ model describes the capabilities of a potential malicious actor who is trying to bypass the system’s security. While many details are abstracted away here, at a high level, there are two types of entities we’re worried about.
First, we address an attacker inside the computational pipeline. An organization or individual hosting, running, or controlling one of the computational steps might be ill motivated. Such an attacker may wish to extract sensitive data or tamper with the computation itself. Consider the case of a malicious large language model provider that wants to access user input, a database provider that wants to know what AI is being used for, or an inference service wishing to deploy smaller models to reduce costs.
Our second attacker only has black box access to the model. That means it has control over the prompts sent to the model and can observe the model responses. Here, the attacker can try to extract sensitive information such as training data or model weights. This encompasses practically any end user of an AI system trying to extract private or proprietary information about the model and the training data by prompting the model.
Noting the above threats, let’s turn our attention to the security goals we actually want to achieve. At a broad level, the two properties we want AI systems to uphold are privacy and verifiability.
A system enforces privacy if it does not leak sensitive information to a potential attacker. We focus on three types of data to help us frame how to organize different existing threats and defense mechanisms.
Private training data: The training of generative models requires huge amounts of data, for both pretraining and finetuning, which in some instances contains proprietary data or sensitive information regarding individuals. If a language model memorizes this data, it can be leaked to an end user. Similarly, at the inference stage, the model may query a database holding sensitive data, which could, in turn, be leaked to an end user. There are also instances in which private training data should be kept secret from an internal attacker. Many regulations mandate that personal or sensitive data (such as medical records) never leave the device or custodian server in their raw form, and only aggregate and privacy-preserved insights can be shared.
Proprietary model information: In many cases, model weights are the most valuable intellectual property across the pipeline. Keeping model weights or model architecture private not only from the public (an external attacker) but also from hackers or malicious actors within the supply chain (internal attackers) is paramount. There are also more complex instances of privacy here. Consider the case in which a benchmarking provider has private data (that they wish to keep private so that no one can train on it and game the benchmark), and a model provider wishes to keep their model weights private. Without cryptographic solutions or a trusted third party, these two are at an impasse in maintaining the privacy of their proprietary data while performing the benchmark on a private model.
Sensitive user input: When interacting with the generative model, a user might share sensitive information through their prompts, especially when the model is used in a sensitive context such as personal health or company record management. It is critical that this information isn’t leaked to an external party or the public; equally important is that companies minimize how much these user records are transmitted to other parties in the generative AI supply chain.
Verifiability is the ability to create guarantees and verify that the computational steps of the AI pipeline have been run, and run correctly. Such a ‘computation step’ can be extremely broad, ranging from verifying that inference was performed on the proper model to verifying a computational claim that the training data did not contain New York Times URLs. Such a guarantee allows a user or actor in the supply chain to obtain a verifiable proof of the integrity of what has occurred inside an opaque system. This is useful when an end user wants to verify which model they are remotely accessing or when a company in the supply chain wants confidence in the computations performed by its suppliers.
Privacy and verifiability are not competing goals. Some of the technical solutions we see below address several of these threats at once and often complement each other. Indeed, a key goal of many security solutions is to both provide privacy and allow a user to verify that the operations on the private information have succeeded.
While the roles of privacy and verifiability in security are already broad, it is critical to point out the large number of security goals we do not address here. The security properties above do not include risks associated with model behavior, including biased model outputs, effective guardrails on outputs or use, and dual-use capabilities. This is similar to how transport layer security (think of the secure ‘s’ in ‘https’) doesn’t concern itself with the content of the webpages it secures, only with keeping the connection secure from attackers. Some innate properties of models, such as their susceptibility to adversarial prompts or their tendency to hallucinate, are not part of this discussion.
Further, we don’t address the risks created by the supply chain of code and training data. This data can provide undesirable knowledge (how to make a bomb) or be an attack vector for data poisoning.4 While some of the tools we describe (e.g., differential privacy or zero-knowledge data proofs) can help address this issue, we leave the effects of data on the model outside the scope here. Supply chains of other inputs (open model weights and inference code) can also be a mechanism for attacks but are not addressed by the tools laid out here.
These risks should be taken into account when designing end-to-end generative AI systems but cannot be addressed using the cryptographic and confidential computing mechanisms we present in this article.
There is a vast array of open-source technologies, commercial solutions, and new research that address the challenges of security in generative AI. For the most part, these tools fit into the above framework of security guarantees and the attackers they defend against. Let’s now elaborate in more technical detail on some of the technologies that address these concerns.
These solutions prevent the leakage of sensitive data by keeping it encrypted during transit and usage or by isolating the sensitive computation from other pieces of software at the hardware level. Each solution presents a different tradeoff between performance, security or trust assumptions, and usability, providing different generic primitives for secure computation that can address privacy concerns against an internal attacker.
Homomorphic encryption (HE): HE enables computations directly on encrypted data without requiring decryption, ensuring confidentiality throughout processing. HE usually relies on public key encryption schemes, which means that anyone can encrypt data using a public key, and several parties can provide different inputs for HE, but only the parties holding the private key can decrypt outputs or intermediate states. Unfortunately, HE carries a significant performance overhead, sometimes orders of magnitude slower than the equivalent plaintext computation, and implementing HE for the diverse operations involved in the pipeline can be quite complex. As a result, in practice, HE has mostly been used for inference, protecting the user’s private prompts.5 Nevertheless, HE could also be used for model training, where it permits learning from encrypted datasets, promoting privacy-aware collaboration.6
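As a concrete illustration of computing on encrypted data, the sketch below implements a toy Paillier cryptosystem (additively homomorphic, rather than the lattice-based fully homomorphic schemes used in practice) and evaluates a linear score on a user's encrypted features against plaintext server weights. It is a didactic sketch only; the demo primes and parameters are far too small for real security.

```python
# Toy additively homomorphic encryption (Paillier): the server computes a
# weighted sum over encrypted features without ever decrypting them.
import math, random

def keygen(p=104729, q=104723):                       # tiny demo primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)
    return (n,), (n, lam, mu)                         # public key, private key

def encrypt(pk, m):
    (n,) = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sk, c):
    n, lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pk, sk = keygen()
features = [3, 5, 2]                                  # user's private input
weights = [4, 1, 7]                                   # server's plaintext weights
enc = [encrypt(pk, x) for x in features]

# Server side: raising a ciphertext to a plaintext weight multiplies the
# underlying value; multiplying ciphertexts adds the underlying values.
n = pk[0]
enc_score = 1
for c, w in zip(enc, weights):
    enc_score = (enc_score * pow(c, w, n * n)) % (n * n)

assert decrypt(sk, enc_score) == sum(x * w for x, w in zip(features, weights))
```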
Multi-party computation (MPC): MPC makes it possible to split trust among several parties to perform sensitive computations securely.7 This can allow generative AI inference to occur across multiple servers without any one server seeing the model weights or input data.8 MPC also makes it possible for different parties to each contribute their own private inputs without revealing those inputs to other parties. This can allow setups in which a user and a server can collaboratively run inference on the user’s private prompts and the server’s private model without revealing those secrets to each other. Like other cryptographic techniques, MPC can be extremely slow, especially when servers must communicate intermediate values to one another. There have been attempts to optimize the systems for machine learning, which has allowed small transformers to be run.9,10
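The sketch below shows additive secret sharing, the core primitive behind many MPC protocols: a private input is split into random shares held by two servers, each server computes a public linear function locally, and only the recombined result is meaningful. Multiplying two secret values (say, private inputs by private weights) additionally requires machinery such as Beaver triples, which is omitted from this sketch.

```python
# Minimal two-party additive secret sharing over a prime field.
import random

P = 2**61 - 1                       # prime modulus defining the field

def share(x):
    """Split x into two additive shares modulo P."""
    s0 = random.randrange(P)
    return s0, (x - s0) % P

def linear(share_vec, weights):
    """Each server runs this locally on its shares; the weights are public."""
    return sum(s * w for s, w in zip(share_vec, weights)) % P

features = [3, 5, 2]                # private user input
weights = [4, 1, 7]                 # public coefficients for this example

shares = [share(x) for x in features]
server0 = linear([s0 for s0, _ in shares], weights)
server1 = linear([s1 for _, s1 in shares], weights)

result = (server0 + server1) % P    # recombining the partial results
assert result == sum(x * w for x, w in zip(features, weights))
```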
Confidential computing: Confidential computing, leveraging trusted execution environments or other secure hardware enclaves, allows a user to execute code on a remote server, for instance, in a public cloud, without trusting the software stack or the cloud provider. These environments rely on hardware primitives that isolate the sensitive computation from other software and allow near-bare-metal performance with limited overheads. They are applicable across the computational pipeline and have recently been extended to GPUs (although challenges to deploying large language models remain). These hardware solutions are exciting but lack the mathematical security guarantees provided by the cryptographic solutions above.
On-premise hosting: Hosting sensitive parts of the process on premise can address many of the concerns around data privacy for the owner of the sensitive data by not involving external parties. For instance, training could happen on the local servers of the data contributor to protect training data, while inference could happen on premise of the model provider to protect model weights. While this privacy solution is simple and easy to understand, it is often infeasible. First, in modern pipelines, the owners of the different sensitive data might be different, and no clear party can be trusted to handle training data, model weights, and user inputs. Furthermore, large models can be hard to deploy, requiring expensive and difficult-to-manage infrastructure that is easier to find and maintain in public clouds. Nevertheless, on-premise computation can be a valuable and simple tool to protect computations in generative AI from attackers.
Federated learning: Federated learning (or its related alternative, split learning) is a training method that distributes the training of a model across each data contributor, allowing models to be trained without raw data leaving the contributor’s device or local server.11 This can slow down or limit the fidelity of training but avoids the risk of exposing raw data to a potential internal attacker, like a malicious centralized server, while allowing patterns to be learned across diverse data sources.
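The sketch below is a minimal FedAvg-style loop on a toy linear model with synthetic client data (both assumptions for illustration): each contributor trains locally, and only model parameters, never raw records, reach the coordinating server.

```python
# Minimal federated averaging: clients train locally, the server averages.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))                     # stays on the client
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=20)))

def local_update(w, X, y, lr=0.1, steps=10):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)        # gradient of squared error
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for _ in range(20):
    # Each client starts from the current global model and trains locally.
    local_models = [local_update(w_global.copy(), X, y) for X, y in clients]
    # The server only ever sees the averaged model parameters.
    w_global = np.mean(local_models, axis=0)

print(w_global)   # approaches [2.0, -1.0] without raw data leaving the clients
```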
Synthetic data: Synthetic data in machine learning refers to artificially generated data designed to emulate real-world data without being collected from real-world observations.12 It is created to replicate the statistical properties and patterns of real data but is often an incomplete representation. Synthetic data is commonly used when real data is limited, expensive, or difficult to obtain or when privacy concerns prohibit the use of real data. By generating and transmitting only synthetic data, other actors in the training pipeline cannot see the sensitive original data.
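As a minimal illustration, the sketch below fits a simple multivariate Gaussian to sensitive records and releases only samples drawn from it; real pipelines use far richer generative models, and synthetic data should be paired with formal guarantees such as differential privacy.

```python
# Fit a simple statistical model to sensitive records and share only samples.
import numpy as np

rng = np.random.default_rng(0)
real = rng.multivariate_normal([50, 120], [[9, 6], [6, 16]], size=1000)  # sensitive

mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1000)   # shareable stand-in

print(synthetic.mean(axis=0))   # mirrors the real data's aggregate statistics
```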
Local execution: Perhaps the most obvious solution to ensuring the privacy of user inputs is to store all data and execute all model inferences locally on the user’s device. There are many contexts in which this is a useful privacy solution that is both simple to implement and easy for consumers to understand. However, many models, especially the most powerful ones, are extremely hard to run locally due to their size. This makes many use cases, especially those involving phones, very hard without cloud computing.
Private information retrieval (PIR): Putting aside training and inference, many modern generative AI workflows leverage external databases or web searches to generate relevant content. This ecosystem of retrieval augmented generation often leaks information about how the generative AI system is being used to external databases and tools. PIR can be critical in protecting this information from leaking to external parties. PIR can be done using various technologies, including searchable symmetric encryption, private set intersection, or MPC, and has been shown to be useful in generative AI contexts for web search and database retrieval.13,14
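The sketch below shows one of the simplest PIR constructions, a two-server XOR scheme: the client sends each of two non-colluding servers a random-looking selection vector, and combining the two XOR answers recovers the desired record while neither server alone learns which one was retrieved. Production PIR schemes are considerably more sophisticated.

```python
# Minimal two-server XOR-based PIR over a database of equal-length records.
import secrets

db = [b"rec0____", b"rec1____", b"rec2____", b"rec3____"]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(database, selection):
    # XOR together the records selected by the client's query vector.
    acc = bytes(len(database[0]))
    for j, selected in enumerate(selection):
        if selected:
            acc = xor_bytes(acc, database[j])
    return acc

def client_query(i, n):
    mask_a = [secrets.randbelow(2) for _ in range(n)]        # random subset
    mask_b = list(mask_a)
    mask_b[i] ^= 1                                           # differs only at i
    return mask_a, mask_b

i = 2
q_a, q_b = client_query(i, len(db))
answer = xor_bytes(server_answer(db, q_a), server_answer(db, q_b))
assert answer == db[i]       # recovered record i; each query looks random alone
```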
So far, all of the approaches described focus on protecting input privacy throughout the computation itself; they do not capture leakage that may occur from the model’s output to an external attacker. Thus, we need other approaches that specifically protect against an attacker who intentionally extracts data by repeatedly querying the model and examining the outputs.
Differential privacy (DP) in model learning (and beyond): DP is a statistical method that ensures the privacy of training data from model memorization by injecting noise during training in order to maintain a given ‘privacy budget.’15 This is the first of these solutions to address the privacy of training data from end users directly and can, to an extent, allow for the training of large models on private data at either pretraining or finetuning.16 This can be combined with tools such as federated learning but can also reduce the performance gain or knowledge improvement that a piece of data could contribute.
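The sketch below shows the core of a DP-SGD-style update on a toy linear model: per-example gradients are clipped to bound any individual's influence, and Gaussian noise calibrated to that bound is added before the model step. Translating the noise scale into a formal (epsilon, delta) budget requires a privacy accountant, which is omitted here.

```python
# DP-SGD-style training step: clip per-example gradients, then add noise.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
clip, sigma, lr = 1.0, 1.0, 0.1
for _ in range(200):
    per_example = 2 * (X @ w - y)[:, None] * X           # one gradient per record
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example / np.maximum(1.0, norms / clip)  # bound each influence
    noisy_mean = (clipped.sum(axis=0) +
                  rng.normal(scale=sigma * clip, size=2)) / len(X)
    w = w - lr * noisy_mean

print(w)   # close to [2.0, -1.0], with each record's contribution bounded
```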
Other statistical noising approaches: Recent work has presented other approaches to adding privacy during training, such as PAC (Probably Approximately Correct) Privacy, Pufferfish privacy, and others.17,18 These approaches take alternative formulations on how to minimize the memorization and regurgitation of training data while still learning generalized information from the data.
Noise introduced by other methods: Federated learning and synthetic data, described above, can also limit memorization and the leakage of individual data points, which, beyond limiting the transmission of raw data, helps avoid leaking sensitive information to an end user. Unfortunately, these methods usually do not provide formal guarantees on their own, and strong privacy for training data can only be achieved by combining them with the approaches presented above.
Using privacy on prompts and retrieved documents: While the above privacy methods are most commonly used to minimize information leakage during training, they can sometimes also be used during inference time. Since both prompts and retrieved documents can be represented numerically in embeddings, the above privacy tools can be applied to limit the ability of an external attacker to see retrieved data or original prompts (or, separately and related to the internal attack, could be used to obfuscate the prompt from the inference provider at the cost of accuracy).
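As a rough illustration of applying these ideas at inference time, the sketch below perturbs a prompt embedding on the client before it is sent to the provider; embed() is a hypothetical local embedding function, and the added noise trades accuracy for obfuscation rather than providing a formal guarantee on its own.

```python
# Noise a prompt embedding on-device before it is shared with a provider.
import numpy as np

rng = np.random.default_rng()

def privatize(embedding: np.ndarray, scale: float = 0.1) -> np.ndarray:
    noisy = embedding + rng.laplace(scale=scale, size=embedding.shape)
    return noisy / np.linalg.norm(noisy)      # re-normalize for cosine search

# vector = embed("patient reports chest pain")   # hypothetical local embedding
vector = rng.normal(size=384)                     # stand-in embedding
vector /= np.linalg.norm(vector)
private_vector = privatize(vector)
# private_vector, not the raw prompt or exact embedding, leaves the device
```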
Verifiability enables checking that no one is manipulating the AI pipeline and allows statements about generative AI that cannot be faked. These verifiable statements and guarantees of integrity may be seen by internal or external parties and protect against attackers who are manipulating the internal pipeline, not the external prompts.
Remote attestation of confidential computing: Many of the privacy tools outlined above also allow a deployer or user to verify that privacy has been maintained and computation occurred correctly. This ability to check on the process is possible in everything from trusted platform modules to MPC, but its implementation varies. This verification itself rarely slows down computation significantly but is often associated with the heavy cryptographic tools that already create overhead.
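The sketch below is a conceptual stand-in for remote attestation, with a toy signed 'quote' in place of a vendor's real report format: the enclave signs a measurement (hash) of the code it loaded together with a verifier-chosen nonce, and the verifier checks the signature and compares the measurement against the expected build. Real attestation involves vendor-specific formats and key hierarchies.

```python
# Conceptual attestation flow: sign a code measurement plus a fresh nonce,
# then verify both the signature and the expected measurement.
import hashlib, os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

enclave_key = Ed25519PrivateKey.generate()            # stands in for a hardware key
deployed_code = b"model-serving-binary-v1.2"
expected_measurement = hashlib.sha256(deployed_code).digest()

# Verifier sends a fresh nonce; the enclave returns a signed quote over the
# measurement of whatever code it actually loaded.
nonce = os.urandom(16)
measurement = hashlib.sha256(deployed_code).digest()
quote = measurement + nonce
signature = enclave_key.sign(quote)

# Verifier side: check the signature and the measurement.
enclave_key.public_key().verify(signature, quote)     # raises if forged
assert quote[:32] == expected_measurement             # the expected binary ran
```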
Zero-knowledge model proofs: The rapidly growing field of zero-knowledge machine learning allows a prover to generate proofs of model inference that a third party can verify, while maintaining privacy over model weights or inference inputs.19 A key use is confirming that the correct model was run when inference is performed externally.20 This can, in turn, enable powerful additional guarantees, like verifiable evaluations of model performance, without exposing model weights or benchmarking data to end users or the public eye.21 Proving is, however, very slow compared to standard inference.
Attestations about data: The public or other parties in a supply chain may often have questions about training data which can, at times, contradict our privacy requirements. Using methods such as Merkle trees or smaller zero-knowledge proofs, parties can share attestations (verifiable statements) about the data they are using. This can allow for verifiable data provenance or confirmation that sensitive data was excluded (such as through an AI Bill of Materials).
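The sketch below shows the Merkle-tree mechanics behind such attestations: the data holder publishes only a root hash as a commitment to its corpus and can later prove that a specific record was included without revealing anything else about the dataset.

```python
# Minimal Merkle tree: commit to a dataset with a root hash and prove inclusion.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                 # duplicate last node if odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    proof, level, i = [], [h(leaf) for leaf in leaves], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = i + 1 if i % 2 == 0 else i - 1
        proof.append((level[sibling], i % 2 == 0))  # (hash, sibling on right?)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root

docs = [b"doc-a", b"doc-b", b"doc-c", b"doc-d"]
root = merkle_root(docs)                             # the public commitment
proof = inclusion_proof(docs, 2)
assert verify(b"doc-c", proof, root)                 # inclusion proven, data private
```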
Building and deploying generative AI systems require enforcing security at each step along the supply chain. Guaranteeing robust privacy and verifiability for an end-to-end system will require composing the above solutions as building blocks to cover each corresponding threat, as no one approach covers the entire attack surface.
Solutions that address different security properties or attacker models can be combined to cover as many threats as needed, while ones addressing the same risk can be used at different steps of the pipeline. Often, methods addressing the same type of threat can be compared and offer a tradeoff between performance, security, and usability. For instance, fully homomorphic encryption offers strong security relying on well-studied cryptographic assumptions, but performance overheads make it impractical in most cases. On the other hand, multiparty computation relaxes assumptions on trust by requiring some servers to be trusted but offers better performance.
To illustrate how to compose these different solutions, let’s examine a real-life example. Consider the canonical example of using AI in healthcare, where privacy is paramount, and generative AI has huge potential for improved diagnosis and personalized treatment. First, large foundation models require huge quantities of training data, but not all data needs the same privacy constraints. Data collection and pretraining could start using open public data to build a base capability within the model. Such training could be attached to zero-knowledge attestations about training data to enable the public to audit inputs verifiably. Pretraining could then shift to data owned by hospitals that has sensitive attributes or personally identifiable information; for this, we could use federated learning to make sure no central server learns information about the patients but combine it with differential privacy to guarantee that the final model will not leak patient private data to a malicious user. This model could now undergo finetuning for instruction following or alignment. These approaches so far help keep training data private from internal and external attackers by ensuring that sensitive data isn’t exposed to a cloud provider or regurgitated to an end user.
Next, deploying this model could be done in various ways to fit a wide class of privacy guarantees. For general questions, standard cloud hosting will suffice with traffic encryption to avoid web snooping. However, for sensitive queries involving patient data, we may want to turn to a solution ensuring the privacy of user prompts with regard to the cloud provider. One such example is hosting the model in a trusted execution environment such that even the model provider cannot access or see what user prompts are being passed to it.
This model usage can, in turn, be augmented by accessing up-to-date medical data from databases. For simple augmentation, such as local patient data, it might suffice to have a locally hosted database alongside the model host or the client; for external data, such as new clinical practices from external vendors, private information retrieval could be used, ensuring that the external database doesn’t see the sensitive content of prompts.
Everything in this example is possible using privacy and security technologies available today. It demonstrates how a patchwork of solutions can be combined to provide real privacy guarantees across the computational pipeline of AI. In many of the above instances, more advanced technologies could replace existing ones; for example, as HE for machine learning improves, the inference step could be performed entirely under fully homomorphic encryption. While a wide array of new approaches can be expected over time, the framework in which different aspects of the generative AI life cycle need different solutions will remain true.
The world will see an increasing demand for privacy from consumers and enterprises as generative AI becomes ever more integrated into our lives and used in sensitive contexts. To address this, end-to-end security will be required in many settings. As we see above, this will require combining technologies to balance speed, security, and usability. Even in cases in which simpler solutions such as locally hosted models become popular, there will still be a need for specific technical instantiations of privacy and security to enable these models to reach their fullest potential.
Regulation will further drive this. AI regulations increasingly require developers and deployers to consider the role of security and privacy in AI systems. The EU AI Act highlights the importance of security, requiring high-risk AI systems (which includes systems deployed in education, employment, and emotional recognition) to be “resilient against attempts by unauthori[z]ed third parties to alter their use, outputs or performance by exploiting system vulnerabilities” (Art. 15), and where AI systems are evaluated on sensitive personal data, these systems must include “state-of-the-art security and privacy-preserving measures” (Art. 10) to safeguard the personal data. Similarly, President Biden’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence calls for AI systems to be “resilient against misuse or dangerous modifications.” These statements are echoed across regulations around the world, and their compliance obligations will create a potent demand for privacy and security for AI systems.
Agentic behavior can follow naturally from this existing patchwork approach. Much discussion and effort have gone into developing AI systems that can take action in the world (agents) built upon the utility provided by foundation models. As foundation models develop new capabilities and modalities, cryptographic and privacy tooling need only be developed for these new applications and then integrated into the existing stack of software designed for the trustworthy use of AI systems.
Still, more research is needed across this computational pipeline and supply chain. While much of the existing work in security for AI is production ready (and many tools are widely used), other aspects still require scaling, speed improvements, production testing, or continued deep research. MIT has long been a bastion of cryptography, security, and privacy research and will continue to advance these technologies needed for a private and secure future with generative AI.