Generative AI (GenAI) is rapidly emerging as a powerful new platform for software development—a foundational technology enabling a wide variety of applications, like past platforms such as computers, smartphones, and cloud services. Major players include producers of large language models (LLMs) like OpenAI, Google, and Meta; hardware and software providers like Nvidia; and cloud infrastructure providers like Amazon, Microsoft, and Google. An ecosystem is evolving with infrastructure layers, foundation LLMs, an array of tools and frameworks, and a rapidly growing set of applications spanning horizontal products for broad usage as well as customized vertical solutions. Key issues going forward include potential market concentration, data privacy and ownership concerns, the accuracy and reliability of AI-generated content, regulation versus self-governance challenges, disruption to jobs and industries, and significant environmental impacts from increased energy consumption. As capabilities and adoption of this new AI technology expand, companies, universities, governments, and technology experts must think carefully and collaboratively about the costs, benefits, trade-offs, and potential dangers of GenAI as a new applications platform.
Keywords: Generative AI; large language models; software development platform; application development platform; platform technology; innovation platforms; technology governance; AI market competition
Generative AI (GenAI) is a powerful artificial intelligence technology using a relatively new approach to neural networks and machine learning. It is also rapidly becoming a new ‘platform’ for software development—that is, a foundational technology used by many individuals and organizations to build a wide variety of applications.1 Goldman Sachs estimates GenAI applications could enhance productivity and raise global gross domestic product by $7 trillion (7%) over the next decade.2 Some of these new applications have the potential to greatly benefit society, while others are likely to disrupt existing markets and occupations. The platform technology and supporting ecosystem of hardware, software, and infrastructure producers have similarities to what we have seen before with computers, smartphones, and cloud services. However, GenAI has important differences that raise new governance and regulatory challenges. This paper sketches out how GenAI has evolved into a new platform, who the major players are, what the current structure of the ecosystem looks like, and what issues we should be concerned about going forward.
We use the term ‘platform’ to refer to a foundational technology that functions at the industry or ecosystem level and brings together different market actors for a common purpose.3 The platform may enable innovations by bringing together users with third parties interested in building applications, such as for a computer or smartphone. The platform also may enable transactions by bringing together buyers and sellers via search engines and marketplaces or enabling a variety of services. In recent decades, Microsoft, Apple, Google, and Amazon have dominated platform technologies with the Windows, iOS, and Android operating systems as well as Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure. There are many other cloud services (for example, WeChat, Facebook, and TikTok) that function both as applications and platforms for third-party products and services.
Foundational technologies with application programming interfaces (APIs) connect platforms to external market actors and exhibit another important phenomenon—network effects. These self-reinforcing feedback loops can bring increasing value to users by providing access to a network of other users or innovators on the demand side and platform participants or asset owners on the supply side. Positive network effects can enable user numbers to grow extremely quickly and facilitate nonlinear growth, especially in the early stages after launch. Facebook and WeChat experienced this with their social media and messaging platforms. With other transaction platforms, such as online marketplaces or Fintech exchanges, more buyers attract more sellers, and more sellers attract more buyers. With innovation platforms, such as operating systems for personal computers or smartphones, more users or more third-party applications tend to attract more users, and more users attract more applications, which attract even more users and applications. Some platforms also exhibit ‘data network effects,’ which resemble economies of scale in learning, as in search engines paired with advertising engines. More or better data can lead to more useful insights and links for end users as well as more targeted advertisements.4
GenAI is already enabling innovations in the form of software applications or services delivered via transaction platforms. We also see learning associated with increasing amounts of data. The challenge is that network effects can quickly shape the evolution of markets, competition, and ecosystems. In particular, network effects can make it increasingly difficult for competitors to enter a market or to survive competition with bigger players—the ‘winner-take-all-or-most’ phenomenon.5 Too many competing platforms is not necessarily good for users in that incompatibility in applications and services or lack of interoperability can create unnecessary costs and inconvenience. At the same time, though, dominant platforms can stifle innovation, harm users and competitors through predatory pricing, and produce lasting monopoly profits for a small number of firms.6 Given our experience with computers and smartphones, as well as other industries, we need to ask which companies are likely to dominate as GenAI becomes a more ubiquitous technology.
The competition to create new foundational technologies and application development platforms in the AI space is not new. Neural networks and machine learning have been around for decades. However, the technology we now call GenAI took on a particular form after the publication of research done at Google in 2017.7 As has been widely reported, a team of scientists designed a neural network called the “Transformer” that demonstrated outstanding performance in language translation. The scientists carefully architected the Transformer to leverage the strengths of parallel processing hardware called graphics processing units (GPUs). The new technology enabled the creation of much larger neural-network models trained on larger datasets than before, and this led to dramatic advances in performance and accuracy. As a result, the Transformer architecture very quickly expanded beyond its initial application of language translation to many use-cases.
OpenAI, established in 2015 and now backed by approximately $13 billion in investment from Microsoft, was one of the first firms to leverage the Transformer technology to create ‘language models,’ i.e., models trained on vast amounts of text that could generate new text of their own. OpenAI developed a series of increasingly capable language models called the GPT family (GPT stands for “Generative Pre-trained Transformer”) in quick succession: GPT in 2018,8 GPT-2 in 2019,9 and GPT-3 in 2020.10 Each of these models was larger than its predecessor and trained on a larger corpus of data. GPT-3 could generate language at an unprecedented level of fluency; it also had the surprising ability to answer questions after seeing only a handful of examples. But it could not follow user instructions precisely, and OpenAI developed an approach to address this shortcoming.11 The result was ChatGPT, released in 2022, followed by the release of GPT-4 in 2023.12
Initially, GenAI systems were ‘single-modal.’ That is, they expected input data of a specific modality and would generate output of a specific modality. Text-in, text-out systems and text-in, image-out systems were initially the most popular. More recently, companies have developed ‘multi-modal’ GenAI models, such as GPT-4 Vision and Google Gemini 1.5 Pro, that can receive and generate both text and images.13
Similar to the computer and smartphone, a key part of the GenAI platform is the software and hardware required to develop, train, and deploy the foundation models and applications. The key player in hardware—mainly GPUs—and in some key aspects of software development technology—mainly lower-level programming tools, frameworks, and libraries—is Nvidia. Currently, this company has a market share of approximately 80% for GenAI-related GPUs, and its market value exceeds $2 trillion, trailing only Microsoft and Apple.14 Nvidia was established in 1993 to make specialized chips that took over graphics processing from central processing units (CPUs) in personal computers, a market still dominated by Intel. Then, in 2006, it introduced two innovations that laid the groundwork for what would become its dominant role in the GenAI ecosystem.15
First, in GPU hardware design, Nvidia switched from arrays of a few dozen or a few hundred specialized ‘compute cores’ (subprocessors) that could perform complex tasks independently, as in the CPUs used in personal computers, to arrays of thousands of simple cores running in parallel at twice the speed or faster. Each core could handle a few pixels on a graphics display and perform many specific tasks in parallel.16 Nvidia’s new GPU architecture turned out to be perfectly suited to the huge number of matrix-multiplication tasks and logic layers that lie at the heart of neural networks. Second, following how Microsoft had come to dominate the PC programming world with free DOS and Windows software development kits (SDKs), in 2006 Nvidia also introduced its own programming model and language with a free SDK called CUDA, for Compute Unified Device Architecture. CUDA started as an extension of C/C++ to support fast parallel processing by directly accessing instruction sets in Nvidia’s new GPU hardware.17
By the end of 2006, researchers in France and elsewhere had successfully used Nvidia graphics cards to train neural networks.18 Other pioneering work in pattern recognition occurred over the next several years, most notably at the University of Toronto in 2011–2012 (“AlexNet”).19 In 2014, Nvidia introduced new tools and libraries specifically for building deep-learning applications, called CuDNN (CUDA Deep Neural Network).20 In 2016, to stimulate the applications ecosystem, Nvidia donated several of its most powerful GPU servers to universities as well as to OpenAI, then organized as a nonprofit research laboratory.21
Nvidia has continued to invest aggressively and with great foresight. Its goal has been to optimize both its GPU hardware and CUDA software for developing LLMs and deploying inference engines.22 Meanwhile, competitors and users have been working hard to produce alternatives to Nvidia hardware and software. As we discuss later, we expect these alternatives to reduce the cost of developing and using new GenAI systems, although Nvidia’s head start of so many years will not be easy for competitors or the open-source community to match.
The current GenAI ecosystem has several layers (Figure 1). This structure is similar to what we have seen before in personal computers, smartphones, and cloud services, though the technology is very different and some of the key players are new. Infrastructure providers produce GPUs and other hardware as well as offer cloud-computing services that run GenAI software. Foundational models (also called foundation models) are the LLMs produced by OpenAI, Google, Meta (Facebook) and several other firms. These and other companies also produce SDKs—tools, frameworks, and libraries—used to develop and train the foundation models as well as to build inference engines and applications. Applications, now in the hundreds and growing in numbers very quickly, come from both ‘horizontal’ vendors targeting broad sets of users as well as ‘vertical’ vendors targeting specific industries. Established firms are also incorporating GenAI technology into their existing software products or web-based services. At the periphery of the ecosystem are owners of data and content used to train the GenAI systems. Yet another part is the regulatory infrastructure that governments around the world will need to introduce to oversee the usage and evolution of this powerful new technology.
Foundation models refer to “any model that is trained on broad data (generally using self-supervision at scale) that developers can adapt (e.g., fine-tune) to a wide range of downstream tasks.”23 This definition acknowledges that foundation models are more than just LLMs; they include systems that can work with other data modalities like images and video, as well as multi-modal models that can handle inputs from different modalities as part of a single prompt (e.g., an image and a question about the image).
There is more competition among LLM producers than we have seen in computer or smartphone operating systems because we are at an early stage. OpenAI and Microsoft (ChatGPT, Bing, Copilot), Google (Gemma, Gemini), and Meta (the Llama family) have attracted the most attention.24 Other players here include Amazon (Alexa, AWS), Alibaba (DAMO), and Baidu (Ernie Bot) as well as well-funded start-ups such as Cohere, Mistral, AI21 Labs, Contextual AI, Hugging Face, Anthropic (backed by Amazon), and Inflection AI.25
We can characterize LLMs along several dimensions, such as their overall capability, the cost of API access, whether they are closed source or ‘open source’ (the definition of open source in this context has some nuances, compared to traditional software, which we will return to later), and whether they are specialized for a particular domain (e.g., generating code in software development) or general purpose. In terms of overall capability, larger LLMs tend to outperform smaller LLMs (Figure 2).26 However, the relative ranking changes frequently as new models or versions appear.
As we might expect, it is more expensive to use the more capable larger LLMs compared to the smaller models. For closed-source LLMs accessible via an API, the pricing model is typically by the ‘token’ (a token is approximately 3 to 5 characters). For example, the price to use GPT-4 is $30 per million input tokens and $60 per million output tokens, while the comparable numbers for the smaller GPT-3.5-Turbo model are $0.50 and $1.50, respectively.27 Over time, LLM vendors have been steadily reducing the per-token prices to encourage adoption.28
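To make the arithmetic concrete, the short sketch below computes the cost of a single hypothetical request of 2,000 input tokens and 500 output tokens at the per-million-token prices cited above; the request sizes are illustrative assumptions, not vendor figures.

```python
# Per-million-token prices cited above, in US dollars.
GPT4_IN, GPT4_OUT = 30.00, 60.00
TURBO_IN, TURBO_OUT = 0.50, 1.50

def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

print(request_cost(2_000, 500, GPT4_IN, GPT4_OUT))    # 0.09   -> 9 cents
print(request_cost(2_000, 500, TURBO_IN, TURBO_OUT))  # 0.00175 -> ~0.2 cents
```

The roughly 50× gap per request explains why development-time experimentation against the largest models can quietly become a significant cost.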
Initially, there were only a few closed-source LLMs, but there has been a dramatic increase in the number of open-source LLMs in the past 12 months. Meta released its Llama-1, Llama-2, and Llama-3 families of models in February 2023, July 2023, and April 2024, respectively.29 Llama-2 was among the first open-source LLMs to narrow the performance gap significantly relative to proprietary models. Since then, many other competitive open-source alternatives like Mistral have emerged.30 We now have ‘open’ LLMs like Cohere’s Command R+ and Meta’s Llama-3-70b that are better than at least one GPT-4-class LLM, i.e., GPT-4-0613 (Figure 2).
With open-source LLMs, however, there are a number of important restrictions to consider. Can the LLM be used for commercial and not just research purposes? Are the internal weights of the LLM available? Is the data used to train the LLM available? Is the training code available? The answers to these questions determine what is possible for building applications. Llama-2, in particular, is suitable for commercial purposes (albeit with some restrictions).31 Meta released it with code and instructions that made it possible to fine-tune or adapt the LLM to meet the requirements of downstream applications. If the weights of the model are made available but the dataset and code used for training the model are not, we may refer to this as an ‘open-weights’ model rather than an open-source model. Many of the most popular open models fall into this open-weights category.
Building an LLM-powered application requires choosing between an open-source LLM or a closed-source LLM, and this involves weighing several important factors. Closed LLMs accessed via commercial APIs (like GPT-4) tend to be more accurate out-of-the-box for a broad range of tasks. They are also easier to use for organizations that do not have the necessary compute infrastructure and machine-learning expertise; knowing how to make API requests and process the responses is adequate. The simple per-token API pricing model makes it easy to get started without a significant upfront investment of effort and capital. Over time, though, the total cost can add up due to the need for iterative experimentation and prompt engineering during development. Closed LLMs also require sending proprietary data to third-party providers and are subject to model updates that may unpredictably impair performance of downstream applications. They also generally offer less flexibility for customization strategies like fine-tuning.
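As an illustration of how little scaffolding a closed LLM requires, here is a minimal sketch of a chat request using OpenAI's Python client; the prompt text is our own example, and the response's usage field reports the input and output token counts that drive the per-token bill.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the key risks of vendor lock-in."},
    ],
)
print(response.choices[0].message.content)
print(response.usage)  # prompt and completion token counts billed per token
```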
Open-weights models allow for full customization through techniques like fine-tuning and reinforcement learning from human or AI feedback (RLHF/RLAIF). For specific, narrow use-cases, this customization flexibility makes it possible to fine-tune a smaller open LLM on high-quality proprietary data and achieve performance comparable to what a much larger LLM can achieve. This can be a significant advantage since smaller LLMs are less expensive to host and use and offer much better latency for real-time applications. Open models, if hosted internally, also enable an organization to ensure data security and insulate downstream applications from unpredictable model updates. But all these benefits come at a cost. Although open-source models are free to use, the costs of computing infrastructure, platform engineering, support, maintenance, and security can add up. Furthermore, open models require significant technical expertise to host, manage, and optimize.
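The sketch below shows what such fine-tuning might look like with the open-source transformers library. The dataset name ("my-org/support-tickets") and the hyperparameters are hypothetical placeholders; a production run would add evaluation, parameter-efficient methods such as LoRA, and careful data preparation.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"   # an open-weights base model (gated; access required)
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token       # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# "my-org/support-tickets" is a hypothetical proprietary dataset with a "text" column.
data = load_dataset("my-org/support-tickets", split="train")
data = data.map(lambda batch: tok(batch["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-support", num_train_epochs=1),
    train_dataset=data,
    # mlm=False produces standard next-token (causal) language-modeling labels.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```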
While the closed versus open choice is an important one for the application developer, from the perspective of the enterprise as a whole, it will not be an either/or situation. It is likely that, as LLMs mature, the typical large enterprise will use a combination of a few closed large LLMs to support broad use-cases where a wide range of inputs are expected and many smaller open LLMs for narrow use-cases. Furthermore, even within the context of developing a single application, multiple LLMs may play a role with ‘Helper LLMs’ being used to create training data, evaluate and rank outputs, and support other tasks.32
LLMs differ from prior computing platforms such as operating systems in that they are, in a sense, applications themselves as well as foundations for building other applications. Of course, operating systems perform many specific functions, such as booting up a computer or enabling users to open and save files. But, unlike traditional operating systems, LLMs can perform new functions when simply prompted the right way (e.g., instructed in plain English to write code) or given very minimal training. Many natural-language-processing applications that previously required bespoke application development can now be built by prompting an LLM. For example, an LLM can be used as a text classifier for a variety of use-cases (e.g., routing a customer service request to the appropriate department, understanding the intent behind a search query, or classifying the sentiment of a product review) by just using different prompts.
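A minimal sketch of this prompt-as-program pattern follows, using the OpenAI Python client; the prompt wording and model choice are our own illustrative assumptions, and the same function body could classify intent or route tickets simply by changing the prompt.

```python
from openai import OpenAI

client = OpenAI()

def classify_sentiment(review: str) -> str:
    """Use a general-purpose LLM as a zero-shot sentiment classifier."""
    prompt = (
        "Classify the sentiment of the following product review as "
        "exactly one word: positive, negative, or neutral.\n\n"
        f"Review: {review}\nSentiment:"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

print(classify_sentiment("The battery died after two days."))  # e.g., "negative"
```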
This ‘instantaneous versatility’ is a defining characteristic of LLM platforms, and LLM vendors are exploiting this capability. OpenAI has made it possible for users to quickly create custom versions of ChatGPT tailored to specific use-cases—no coding required.33 In the two months after this feature became available, three million custom versions of ChatGPT were created by users. These bespoke LLM applications, created by nontechnical users as well as professional developers, are available at the GPT Store34 (reminiscent of Apple’s App Store).
Nvidia dominates the GPU business today because its widely used proprietary CUDA platform (the API software for Nvidia GPUs) makes it difficult for application developers and data centers to switch away from its hardware.35 Even though open-source libraries such as PyTorch, TensorFlow, and JAX largely shield application developers from directly programming against CUDA, these libraries have tended to be optimized for, and work best with, CUDA. When alternatives are available, such as AMD’s open-source ROCm library (which is available, for instance, to PyTorch developers), these alternatives remain poorly documented and suffer from myriad other issues.36 In turn, kernel programmers (specialists who write the lower-level routines needed to access GPU instruction sets directly) also tend to focus their attention on CUDA so that the best-performing innovations are available there first, despite the fact that both AMD (ROCm) and Intel (SYCL) have made their respective GPU APIs open-source.
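One reason the lock-in operates at the library level rather than in application code is that frameworks such as PyTorch expose a device abstraction. The sketch below runs unchanged on a CUDA build or a ROCm build of PyTorch (ROCm builds reuse the torch.cuda namespace), so the switching cost sits with library maintainers and kernel programmers, not the application developer.

```python
import torch

# The same code runs on an Nvidia (CUDA) or AMD (ROCm) build of PyTorch;
# ROCm builds expose AMD GPUs through the same torch.cuda interface.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b  # dispatched to cuBLAS on CUDA or rocBLAS on ROCm under the hood
print(c.shape, c.device)
```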
All of this leads to effective platform lock-in and a winner-take-most outcome for Nvidia hardware and software. That said, competition is gradually appearing and exploiting Nvidia weaknesses. For instance, CUDA programming uses C/C++, relatively difficult programming languages. As a countermeasure, OpenAI launched Triton in 2021 as a Python-based alternative for GPU programming. Although Triton still requires a CUDA compiler, it avoids CUDA's proprietary libraries in favor of open-source alternatives. Future versions should run on Intel, AMD, and other GPU hardware.37 Then there is OpenCL, introduced in 2009 and based on C, another important open-source framework for GPU programming that has been implemented on a wide variety of platforms.38
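To give a flavor of what Triton buys the developer, here is a minimal vector-addition kernel modeled on Triton's introductory examples. The whole kernel is ordinary Python, with blocking, masking, and memory access expressed through triton.language rather than C/C++.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)   # one program instance per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```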
Overall, the ecosystem for GenAI application programmers is vibrant and increasingly open source. As discussed earlier, PyTorch, TensorFlow, and JAX are the mainstay libraries for application programmers. Still higher-level frameworks such as Keras are also extremely popular. In addition, a rich ecosystem has emerged to support specialized tasks ranging from experimentation (e.g., WANDB) to training (e.g., Lightning). Perhaps most interesting from the perspective of broader market impact are libraries such as the open-source transformers library.39 With 125 thousand stars on GitHub,40 it is among the most popular repositories on that platform. By providing a common interface to LLMs, and an associated hub on the Hugging Face website with hundreds of supported ‘backend’ LLMs, this library allows application programmers to easily switch between the LLMs they use in their downstream applications. The result is to minimize lock-in to these models and ensure the rapid dissemination of LLM innovations.
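With the transformers pipeline API, for example, swapping the backend LLM is a one-string change. The model identifiers below are real hub names, but the prompt is our own illustration, and gated models require access approval.

```python
from transformers import pipeline

# Switching the backend LLM is a one-string change rather than a rewrite.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")
# generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

print(generator("GenAI platforms are", max_new_tokens=30)[0]["generated_text"])
```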
Nvidia GPUs remain the dominant hardware platform for GenAI applications, and this dominance is reinforced by the cloud-computing providers, led by Amazon, Microsoft, and Google. They all support vast numbers of Nvidia GPUs in their data centers to run GenAI software. At the same time, though, the leading cloud providers and chip makers are developing their own GPUs. For example, AMD has targeted Nvidia’s popular H100 GPU with its MI300X line, specifically designed for GenAI computations and hosting.41 Intel in 2019 acquired Israel’s Habana Labs for $2 billion and then in 2022 introduced the Gaudi2 chip, which also targets Nvidia’s H100 and is particularly strong in inference processing.42 Start-ups such as SambaNova and Cerebras have also raised billions of dollars to design new generations of GPU platforms.43
Among the data-center providers, Google introduced its Tensor Processing Units (TPUs) for in-house use in 2016 and then began selling these chips in 2018. TPUs can only be used on Google Cloud.44 However, Google TPUs and the JAX library reportedly outperform Nvidia systems in some applications.45 AWS introduced the Trainium machine-learning accelerator in 2020, optimized for deep-learning training, with some software support.46 Microsoft released a custom AI chip for its data centers in late 2023.47 And Meta has an in-house GPU and supercomputer effort underway.48
Despite the increasing competition, we have a hard time seeing training workloads (i.e., the task of training GenAI models) moving meaningfully away from Nvidia-based GPUs in the short run. This is simply because the software ecosystem that supports training is so heavily reliant on Nvidia. On the other hand, the story for inference (the task of actually using a trained GenAI model) is different. These tasks require a much more limited set of capabilities that are widely supported across platforms. It is relatively simple for an application programmer to take a model trained on Nvidia GPUs and run inference for the same model on an AMD or Intel GPU, such as with the Hugging Face Optimum library.49 Inference workflows may eventually form the bulk of GenAI-related computing, so this is an important opening for Nvidia’s competitors.
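As a sketch of this portability, the Hugging Face Optimum library can export a model trained in PyTorch to the ONNX format so that ONNX Runtime can serve it on whatever hardware backends it supports; we use the small open gpt2 checkpoint here purely for illustration.

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# export=True converts the PyTorch checkpoint to ONNX, after which inference
# runs through ONNX Runtime rather than the original CUDA-centric stack.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)
tok = AutoTokenizer.from_pretrained("gpt2")

inputs = tok("Inference is portable:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(outputs[0]))
```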
There are further implications of the specialization afforded by inference workloads. For example, it might be possible to develop chips optimized exclusively for inference. A start-up, Groq, recently demonstrated massive inference speedups using its custom-built ‘LPU’ devices. On the recently released Llama-3 model, these devices delivered approximately a 3× speedup over the next-fastest reported inference times (which in turn relied on state-of-the-art software acceleration).50 It is easy to imagine new and established chip-makers both focusing on inference-optimized hardware. At the other end of the spectrum, it is already possible to run simpler models (either smaller models or quantized versions of more complex ones) without a GPU at all and instead use CPUs (with software from start-ups such as NeuralMagic) or mobile devices.
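To illustrate the quantization idea generically (this is standard PyTorch post-training dynamic quantization, not NeuralMagic's technology), the sketch below stores the weights of a toy model's linear layers as 8-bit integers so that inference can run on a plain CPU.

```python
import torch

# A toy stand-in for a small language model's feed-forward layers.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 768),
)

# Weights of Linear layers are stored as int8 and dequantized on the fly,
# shrinking the model and often speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
y = quantized(x)  # runs entirely on the CPU
print(y.shape)
```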
New platforms usually take off based on third-party applications. Horizontal applications are especially important to encourage broad usage. With GenAI, Microsoft and Google have already added LLMs to their search engines, enabling billions of people to access this technology for free and with ease.51 In May 2023, Microsoft also released plugins to connect OpenAI technology embedded in Microsoft 365 Copilot with business applications from various vendors.52 Other firms are doing the same. Plugins enable GenAI systems to access customer data and write reports or trigger actions in other programs. Meanwhile, many startups are introducing tools for text, image, audio, video, and code generation as well as chatbot design, search, data management, and machine learning.53
The application ecosystem is growing rapidly. Of the 2,011 companies in the 2024 Machine Learning, AI, and Data market map, 578 were new entrants.54 McKinsey estimates that 75% of the annual impact of GenAI will stem from automation of tasks in sales, marketing, engineering, and customer-operations functions.55 Consistent with this projection, the top focus area for start-ups in Y Combinator’s Winter 2023 batch was GenAI tools for engineering, followed by sales, customer support, and operations.56 Hundreds of start-ups are also building specific vertical applications that target a growing variety of industries and tasks.57 These include manufacturing, gaming, fashion, retail, energy, healthcare, defense, finance, agriculture, physical infrastructure, education, media and entertainment, legal services, computer coding, mobility, and construction.
Earlier generations of foundation models were trained on text scraped from public databases like Wikipedia and Common Crawl for the first ‘pre-training’ phase and on human-created instruction–answer pairs for the ‘instruction tuning’ phase.58 Based on empirically observed correlations between model size, dataset size, training compute, and model capability (referred to as “scaling laws”59), researchers realized that, to increase model capability, they need to train bigger models, which in turn require larger datasets for both phases. But where will these new datasets come from? While new text is continually added to the internet, there is concern that content generated by widely available foundation models may already be a significant fraction of the new text. Training the next generation of models on this text could lead to a deterioration in capabilities. Furthermore, the widespread scraping of web data by LLM vendors has raised questions of copyright and fair use, with lawsuits being filed.60
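For a sense of what these scaling laws assert, one widely cited formulation (the ‘Chinchilla’ analysis) models pre-training loss L as a function of parameter count N and training-token count D, roughly:

L(N, D) ≈ E + A / N^α + B / D^β

Here E is the irreducible loss and A, B, α, and β are empirically fitted constants; the exact constants vary across studies, so this form is a sketch rather than a universal law. Because loss falls with both N and D, a bigger model pays off only if the training dataset grows along with it, which is precisely the pressure on data acquisition described here.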
In response, foundation model providers have expanded their data acquisition strategies in several ways.61 For pre-training, transcripts of podcasts, audiobooks, and YouTube videos appear to be a new source of previously untapped text data.62 Model providers are also establishing commercial agreements with media companies for exclusive access to their content.63 For instruction-tuning and model evaluation, the use of synthetic data is also rising.64
What should we worry most about when it comes to developing and using GenAI technology as we go forward? The regulatory and governance challenges are somewhat similar to what we have seen before, but the consequences may be more serious and more difficult to resolve.
Powerful network effects mean that we are likely to see a reduction in the number of competing LLMs as software developers choose the most popular or accessible models around which to build their applications. In addition, only a small number of companies seem to have the money to keep developing and training the foundation models as well as to fund GenAI as a cloud service. These factors would seem to ensure the continued dominance of the giant technology firms producing the most popular LLMs (Microsoft/OpenAI, Alphabet/Google, Meta) and GPUs (Nvidia).
The current set of dominant players may gradually change, however. Knowledge of how to build highly competitive LLMs with substantially less capital has begun to diffuse widely. As an example, the start-up Mistral has demonstrated state-of-the-art performance using ‘mixture-of-experts’ models that are effectively smaller.65 Many of its open-source models are a mainstay on the LMSYS leaderboard.66 Furthermore, the evolution and very broad adoption of standard interfaces to these models (such as the open-source transformers library67 produced by Hugging Face) significantly reduces lock-in for downstream application developers. These mitigating factors might well ensure a high degree of continued competition even among developers of foundation models, or at the very least the continued availability of high-performing, open-source LLMs. On the other hand, there is the risk of regulatory influence by large, well-funded LLM providers leading to concentration of market power: by successfully lobbying for government regulations that can only be satisfied by providers with substantial financial resources, the large players can drive out start-ups and open-source providers from the ecosystem.
We have encountered data privacy, bias, and content ownership issues with prior digital platforms for search and social media as well as with AI/ML systems used in various applications, such as screening resumes. For internet search, a US appeals court ruled in 2008 that reproducing a few lines of text, but not more, was a “fair use” of copyrighted content.68 But what is “fair use” of training data and other content, such as videos or music files, for GenAI systems? This is a matter currently under litigation in the United States. The New York Times filed a lawsuit against OpenAI and Microsoft for copyright infringement and competition concerns after both companies used published news articles to train ChatGPT and Microsoft Copilot. Two class-action lawsuits have also been filed against Google and other AI tool vendors.69 We do not yet know how or if the courts will resolve these issues.
Given a query, LLMs generate a response by repeatedly predicting a ‘plausible’ next token and feeding it back into the input. Such responses will almost always look right but will actually be correct only some fraction of the time. Incorrect responses that appear reasonable are called ‘hallucinations.’70 But they are just one type of unacceptable output. LLM responses can be irrelevant, factually wrong, logically flawed, or contain harmful or toxic content. The reliability of LLMs can be improved by training them to use external tools like calculators or code interpreters to answer certain categories of questions.71 Researchers are also investigating ways to increase the factuality of LLM responses, and there is progress.72 Furthermore, as the size of the LLMs increases, they tend to make fewer mistakes.73
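The generation loop itself is simple to sketch. The toy example below uses the small open gpt2 model and greedy decoding; real chatbots sample from the predicted distribution, which is what makes outputs ‘plausible’ rather than guaranteed correct.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):                     # generate ten tokens, one at a time
        logits = model(ids).logits          # scores for every vocabulary token
        next_id = logits[0, -1].argmax()    # greedy pick; sampling adds variety
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # feed it back in
print(tok.decode(ids[0]))
```

Nothing in this loop checks the output against the world, which is why a fluent continuation can still be factually wrong.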
Still, as of this writing, there are no guarantees of accuracy. As a result, developers and users of LLM-based applications need to have robust mechanisms in place to check LLM output and fix errors. Often, this checking is done by a ‘human in the loop.’ Not having such checks in place is risky, as is evidenced by recent, high-profile examples.74 Some companies also offer tools to help users detect instances of fake text, audio, and video.75 Yet, so far, it does not seem that these tools can reliably distinguish genuine from false text (or any other digital content).76
New, complex technologies generally require some combination of company self-regulation with government regulation, or at least the threat of government intervention.77 We see some movement toward government oversight, but we also see hesitation. In July 2023, the US Federal Trade Commission opened an investigation into ChatGPT’s inaccurate claims and data leaks.78 At the same time, the White House announced that Google, Amazon, Microsoft, Meta, OpenAI, Anthropic, and Inflection AI all agreed to allow independent security testing of their systems and data as well as to add digital watermarks to GenAI images, text, and video.79 There is also the Content Authenticity Initiative, a consortium of 1,000 companies and other organizations that is trying to establish standards to help detect fake content.80
These are all positive steps, but they are unlikely to be sufficient.81 More government regulation or a combination of government and private-sector oversight will become necessary, especially if only a few companies continue to dominate key GenAI technologies. The process of formulating GenAI regulations necessarily needs to involve expertise from outside the government, but checks and balances need to be in place to mitigate the risk of regulatory capture by well-funded LLM providers. More competitors and ‘eyeballs’ with open-source LLMs also may reduce big-firm dominance and help expose technical or policy flaws. Nonetheless, there is concern that bad actors will have access to the open-source versions of GenAI technology.
GenAI applications may disrupt whole industries as well as individual occupations. For example, Google presently dominates internet search, but this may change. Microsoft Copilot is a productivity assistant and search engine powered by OpenAI technology. There is also a completely new search engine built by the start-up Perplexity, with both free and premium subscription tiers. Its search engine runs on Perplexity’s own version of the open-source Llama-2 LLM, originally developed at Meta, and it lets users select other LLMs, such as those from OpenAI (GPT-3.5 or GPT-4) or Anthropic’s Claude.
Google may be a victim of its own success here: the company generates revenue whenever a user clicks on a sponsored link and will understandably be hesitant to make changes that undermine its revenue model. On the other hand, Perplexity users receive summarized answers to their questions with the corresponding sources presented as footnotes; early indications suggest that at least some users like this.82 And while they may not be better than Google Search at present, the new search engines offer some competition, both for search users and for the multi-billion-dollar advertising business that accompanies searches and comprises most of Google’s revenues.83
The broader disruption likely will be to white-collar jobs, impacting both employment and the content of work itself.84 There are some obvious areas like software development, where LLM-powered co-pilots are increasing the efficiency of software developers.85 Numerous other areas are likely to be affected as well, ranging from high-skill jobs in the legal industry to relatively lower-skill, data-intensive jobs that arise across multiple industries. At a recent GenAI conference held at MIT, one panelist described how their company’s early partnership with OpenAI was allowing it to build software agents that performed many of the financial analysis tasks normally assigned to junior analysts. More generally, estimates of the short-term employment impact vary widely but are large (12 to 100 million jobs).86 At the least, many occupations (everyone from teachers, journalists, lawyers, travel agents, stock traders, actors, computer programmers, and corporate planners to military strategists) may find their jobs replaced, enhanced, or greatly altered.
The computing resources required for LLM training and then for responses to each chatbot prompt are already huge. By some estimates, GenAI’s use of computing resources has been increasing exponentially for years, doubling every 6 to 10 months.87 Training GPT-3, for example, is estimated to have used about 1,300 megawatt-hours of electricity, about the power that 130 US homes use in one year.88 Bigger LLMs can require more energy, so these numbers are likely to rise. Then there is the cost of the inference engines—such as the chatbots. The International Energy Agency estimates that data centers, cryptocurrencies, and AI usage accounted for almost 2 percent of global power usage in 2022 and that this would likely double by 2026, to an amount equivalent to the entire energy consumption of Japan.89 On the other hand, developers of LLMs and applications are trying to make some of these systems more efficient as well as smaller. Many companies with data centers are also turning to renewable energy sources. Overall, the enormity of GenAI energy usage versus the benefits is a tradeoff that individual users as well as governments and companies need to consider.
A final concern is that no one knows where GenAI technology will lead us. This is not an unusual problem for society and organizations. It took several years for companies to realize productivity improvements from investments in computers. Initial data even indicated a drop in productivity because effective use of information technology required reductions or shifts in employee headcount as well as changes in work processes.90
We seem to be in a similar situation with GenAI. This new platform is likely to create innumerable opportunities for automation or semi-automation as well as for making white-collar work and some service jobs more efficient.91 Yet lasting productivity improvements may require reductions in personnel levels or shifts in personnel assignments with significant changes in work processes. We also need to figure in the cost of training and deploying LLMs as well as resolving quality problems such as hallucinations and inaccurate information. Smaller and more focused language models could change these cost–benefit calculations, though what is desirable for society is a question we cannot fully answer simply through estimates of economic gains.
As the accuracy of GenAI technology improves over time, the ‘temptation’ to automate a wider array of tasks increases. This may result in inappropriate use in high-stakes settings (e.g., autonomous cars) where the result of an error can be catastrophic (i.e., loss of life). Since it is not clear when or if we will ever have GenAI tools with zero-error guarantees, human users will be expected to intervene to avert disaster in such situations. As we have learned from experience, though, relying on human intervention may not prevent catastrophic outcomes.92
In conclusion, we are sure of one thing: now is the time for companies, universities, governments, and technology experts to think carefully and to think together. We all need to understand better the costs, benefits, tradeoffs, and potential dangers of GenAI as a new applications platform and transformative technology that is already shaping our common future.