An exploration of potential uses of artificial intelligence in live music performances: necessary implementation features for musicians and audiences, its virtuosic properties, and a new form of virtuosity arising from novel creative processes.
Recent advancements in generative AI have led to the development of powerful models capable of generating full-length, professional-sounding music from brief text prompts. These models have also sparked concerns among artists who feel their creative roles are being overshadowed. To address this issue, we propose the development of AI-augmented instruments, defined as generative AI systems embedded within musical instruments, that provide artists with extensive control and responsiveness to real-time musical inputs while harnessing the capability of powerful AI models. Through a thorough definition of virtuosity, we explore how these AI-augmented instruments can exhibit virtuosic qualities, serve as collaborative partners, and enable artists to attain a new form of virtuosity, which we call ‘symbiotic virtuosity.’ Furthermore, we present a set of guidelines for effectively communicating the AI’s capabilities to live audiences. To exemplify our reflections, we delve into our collaboration with Grammy-winning keyboardist Jordan Rudess, which resulted in a pioneering AI–human co-created demonstrative performance that was held at the MIT Media Lab in April 2024.
Keywords: artificial intelligence; music; live performance; virtuosity; improvisation
Author Disclosure: This project was made possible through grants from MIT and the MIT Center for the Arts, Science & Technology (CAST).
Lancelot Blanchard and Perry Naseck contributed equally.
Virtuosity has been a concept of interest in the context of Western music since at least the sixteenth century. The Oxford Dictionary of Music defines a virtuoso as “a performer of exceptional skill with particular reference to technical ability.”1 According to Devenish, Hope, and McAuliffe’s Contemporary Musical Virtuosities,2 this definition prevailed up to the twentieth century, when virtuosic performers were defined as possessing “exceptional technical proficiency, impressive dexterity, performance flair, and mass audience appeal.” Traditional representations of this definition include great performers such as Franz Liszt, Niccolò Paganini, or Arthur Rubinstein.
From the twentieth century onward, however, transformations in the way music was produced and consumed, together with growing criticism of the elitist and traditional aspects of this definition of virtuosity, gave rise to a wide range of new models of musical virtuosity, with authors and researchers offering diverse interpretations that attest to the concept’s continued relevance. Among the many definitions collected in Contemporary Musical Virtuosities, we choose here to focus on the four listed below.
Virtuosity that arises through creativity, sonic exploration, and improvisation: This kind of virtuosity became especially evident with the advent of jazz and, later, avant-garde music, as suggested in Kaiser’s Improvising Technology: Constructing Virtuosity.3 We refer to it as “improvisational virtuosity.” George Lewis, a musicologist and specialist in machine improvisation, is particularly interested in the virtuosity that radiates from improvisation. In a conversation with Professor Tod Machover of the MIT Media Lab titled Why Do We Want Our Computers to Improvise?, Lewis refers to improvisation as “a microcosm of social interaction that can provide insights into broader social and political issues.”4
Virtuosity that arises through collaboration rather than as an individual achievement: This virtuosity, defined as “collective virtuosity” by Mauskapf,5 characterizes performances that exhibit the skillful collaboration of multiple talented musicians. It is of particular interest to music psychology, where it has been suggested that collaborative music-making stimulates two simultaneous cognitive processes in musicians: a subconscious process focused on generating coherent musical ideas that comply with the musician’s culture and education, and a second, more conscious process focused on rewriting the former in response to external musical stimuli from other musicians.6
Virtuosity that arises through the display of the artist’s vulnerability and fragility on stage: This virtuosity, which composer Molly Joyce refers to as “the new virtuosity,”7 is particularly important in the context of improvisational performances, where performers can captivate their audiences by pushing themselves out of their comfort zone and consistently putting themselves at risk.
Virtuosity that arises in the use of modern, digital technologies: Erwin refers to this kind of virtuosity as “cyborg virtuosity”8 and applies it to new systems that embed virtual reality, robotics, and sensor technologies. With the rise of new AI technologies, we believe this form of virtuosity is more relevant than ever.
These definitions provide insight into some of the expectations that audiences might have when attending live music performances. Renowned artists generally display great expertise in more than one of these virtuosities, strengthening their image as skillful performers. As generative AI models become increasingly capable, we ask whether some of these virtuosities might also apply to such systems, and whether these systems can help establish a new kind of virtuosity in the artists who use them.
As part of this project, our research team is collaborating with Grammy-winning keyboardist and technologist Jordan Rudess. Rudess’s renowned proficiency on the keyboard and his interest in modern technologies make him an ideal collaborator for this work. After months of exploration and research, we developed an initial prototype of a system that Rudess played on stage during a live performance in April 2024. We describe our system and report the results of this performance in the following sections.
There has been a recent upsurge in the development of generative music AI systems that display impressive abilities to create musical content. Companies like Suno9 and Udio10 recently released models capable of generating full-length songs from short text prompts. These models quickly prompted a response from the artistic community, which published an open letter emphasizing the risks such releases pose to their artistic integrity.11 In light of this reaction, we examine the development of systems that enhance the creative expression of artists rather than replace it. We introduce the concept of AI-augmented musical instruments, which we define as generative AI systems that can be played in real time by live performers. Based on the definitions of virtuosity considered in the previous section, we look at the technical aspects of AI-augmented instruments that might allow them to be virtuosic and explore how artists might develop a new kind of virtuosity by interacting with them.
Despite the impressive abilities of the aforementioned systems, their modalities of interaction with artists are sometimes unclear. Most recent AI systems for music generation condition their generative process on textual descriptions, an interface that is often insufficient for artists. To be used as musical instruments, such systems need to offer enough control for artists to express themselves creatively and musically while still benefiting from the full potential of these systems. This tension is particularly important in the field of Human–Computer Interaction, in which researchers look for systems that exhibit both competency and ease of use.
Previous work in this area has aimed to design better interfaces that allow musicians to use generative AI models in a creative and controllable way.12 However, we believe there remains a need to properly define interfaces that allow artists to interact with generative music AI models, especially in the context of live music performances. We call such interfaces AI-augmented instruments, as they exhibit the controllability of musical instruments while harnessing the powerful generative AI models that drive them. An AI-augmented instrument sits at the boundary between a human collaborator and a musical tool. Rather than acting as an autonomous generative AI agent, it offers a wide range of controls that the artist can interact with, and it can listen to musical cues and use them in its generation.
For our performance with Jordan Rudess, we developed a prototype live accompaniment system that allowed him to co-create complex musical pieces live. Centered around a solo keyboard performance, the AI-augmented instrument received Rudess’s musical data through the musical instrument digital interface (MIDI) protocol and generated appropriate basslines and chords to accompany his playing. We also built a light sculpture that visualized a two-dimensional projection of the notes generated by the instrument and allowed Rudess to interact with it in real time. Figure 1 shows a diagram of the architecture used during the performance, and Figure 2 shows Jordan Rudess interacting with the instrument.
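To make this architecture concrete, the sketch below shows, in Python, how such a loop might be wired together using the mido library for MIDI input and output. It is an illustrative reconstruction under assumed port names and a placeholder model, not the implementation used in the performance.

```python
# Illustrative sketch of the accompaniment loop (not the production code).
# Port names and the placeholder model are assumptions for this example.
import time
import mido

class PlaceholderModel:
    """Stand-in for the fine-tuned generative model (hypothetical)."""
    def generate_accompaniment(self, new_events):
        # Naive example: double each newly played note an octave down as a "bassline".
        return [mido.Message("note_on", note=max(0, msg.note - 12), velocity=msg.velocity)
                for msg in new_events
                if msg.type == "note_on" and msg.velocity > 0]

model = PlaceholderModel()
keyboard_in = mido.open_input("Keyboard")      # performer's MIDI keyboard
accompaniment_out = mido.open_output("Synth")  # device rendering the AI's part

while True:
    # Collect the performer's newly arrived MIDI events.
    new_events = [msg for msg in keyboard_in.iter_pending()
                  if msg.type in ("note_on", "note_off")]
    # Ask the model for accompaniment events and send them out.
    for out_msg in model.generate_accompaniment(new_events):
        accompaniment_out.send(out_msg)
    time.sleep(0.01)  # avoid busy-waiting
```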
We can now examine what virtuosity means when describing AI-augmented instruments. Although models that generate complex musical phrases with great accuracy might be technically impressive, they arguably will not surprise audiences to the same degree as their human counterparts: AI-generated music featuring speed, dexterity, and technicality does not necessarily impress. Consequently, the simplest and most traditional definition of virtuosity, as introduced at the start of Section 1, might not apply as well to AI-augmented instruments as it does to human performers. In this light, we take a closer look at the other definitions of virtuosity and study how they can be actualized in AI-augmented instruments. For each definition we consider, we also describe how the AI-augmented instrument we designed for Jordan Rudess technically addressed these aspects and offer guidelines to help develop better AI-augmented instruments in the future.
We previously defined improvisational virtuosity as the ability of a performer to devise creative musical ideas rapidly on stage. In the context of AI-augmented instruments, this skill should be more precisely defined as the ability to generate creative ideas that resemble the artist’s style. Generative systems such as Suno and Udio aim to be highly generalizable and fit various musical preferences and users. While the ability to generalize is crucial for these models to create coherent musical content, we believe it can sometimes detract from the personalized touch that audiences expect during live performances. This is even more true when these systems are embedded into live instruments, as the artists playing them generally require deliberate control of their outputs. As such, we believe it is essential for AI-augmented instruments aimed at live performances to adhere to the musical style and expressivity of the artists who interact with them. Ideally, our goal is to help artists feel a strong sense of connection to their instrument and to enable audiences to sense and empathize with this connection.
In the recent machine learning literature, similar reflections have been central to research on foundation models.13 These models aim to efficiently model the distribution of a very large dataset in such a way that techniques such as transfer learning or fine-tuning can adapt them to more specific tasks. As such, we believe these models offer great potential for embedding improvisational virtuosity into AI-augmented instruments.
To put this idea into practice, we made use of the GPT-2–based Music Transformers14 trained by Thickstun et al.15 at Stanford’s Center for Research on Foundation Models. In addition to implementing helpful mechanisms such as anticipation, these models were trained on more than 150,000 MIDI files from the Lakh MIDI Dataset16 for 800,000 training steps. By themselves, these models can generate MIDI sequences that are musically coherent and accurate. However, they are of little value when embedded as-is into instruments because of their slow inference, their inability to consider real-time data, and their lack of specificity. To resolve this, we defined a set of specific tasks that we expected our models to perform accurately and collected data to fine-tune the foundation models. In particular, we fine-tuned them on basslines, chords, and leads using data Rudess prerecorded prior to the event.
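As an illustration of what such a fine-tuning step might look like, the sketch below uses the Hugging Face transformers library and assumes the pretrained checkpoint loads as a GPT-2-style causal language model and that the artist’s recordings have already been tokenized into the model’s event vocabulary and padded to a fixed length. The checkpoint name, paths, helper function, and hyperparameters are placeholders rather than the values we used.

```python
# Hedged sketch of the fine-tuning step. The checkpoint name, helper function,
# paths, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Load the pretrained music transformer as a GPT-2-style causal language model.
model = AutoModelForCausalLM.from_pretrained("stanford-crfm/music-medium-800k")

# Hypothetical helper: the artist's prerecorded basslines, tokenized into the
# model's event vocabulary and padded to a fixed sequence length.
artist_sequences = load_tokenized_midi("rudess_basslines/")

dataset = Dataset.from_dict({
    "input_ids": artist_sequences,
    "labels": artist_sequences,  # causal LM objective: predict the next event token
})

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bassline-model",
        per_device_train_batch_size=4,
        learning_rate=1e-5,
        num_train_epochs=3,
    ),
    train_dataset=dataset,
)
trainer.train()
```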
To be effectively employed in live music performances, AI-augmented instruments need to listen to the performer’s playing and react appropriately in real time. Compared with standard generative music AI models, AI-augmented instruments can develop a closer connection with human performers, enabling them to display a sense of collective virtuosity. Because of this, AI-augmented instruments not only need precise and accurate outputs but must also offer the low latency we expect from traditional audio systems. Such an endeavor is no ordinary feat given the computational complexity of modern machine learning models, and specific techniques are required to make these models usable in real-time contexts.
In the models we developed for Jordan Rudess’s improvisations, we employed a combination of methods to make the system as reactive as possible. We notably performed quantization17 on the fine-tuned models and designed an algorithm to dynamically adjust the duration and interarrival time of notes during generation. This algorithm limits the number of simultaneous notes early in the generation, enforcing a high throughput of notes from the start and thus ensuring that the model is always generating content ahead of playback time.
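The sketch below is a simplified reconstruction of this idea rather than the exact algorithm we deployed: notes proposed close to the playback head are thinned and shortened so that the generated stream always stays ahead of playback.

```python
# Simplified reconstruction of the throughput-control heuristic: near the
# playback head, cap polyphony and shorten durations so generated material
# always stays ahead of playback. Thresholds are illustrative.

def schedule_notes(generated_notes, playback_time,
                   lead_threshold=0.5, max_simultaneous_near_head=2):
    """generated_notes: list of (onset, duration, pitch) tuples, in seconds."""
    scheduled = []
    for onset, duration, pitch in generated_notes:
        lead = onset - playback_time  # how far ahead of playback this note falls
        if lead < lead_threshold:
            # Close to the playback head: limit simultaneous notes and shorten
            # durations to keep a high throughput of ready-to-play material.
            simultaneous = sum(1 for o, d, _ in scheduled if o <= onset < o + d)
            if simultaneous >= max_simultaneous_near_head:
                continue  # drop the note rather than risk falling behind
            duration = min(duration, lead_threshold)
        scheduled.append((onset, duration, pitch))
    return scheduled
```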
Such strategies are important to ensure that AI-augmented instruments generate content fast enough for live music environments. These instruments also need to respond to musical stimuli as they occur. As we mentioned in Section 1, music psychology suggests that musicians engage both a subconscious and a conscious cognitive process while improvising. Effective AI-augmented instruments need to mimic this behavior: they must know when to regenerate new musical ideas and when to continue with the current generation. This could be achieved through parallel generations, with the AI model attempting to predict the human artist’s likely next input. By continuously preparing for multiple situations, the AI-augmented instrument knows which continuation to choose from its set of generations and when to generate new possibilities. We believe that the ability to plan far into the future and switch generations at a moment’s notice is crucial for AI-augmented instruments to display collective virtuosity.
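A minimal sketch of this parallel-generation idea might look as follows, assuming a hypothetical model.generate interface and a simple pitch-class match as the selection criterion.

```python
# Illustrative sketch: several continuations are pre-generated in parallel, each
# conditioned on a hypothesis about the performer's next bass root, and the
# closest match is chosen once the real input arrives. `model.generate` is a
# hypothetical interface standing in for the generative model.

def pitch_class_distance(a: int, b: int) -> int:
    """Smallest interval between two pitch classes, in semitones."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def prepare_candidates(model, context, hypothesized_roots):
    """Pre-compute one continuation per hypothesized next bass root."""
    return {root: model.generate(context + [root]) for root in hypothesized_roots}

def choose_continuation(candidates, observed_root):
    """Select the continuation whose hypothesis best matches the observed input."""
    best_root = min(candidates, key=lambda r: pitch_class_distance(r, observed_root))
    return candidates[best_root]
```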
In our prototype, we employed a simpler method that enabled Rudess to override the instrument’s decisions at any time: playing a bass note on his keyboard triggered a regeneration of notes. The bass notes played by Rudess’s left hand and the leads played by his right hand set the key for the generated bassline and chords. In a later experiment that generated lead melodies, the model generated content coherent with the last chord Rudess played.
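A minimal version of this triggering logic might look like the sketch below, in which the pitch threshold separating bass from lead notes, the state object, and the model interface are illustrative assumptions rather than details of our implementation.

```python
# Minimal sketch of the override logic: a low note from the left hand triggers a
# regeneration, and the bass/lead pitches set the key of the next bassline and
# chords. The threshold, state object, and model interface are illustrative.
BASS_THRESHOLD = 48  # MIDI pitch below which a note counts as a bass note (C3)

def on_note_on(pitch, velocity, state, model):
    if velocity == 0:
        return
    if pitch < BASS_THRESHOLD:
        # Left hand: update the key and discard the current generation.
        state.key_root = pitch % 12
        state.pending_notes = model.generate(key_root=state.key_root,
                                             context=state.recent_lead_notes)
    else:
        # Right hand: remember the lead so the next generation can follow it.
        state.recent_lead_notes.append(pitch)
```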
In the same way that human performers can place themselves in a position of vulnerability on stage, it is important for AI-augmented instruments to show their fragility in order for audiences to recognize their virtuosity. Displaying the fact that AI-augmented instruments are susceptible to failure is even more crucial to distinguish them from systems that might feature prerecorded content. There are various ways in which this vulnerability might surface, notably by allowing time for the instrument to generate content by itself, letting the AI produce musical lines that extrapolate the player’s input in unusual ways, and allowing it to explore more unconventional musical spaces.
In our prototype, we allowed Jordan Rudess to interact with a responsive light sculpture designed to represent the thinking process of the AI-augmented instrument. Through this interaction, Rudess dynamically changed the temperature of the underlying generative AI models, which, by the nature of the nucleus sampling process,18 allowed the instrument to explore more experimental generations that lie further from the training data distribution.
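To illustrate how an interaction value from the sculpture might map onto the sampling process, the following sketch scales the softmax temperature with that value and then applies standard nucleus (top-p) sampling; the temperature range is an assumption, not the mapping used in the performance.

```python
# Illustrative mapping from a sculpture interaction value in [0, 1] to the
# sampling temperature, followed by standard nucleus (top-p) sampling. The
# temperature range is an assumption, not the mapping used in the performance.
import numpy as np

def sample_next_token(logits, interaction, top_p=0.95):
    """logits: NumPy array of unnormalized scores; interaction: value in [0, 1]."""
    temperature = 0.8 + 1.2 * interaction      # more interaction, more adventurous output
    scaled = (logits - logits.max()) / temperature
    probs = np.exp(scaled)
    probs /= probs.sum()

    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability exceeds top_p, then renormalize and sample.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cumulative, top_p) + 1]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return np.random.choice(nucleus, p=nucleus_probs)
```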
“With AI in the mix, the stage becomes a playground of sonic exploration. It's like having a musical co-conspirator that pushes boundaries and fuels spontaneity, allowing me to unleash my creativity in real-time.”
- Jordan Rudess
Having explored various aspects of virtuosity that AI-augmented instruments can display, we now shift our focus to the interaction between human performers and AI-augmented instruments. This section will examine the enhanced virtuosity that emerges from this novel symbiosis, highlighting its implications for the future of musical performance. George Lewis has been a notable advocate of using technology to enhance musical performance and has reported that “machines can push human creativity” in order for human performers to “develop new modes of creativity.”19
In this context, we identify a new form of virtuosity that emerges from this collaboration: one rooted in the skillful interaction between human performers and AI-augmented instruments. We call it symbiotic virtuosity. This type of virtuosity is characterized by the augmentation of one’s own performance with AI-augmented instruments that can mirror, extend, and re-form the performer’s musical identity. It is distinct from existing modes of collaboration, creating an individualized closed-loop system of personal growth: a performer may train a model, interact with it, and continually feed the resulting new music back into the model (Figure 3). This cyclical innovation stems from the AI tool acting as both a mirror and an extension of the artist’s existing abilities.
“AI’s capabilities might introduce novel musical elements or improvisations that I’ve never considered before. Is it possible that I’ll perform melodies and harmonies that are new to my repertoire?”
- Jordan Rudess
Moreover, this symbiotic relationship not only cultivates new forms of virtuosity but also enhances existing ones. Collaborating with an AI-augmented instrument can refine an artist’s collaborative skills. In traditional artistic collaborations, trust is built through personal connection and mutual recognition of skill. When interacting with an AI-augmented instrument, however, establishing this trust is challenging due to the potential unpredictability of the AI’s behavior, which could reflect poorly on the human performer’s reputation. Especially when the AI is trained on the performer’s own musical data, this creates a unique relationship of accountability and mentorship, akin to that between a mentor and a novice musician. After performing with our AI-augmented instrument prototype, Rudess confirmed our beliefs: “I’m training this young program, [and] I feel responsible as a mentor to my ‘young student.’” This scenario heightens the virtuosic element of risk for the human performer, elevating their role within the performance.
Having explored the multifaceted definitions of virtuosity and its manifestation in both human performers and AI-augmented instruments, we can now turn our attention to its realization on stage during live music performances. While human performers effortlessly communicate virtuosity through body movement, facial expressions, and interaction with the audience, this task presents a greater challenge for AI-augmented instruments, which lack a physical presence. This section examines the importance of showcasing virtuosity to audiences and explores innovative ways of presenting AI-augmented instruments as autonomous performers.
The manner in which a performer exhibits their virtuosity onstage is as important as their musical ability. Just as experiencing a live performance differs markedly from listening to a recording, an AI-augmented instrument treated as a performer must demonstrate its autonomy and skill through nonmusical cues. William Forde Thompson, who studies how musicians’ visual representation affects performances, writes that human expressions “allow performers to cozy up to the audience, emphasizing the music performance as reciprocal human interaction, whereas an absence of visual information leaves an impression that the performance is a solitary act.”20 This suggests that live performances foster a communal experience between the performers and the audience. While developing performances featuring AI systems, we found that performative and expressive communication of expertise is especially important for audiences without extensive musical knowledge, who often rely on visual cues to gauge a performer’s skill and the complexity of the music.
“The lighting and visualizations feel like essential components in order to almost make up for the missing human factor.”
- Jordan Rudess
Our approach to presenting an AI performer as a distinct musical identity involves crafting time-sensitive visuals that delineate the AI’s contributions from those of human performers onstage. This strategy is not unique to AI; human performers also distinguish themselves through physical expression and instrumental performance. Indeed, Saldaña and Rosenblum demonstrated that visual cues accompanying sounds significantly influence an audience’s ability to differentiate sounds.21 According to Thompson, “visual information often signals the timing of musical events, focusing listeners’ attention to (or away from) critical acoustic information at specific moments in time. By directing attention in this way, visual cues can increase or decrease musical intelligibility.”22 Such visual cues are expected for human performers, suggesting a parallel necessity for AI performers to maintain comparable standards in live performances.
Musicians use a blend of their instruments, movements, and expressions to provide this indication. For example, a pianist may visibly shift along the bench to access different octaves, a violinist might intensify their bowing to increase volume, and a vocalist adjusts mouth movements and breathing patterns to vary sound and dynamics. These movements draw attention to a particular musician or instrument rather than to the composition as a whole.23 By emphasizing and visually expressing one sound or note, a performer builds the audience’s comprehension of how the layers of a multi-performer piece come together.
In contrast, without any visual indication, it is unclear where a given sound or voice originates. The voice could be static playback or could be derived from another performer’s sound, as with a looper or arpeggiator. It is therefore crucial to visually represent changes in AI-generated music in real time.
To address this problem, we created a light sculpture that visualizes the harmonic space of the AI-augmented instrument’s performance, as shown in Figure 4. As the AI-augmented instrument generates and plays chords, these musical changes are captured in the sculpture. While the exact musical mapping of harmonic space to physical space remains abstract, the timely changes provide a distinct visual marker of the AI’s autonomous role on stage. This is visually signified by a bright red area moving across the sculpture in sync with each chord played by the AI, clearly distinguishing the AI’s contributions from those of human performers.
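One possible realization of such a mapping, offered only as a sketch of the general idea and not the projection actually used in the sculpture, is to place each generated chord on the circle of fifths and move the lit region accordingly.

```python
# Illustrative mapping from a generated chord to a 2D position on the sculpture:
# the chord's lowest pitch class is placed on the circle of fifths and projected
# to coordinates in [0, 1] x [0, 1]. The projection used in the performance differed.
import math

def chord_to_position(chord_pitches):
    """chord_pitches: MIDI pitches of the chord currently played by the AI."""
    root = min(chord_pitches) % 12            # treat the lowest note as the root
    fifths_index = (root * 7) % 12            # position on the circle of fifths
    angle = 2 * math.pi * fifths_index / 12
    x = 0.5 + 0.5 * math.cos(angle)
    y = 0.5 + 0.5 * math.sin(angle)
    return x, y                               # drives where the bright red area appears
```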
The pursuit of transparency introduces a need for sound reactivity in live visualizations. As the scales of concert productions soar to new levels, they often become increasingly scripted and sequenced.24 This extensive synchronization coordinates the musicians, dancers, backing tracks, lighting, graphics, moving scenery and rigging, pyrotechnics, and more, ensuring reliability and consistency across numerous performances and venues. However, this structured approach limits the improvisational freedom for the performers, confining spontaneous musical elements to precise, predetermined segments within tightly controlled durations, tempos, visual mapping, and stage directions.
Given that AI-augmented instruments inherently introduce an improvisational element to the stage, this could prompt a shift toward new modes of reactivity. Integrating an AI-augmented instrument into a highly scripted setup restricts its ability to fully express its virtuosity. Instead, the supporting systems should be designed to adapt and respond to the spontaneous interplay between human musicians and AI-augmented instruments.
To accommodate this, newly constructed inputs will be required to allow live visual systems to fully capture the musicality of improvisation. This might involve moving away from fixed visual patterns toward those that evolve dynamically in response to the live music. Although systems that derive visualizations (such as lighting) from real-time audio features were established decades ago,25 they are not as common in scripted and synchronized performances today. Traditional noise patterns and random number generators could be replaced by pseudorandom processes that synchronize with the music in real time. Distinct from existing systems of real-time audio visualization, these systems would take as input conceptual features of the music beyond the expected tempo, pitch, and volume. New parameters such as energy, anticipation, complexity, and emotion could be introduced to enrich each improvisational performance. These parameters will still require production designers to creatively program how their systems perform, but those systems may now move and react in step with improvised performances.
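As a sketch of how such parameters might be computed from a live note stream and from the AI’s planned future, consider the definitions below; they are illustrative, and a production system would expose whatever features its designers choose.

```python
# Illustrative definitions of two higher-level control parameters derived from a
# live note stream and from the AI's planned future. A production system would
# expose whatever features its designers choose.

def energy(recent_notes, window=2.0):
    """Note density times average (normalized) velocity over the last `window` seconds.
    recent_notes: list of (time, pitch, velocity) tuples."""
    if not recent_notes:
        return 0.0
    latest = recent_notes[-1][0]
    active = [(t, p, v) for t, p, v in recent_notes if t >= latest - window]
    mean_velocity = sum(v for _, _, v in active) / len(active)
    return (len(active) / window) * (mean_velocity / 127)

def anticipation(planned_notes, now):
    """How far ahead of the current time the AI has already planned, in seconds.
    planned_notes: list of (onset, pitch) tuples not yet played."""
    return max((onset - now for onset, _ in planned_notes), default=0.0)
```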
Furthermore, production designers will take on the role of visualizing the AI-augmented instrument’s presence as a performer. Professionals overseeing lighting, graphics, and scenic elements will need to conceive how the AI manifests itself in each performance. They will map these new parameters—specifically the parameters generated solely from the AI performer’s music—to uniquely characterize and distinguish the AI on stage.
For effective AI live performance, demonstrating expert proficiency is crucial. As previously defined, virtuosity notably requires a combination of creativity, collaboration, vulnerability, and technological expertise. Human performers demonstrate their musical competence through their instruments, physical ability, and improvisational skills. However, AI lacks a tangible instrument, necessitating additional indications of skill. Moreover, given that AI-augmented instruments are inherently capable of improvisation and music generation, it is essential to effectively communicate these capabilities to the audience to affirm their virtuosic qualities.
One of the virtuosic traits of an AI-augmented instrument is its ability to plan extensively into the future and modify that plan instantaneously. Yet, without a physical manifestation of this capability, the audience remains unaware of its existence. Thus, visualizations of AI virtuosity must effectively portray its reactivity and extensive planning to the audience and other performers onstage.
This portrayal involves visualizing the music in the time domain before it is played. Traditional live performance visualizations either rely on known futures (i.e., extensively preprogrammed sequences or knowledge from the visualization’s operator) or react only to the immediate notes and rhythms. The simplest and most direct visualization of the AI’s future is the generated bars of sheet music, but this lacks visual appeal and fails to engage nonmusicians. It also diminishes the element of surprise, replacing anticipation with predictable expectation. Additionally, animatronic AIs that play traditional instruments do not meet this future-music visualization requirement, as they can only actuate notes at the moment of sound production. Instead, new forms of musical visualization that can both respond to improvisation and project future musical developments are needed.
We propose incorporating dynamic stage movements to illustrate the AI-generated musical direction. This could be effectively achieved with kinetic sculptures that not only depict the planned future but also adapt to changes in response to live interactions with human performers. Such sculptures could incorporate a range from subtle motions for softer notes to vigorous movements for powerful crescendos. Importantly, these kinetic systems can suggest forthcoming musical changes without revealing their precise execution, like a gradual ripple that intensifies into a note burst. The controlled nature and trajectory of these movements can create a consistent visual pattern, with parameters like path, direction, speed, and amplitude that evolve as the AI anticipates future musical elements. The audience may not know when the note will land, but where the note travels and how it changes before it lands suggests a planned future. Audiences will learn to anticipate both continuations of current musical thought and dramatic musical shifts. However, since the AI-augmented instrument is constantly adjusting its planned future, these anticipatory movements will also change in real time to indicate rewritten futures. The frequency and intensity of these changes indicate the AI’s responsiveness to musical inputs from human collaborators, ranging from gentle adaptations to swift reconfigurations matching rapid musical shifts.
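As a rough sketch of how a planned, not-yet-sounded note might be translated into such movement parameters, the function below scales amplitude with the note’s velocity and accelerates the motion as the onset approaches; all constants are illustrative placeholders rather than the mapping of any particular production.

```python
# Rough sketch of mapping a planned, not-yet-sounded note to kinetic-movement
# parameters: amplitude follows the note's velocity, and the motion accelerates
# as the onset approaches without revealing its exact timing. Constants are
# illustrative placeholders.

def movement_parameters(planned_note, now):
    """planned_note: dict with 'onset' (seconds), 'velocity' (0-127), and 'pitch'."""
    time_to_onset = max(planned_note["onset"] - now, 0.0)
    amplitude = planned_note["velocity"] / 127            # louder note, larger motion
    speed = 1.0 / (0.25 + time_to_onset)                  # ripple intensifies toward the onset
    direction = 1 if planned_note["pitch"] >= 60 else -1  # rough high/low register split
    return {"amplitude": amplitude, "speed": speed, "direction": direction}
```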
This visualization approach not only demonstrates an AI-augmented instrument’s reactivity but also exposes its vulnerabilities. If a note is wrong or a cue from another performer is missed, it becomes visible and audible, highlighting the AI’s error. A clearly planned yet incorrect note or sequence that did not react properly will still follow the same visual patterns and musical mapping, exposing what went wrong. In this case, the visual aid serves to emphasize the incorrect step and confirm to audiences that they hear something disjointed. This fragility creates a unique authenticity of the live generation of the AI, mirroring the potential for mistakes inherent in human performance.
Without this visual reactivity, any perfectly timed, unchanging display might seem preprogrammed to the audience. Integrating a stream of changes into this choreography exposes many facets of the AI’s virtuosity: its improvisational ability, collaborative capacity, unique reactivity, and inherent vulnerability.
The dynamic systems we employ also play a crucial role in providing feedback to performers onstage. While musical cues are explicit in guiding performance directions, the subtler dynamics of anticipation and the energy of impending musical events are essential for effective collaboration. Human performers use movements, nods, and eye contact to communicate,26 and these actions also demonstrate the ensemble’s cohesive musical interaction to the audience. Some of these human movements have been emulated in robotic performance systems, such as Guy Hoffman and Gil Weinberg’s emotive marimba-playing robot.27 However, as previously mentioned, animatronic instrument-playing performers do not fully demonstrate the abilities of an AI-augmented instrument.
AI-augmented instruments must also actively engage in this performative collaboration, both by asserting their musical intentions and by responding to nonmusical cues onstage. Performers may interact with a kinetically mapped AI sculpture similar to how they respond to human musicians’ cues. Quick movements within the periphery produced by the kinetic sculpture can command attention in the same way nods, gestures, and instrumental actions do among human performers. The animations chosen to represent the AI’s planned musical sequences not only display its intentions but also serve as a communicative tool with other musicians. This is crucial for fostering a unified performance experience onstage; it allows human performers to understand and adapt to the AI’s operational logic, treating it as a cooperative entity rather than a static, unyielding tool.
Additionally, technologies such as computer vision and other sensor systems can be used to facilitate interaction with the AI-augmented instrument. Performers may imitate movements by the AI—such as swaying in sync—to affirm its directional cues. Alternatively, they may signal changes in tempo, energy, or octave by deliberately diverging from their synchronization with the sculpture. Or, they may use traditional methods of emphasis, such as leaning into a note, to naturally signify their focus on a particular musical element. This reciprocal communication completes the loop necessary for performers to integrate the AI as an extension of their own performative musical expression.
Two-way communication extends beyond musical cues to foster a genuinely shared performance experience. Without this layer of interaction, collaborations between human musicians and AI might lack nuanced individual expressions that characterize each performer’s unique contributions. Both human and AI performers should have the freedom to initiate solos, engage in dialogues, and eventually converge into a harmonious synthesis of their musical expression. For audiences, witnessing this interaction not only highlights the collaborative virtuosity involved but also illuminates the symbiotic virtuosity that emerges when performers engage deeply with an AI instrument trained on their own musical expertise, showcasing an intense, interactive relationship directly to the audience.
The physical embodiment of AI-augmented instruments must appear cohesive onstage. Visualization methods must first fit into the environment of a live performance before the AI-based mapping can be applied. In a discussion on this topic, Ben Bloomberg, a Grammy-nominated innovator working in live music production, concurred with this sentiment:
“A piece onstage must be beautiful and awe-inspiring on its own before working in additional meaning.”
- Ben Bloomberg
Ideally, the same visual characteristics used by an AI-augmented instrument should map naturally onto any improvisational musical arrangement, albeit without the same level of control provided by the AI. A system of expression for an AI-augmented instrument built on a crude or contrived system of communication, construction, or scale will be intrusive to audiences and appear ill-fitting or unexpected, barring them from exploring further meaning.
Despite our strong belief that generative AI systems can greatly benefit the future of live music performances, as demonstrated in this paper, we recognize the risks that these systems can pose if applied carelessly. In particular, and as mentioned in Section 2, there is a legitimate concern that generative music AI systems, with their ability to generate professional-sounding music very rapidly, might endanger the integrity of professional musicians and songwriters. However, we believe this risk can be effectively mitigated by integrating such systems into AI-augmented instruments and by delegating most of the control to the artist. We also note that we do not advocate for all live music performances to feature AI-augmented instruments, as we expect that only specific genres and/or audiences will react positively to this type of technology. Danielle Rudess, an experienced theater and music producer helping to shape the vision of this project, also raises this concern:
“Audiences’ reactions to live AI performances are going to vary widely, from fascination and awe at the technological prowess, to skepticism and concern about the implications for human creativity and employment.”
- Danielle Rudess
Under our rubric, live performances must always feature human performers, since AI-augmented instruments, as their name suggests, are intended to be played on stage. Finally, we emphasize that artists should always give consent before models are fine-tuned on their data. When possible, AI-augmented instruments should initially be played by the artists they were trained on, a requirement for symbiotic virtuosity. In any case, these models may only be distributed with the consent of the artist, potentially with specific permitted uses in mind.
Through this exploration, we drew on previous literature in music psychology to define the complex notion of virtuosity in human performers and saw how this definition has evolved over time. By introducing the concept of AI-augmented instruments, defined as generative music AI systems that can be played in real time by human performers during live music performances, we argued that such systems can already display some forms of virtuosity according to our definition. We then mapped our expectations for future live music performances that embed AI-augmented instruments and for how those performances might be transformed by this technology. Using the example of the performance we created with Jordan Rudess in April 2024, we showed how a new kind of symbiotic virtuosity originates from interactions between artists and AI-augmented instruments and how audiences might react to this technological advancement. Finally, we explored the challenges of bringing an AI-augmented instrument onstage and proposed how to create physical embodiments of an AI performer that communicate with musicians and reach all audiences. We specifically examined how AI-augmented instruments may demonstrate new forms of virtuosity to audiences and how they will technically shape the future of live music productions. We look forward to future explorations and developments of AI-augmented instruments for live performances, how they are integrated into existing production methods, and the new musical and performative spaces they will unlock.
The authors would like to thank Jordan Rudess for his kind and enthusiastic collaboration in this project as well as Danielle Rudess for her sustained help and wisdom. This project also would not have been possible without the constant support of the MIT Center for the Arts, Science & Technology (CAST) and, in particular, Lydia Brosnahan, who helped coordinate Jordan Rudess’s artistic residency. Finally, we would like to thank Ben Bloomberg for providing us with very helpful feedback and MIT for funding the research surrounding this project.
Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, et al. “On the Opportunities and Risks of Foundation Models.” Preprint, revised July 12, 2022. https://doi.org/10.48550/arXiv.2108.07258.
Devenish, Louise, Cat Hope, and Sam McAuliffe. “Contemporary Musical Virtuosities.” In Contemporary Musical Virtuosities, edited by Louise Devenish and Cat Hope, 1–13. Routledge, 2023.
Erwin, Max. “Wet Hot Kranichstein Summer: Darmstadt 2016.” Tempo 71, no. 279 (2017): 87–91.
Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. “GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” Preprint, revised March 22, 2023. http://arxiv.org/abs/2210.17323.
Hoffman, Guy, and Gil Weinberg. “Interactive Improvisation with a Robotic Marimba Player.” In Musical Robots and Interactive Multimodal Systems, edited by Jorge Solis and Kia Ng, 233–51. Berlin, Heidelberg: Springer, 2011. https://doi.org/10.1007/978-3-642-22291-7_14.
Holtzman, Ari, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. “The Curious Case of Neural Text Degeneration.” Preprint, revised February 14, 2020. https://doi.org/10.48550/arXiv.1904.09751.
Hoste, Lode, and Beat Signer. “Expressive Control of Indirect Augmented Reality During Live Music Performances.” In Proceedings of the International Conference on New Interfaces for Musical Expression, 13–18. Zenodo, 2013. https://doi.org/10.5281/zenodo.1178558.
Huang, Cheng-Zhi Anna, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. “Music Transformer.” Preprint, revised December 12, 2018. https://doi.org/10.48550/arXiv.1809.04281.
Joyce, Molly. “Strength in Vulnerability.” Interview by Frank J. Oteri, New Music USA, February 1, 2020. https://newmusicusa.org/nmbx/molly-joyce-strength-in-vulnerabilty/
Kaiser, Jeff. “Improvising Technology: Constructing Virtuosity.” Cuadernos de Música, Artes Visuales y Artes Escénicas 13, no. 2 (July 6, 2018): 87–96. https://doi.org/10.11144/javeriana.mavae13-2.itcv.
Kennedy, Joyce, Michael Kennedy, and Tim Rutherford-Johnson, eds. “virtuoso.” In The Oxford Dictionary of Music. Oxford University Press, 2013. https://www.oxfordreference.com/display/10.1093/acref/9780199578108.001.0001/acref-9780199578108-e-9552?rskey=0dDatw&result=9436.
Lewis, George E. “Why Do We Want Our Computers to Improvise?” In The Oxford Handbook of Algorithmic Music, edited by Roger T. Dean and Alex McLean, Vol. 1. Oxford University Press, 2018. https://doi.org/10.1093/oxfordhb/9780190226992.013.29.
Louie, Ryan, Andy Coenen, Cheng Zhi Huang, Michael Terry, and Carrie J. Cai. “Novice-AI Music Co-Creation via AI-Steering Tools for Deep Generative Models.” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–13. New York, NY, USA: Association for Computing Machinery, 2020. https://doi.org/10.1145/3313831.3376739.
Mauskapf, Michael. “Collective Virtuosity in Bartók’s Concerto for Orchestra.” Journal of Musicological Research 30, no. 4 (October 2011): 267–96. https://doi.org/10.1080/01411896.2011.614167.
Meyer, Daniel. “Build the ‘Sonolite.’” Popular Electronics 28, no. 5 (May 1968): 27–30.
Norgaard, Martin. “The Interplay between Conscious and Subconscious Processes during Expert Musical Improvisation.” In Music and Consciousness 2: Worlds, Practices, Modalities, edited by Ruth Herbert, David Clarke, and Eric Clarke, 187–99. Oxford University Press, 2019. https://doi.org/10.1093/oso/9780198804352.003.0011.
Raffel, Colin. “Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching.” PhD diss., Columbia University, 2016.
Rys, Dan. “Billie Eilish, Pearl Jam, Nicki Minaj Among 200 Artists Calling for Responsible AI Music Practices.” Billboard, April 2, 2024. https://www.billboard.com/business/tech/open-letter-ai-music-signed-billie-eilish-pearl-jam-nicki-minaj-1235647311/.
Saldaña, Helena M., and Lawrence D. Rosenblum. “Visual Influences on Auditory Pluck and Bow Judgments.” Perception & Psychophysics 54, no. 3 (May 1, 1993): 406–16. https://doi.org/10.3758/BF03205276.
Suno. “Suno.” Accessed May 15, 2024. https://suno.com/.
Thickstun, John, David Hall, Chris Donahue, and Percy Liang. “Anticipatory Music Transformer.” Preprint, submitted June 14, 2023. https://doi.org/10.48550/arXiv.2306.08620.
Thompson, William Forde, Phil Graham, and Frank A. Russo. “Seeing Music Performance: Visual Influences on Perception and Experience.” Semiotica 156 (2005): 203–27.
Udio. “Udio.” Accessed May 15, 2024. https://udio.com.
Williamon, Aaron, and Jane W. Davidson. “Exploring Co-Performer Communication.” Musicae Scientiae 6, no. 1 (March 1, 2002): 53–72. https://doi.org/10.1177/102986490200600103.