AI for Musical Discovery

How Generative AI can nurture human creativity, learning, and community in music.

Published on Mar 27, 2024

Abstract

What role should generative AI technology play in music? Long before recent advances, similar questions have been pondered without definitive answers. We argue that the true potential of generative AI lies in cultivating musical discovery, expanding our individual and collective musical horizons. We outline a vision for systems that nurture human creativity, learning, and community. To contend with the richness of music in such contexts, we believe machines will need a kind of musical common sense comprising structural, emotional, and sociocultural factors. Such capabilities characterize human intuitive musicality, but go beyond what current techniques or datasets address. We discuss possible models and strategies for developing new discovery-focused musical tools, drawing on past and ongoing work in our research group ranging from the individual to the community scale. We present this article as an invitation to collectively explore the exciting frontier of AI for musical discovery.

1. Introduction

Music has been a wondrous laboratory for creativity, learning, and community1 throughout human history. Despite this enduring influence, music's form is anything but static: each era and culture develops distinct musical forms shaped by their values, sociopolitical contexts, intricate structural logics, and personal narratives. When technology is thoughtfully leveraged, it can profoundly magnify music's reach and widen creative participation. Advances in recording, computing, and networking over the last century have underscored this potential.

Like all technologies before it, generative AI offers the potential to advance music broadly, and it opens some exciting and important new avenues for doing so, which we outline here. Yet, as these systems advance, we must deliberate on the objectives behind their musical applications. Rather than merely imitating past conventions, how might AI push boundaries and reveal new insights? What novel interfaces could enable more people to develop musical abilities? Broadly, how can we apply these technologies to enrich human music-making? With care, we believe AI can inspire and amplify creativity rather than constrain it.

This article argues that the primary aim for generative AI technology in music should be to nurture human creativity, learning, and community across all skill levels. We propose musical discovery as a guiding concept—encompassing not just novel artifacts, but fresh perspectives that deepen understanding and broaden participation. Advancing this vision requires interdisciplinary efforts, from technical innovations to nuanced applications. If developed collaboratively under this humanistic lens, generative AI technology has immense potential to inspire new musical ideas that profoundly expand the universal human pursuit of discovery.

2. On Human Musical Discovery

A number of psychological theories account for musicians’ personal motivation to discover new ideas, beyond those that one assimilates early in musical development. For example, Csikszentmihalyi’s concept of flow2 emphasizes the importance of a challenge, which novelty can introduce. Even more diverse are the strategies used to find and pursue novel ideas. Beyond external influences, musicians routinely engage in solo exploration by improvisation, studio craft involving technological tools, and experimentation in composition.3 Collaborations between musicians stimulate new ideas by encouraging them to integrate contrasting backgrounds and concepts, leading to surprising combinations and translating existing ideas into new musical domains. Presentational aspects of music, like performing to a crowd, encourage feedback and social dialogue involving a broader array of community members, prompting iteration and integration of new perspectives. Educators also introduce new ideas to musicians, scaffolding their discovery. Ultimately, each discovery, however small, can unlock new expressive modes, enhance technical skills, deepen understanding of musical traditions, and simply invite delight at expanded possibilities.

At a historical level, musicians experiment with fundamental musical elements and concepts. In the Western world, for instance, this can include aspects like form, harmony, rhythm, and instrumentation.4 Music has also been theorized to evolve alongside broader sociocultural forces5 and empirically shown to transform through social transmission.6 The impact of historical-scale musical discovery is multifaceted. It can lead to the creation of entirely new musical styles, enriching the tapestry of human expression. Ultimately, musical discovery expands our understanding of the art form and its potential, while using music as a test-bed for exploring ideas and moving to action.7

3. The State of AI in Music

Early generative approaches focused on modeling music as a sequence of discrete symbolic events, such as notes or MIDI (Musical Instrument Digital Interface) messages,8 an approach that has carried into contemporary research.9 The last decade of advances in modeling long sequences has made it possible to model music as a sequence of raw audio samples10—capturing detailed nuances like timbre, human performance, production, and recording artifacts. Further advances in audio quality, long-term structure, and consistency have led to commercial generative AI music services11 gaining traction. Despite this, a major limitation continues to be the lack of agency and control over generated musical outputs.
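
To make this sequence-modeling framing concrete, here is a minimal, hypothetical sketch of next-event prediction over tokenized musical events. The tiny vocabulary and recurrent architecture are deliberately simplified stand-ins for illustration, not a description of the cited systems, which typically use Transformer-scale models and much richer event vocabularies.

```python
# Minimal illustration of autoregressive symbolic music modeling: music is
# flattened into a sequence of discrete event tokens (e.g., note-on, note-off,
# time-shift), and a model is trained to predict each next token. The
# vocabulary and architecture are simplified stand-ins, not any cited system.

import torch
import torch.nn as nn

VOCAB_SIZE = 512  # e.g., note-on/off pitches, time-shifts, velocities

class TinyMusicLM(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, d_model=256, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, num_layers=n_layers, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, time) integer event ids -> (batch, time, vocab) logits
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)

model = TinyMusicLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on a random "score"; real systems train on MIDI-derived corpora.
sequence = torch.randint(0, VOCAB_SIZE, (8, 128))
inputs, targets = sequence[:, :-1], sequence[:, 1:]
optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
loss.backward()
optimizer.step()
```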

Fueled by the aggregation of large-scale music datasets,12 we now face a myriad of foundation models13 for music.14 Such models show an impressive ability for pastiche, but they remain limited in musical diversity, constrained by coarse textual conditioning, prone to poor extrapolation, and lacking in provenance. The desire for interpretable and controllable models is supported by a number of research developments15 and by bespoke generative AI experiments by musicians.16 In summary, AI music generation is both historically rich and rapidly evolving, with impressive progress in symbolic and raw audio generation, foundation models, and interpretable approaches. However, limitations remain in agency, control, diversity, and provenance, and addressing them will be crucial for unlocking the full potential of AI in music.

4. Developing Musical "Common Sense" and Long-Term AI Progress

Despite impressive pattern recognition and generation, modern AI systems still lack the "common sense" understanding of the world that comes naturally to humans. This is evident across domains like language, vision, robotics, and music. In AI research, "common sense" refers to the ability to reason intuitively about everyday situations depending upon implicit knowledge about how the world works,17 including aspects like intuitive physics and psychology.18

In music, we argue this involves recognizing and manipulating the intricate structures, semantics, and aesthetics that form the fabric of musical expression. Such musical intuition is difficult to capture through explicit datasets or training objectives. Rather, many aspects of music emerge through implicit learning processes.19 We call it "common sense" because it reflects shared assumptions and sensibilities within real-world musical expression, acquired through a complex interplay of biological, psychological, and cultural processes. Developing this level of comprehension remains a grand challenge for AI in music, and indeed music has been argued to provide deep challenges for AI development more broadly.20

Much has been written about human musicality,21 a complex and even contested22 notion that can be seen as addressing implicit musical abilities, similar to what we call musical common sense. In humans, the notion of musicality is entangled with questions of talent vs. skill, ability and aptitude, cultural universality, and species-specificity. We aim to distinguish our notion of musical common sense from musicality. One reason for this is that in order to encourage technical progress in research, we must align sufficiently on the capabilities we hope to develop. The other is that we do not aim to replicate all facets of human musical intelligence in AI systems. Rather, we hope to cultivate the aspects of musical comprehension that allow AI to most effectively enhance human creativity, through musical discovery.

We propose the following categories of capabilities as layers of musical common sense: parts that, though inevitably incomplete in capturing all human musical behavior, help establish clearer goalposts for future progress.

Structural Attributes

This involves recognizing and manipulating the fundamental patterns, idioms, and theoretical constructs that form the building blocks of music in different stylistic, cultural, and social contexts. Expert musicians fluently apply such conceptual understanding when communicating ideas and intentions. Structural knowledge also aids music educators in conveying concepts to students, both to transmit knowledge of the past and to offer building blocks that students can use to generalize and extend past ideas.

For example, in the context of jazz, this could include chords and extensions, harmonic substitutions, canonical rhythms, higher-level notions like progressions (e.g., “rhythm changes”), sections, and standards. This conceptual understanding could allow an AI system to support a jazz musician in various ways. During practice, the system could generate variations and reharmonizations on chord changes to standard tunes to help expand the musician's harmonic vocabulary. In live performance, it could listen and respond with expected or challenging accompaniment. For analysis, the system could identify key patterns and structures in improvised solos to elucidate techniques. For jazz composers, common forms often rely on knowledge of both repertoire and harmonic concepts, such as contrafacts and reharmonizations.
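
As a toy illustration of how a small sliver of this structural knowledge might be made explicit to a machine, the sketch below encodes a short ii–V–I progression and applies a tritone substitution to its dominant chord. The data structures and function names are our own illustrative choices; real systems would need far richer, context-sensitive representations.

```python
# Toy illustration of structural knowledge: encoding chord symbols and applying
# a tritone substitution (replacing a dominant 7th chord with the dominant 7th
# a tritone away). Representations here are deliberately minimal.

NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def tritone_substitute(chord: str) -> str:
    """If the chord is a plain dominant 7th (e.g., 'G7'), substitute the
    dominant 7th a tritone (six semitones) away; otherwise leave it alone."""
    if chord.endswith("7") and not chord.endswith("maj7") and not chord.endswith("m7"):
        root = chord[:-1]
        return NOTES[(NOTES.index(root) + 6) % 12] + "7"
    return chord

# A ii-V-I in C major, as might appear in a jazz standard.
progression = ["Dm7", "G7", "Cmaj7"]
print([tritone_substitute(c) for c in progression])  # ['Dm7', 'Db7', 'Cmaj7']
```

Even this tiny example points to the shared vocabulary of chord qualities, functional roles, and substitution rules that an AI collaborator would need to represent and manipulate fluently, and at far greater depth, to participate in the scenarios above.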

Imagine, for instance, a musically knowledgeable multimodal foundation model. A novice might query such a model with a vague but intuitive textual description or auditory example of a musical idea, and the model would respond by identifying relevant theoretical constructs, retrieving examples from the literature and synthesizing new ones, and offering application ideas in order to help the learner build a meaningful mental model of the underlying concept. For an intermediate student, the system could generate reharmonizations and stylistic variations on a standard. This provides material to practice improvisation in novel and diverse contexts. For an expert performer, the system allows specifying ideas in precise musical language and iterating to rapidly explore new ideas. For example, a saxophonist could explore reharmonization concepts for a standard under different ensemble configurations by generating examples building on their intuition. A meaningful exchange in such a scenario is predicated on shared structural comprehension; the model must be able to represent and manipulate the expert’s ideas accurately and fluidly. In each case, this layer of musical common sense allows the model to build on what the musician knows and can convey to eventually reach new territory for educational or creative goals. As musicologist Paul Berliner writes, “There is… a lifetime of preparation and knowledge behind every idea that an improviser performs.”23

It is essential, however, to maintain humility about the fluidity and subjectivity of such notions. Musical knowledge resists over-codification, as conventions evolve dynamically across cultures and eras based on myriad factors, and are personalized based on individual experience, references, and context. What is considered standard in one generation may be cast aside in the next, and structural models of musical information often only crystallize in retrospect (e.g., through significant musicological efforts). For instance, the well-known sonata form exhibits considerable heterogeneity.24 We must also acknowledge the inherent limitations in formally encoding creative human practices like music.

As such, we should be wary of overreliance on explicit idioms in building and evaluating generative AI for musical discovery, and instead seek to perceive and participate in music’s ever-changing landscape with openness and nuance. The priority should be conveying possibilities in the musician’s own terms, not imposing assumptions. Consider the “Beginner’s Mind” or shoshin, an idea with its roots in Zen Buddhism. Suzuki writes “In the beginner's mind there are many possibilities; in the expert's mind there are few.”25

Emotional Context

Disparate theories account for how emotion is expressed through, perceived in, and induced by music.26 While academic discourse on the topic continues, musicians effortlessly intuit music–emotion relations. Composers and songwriters learn associations between musical devices and emotional states within their style and culture, often without explicitly reasoning about these associations. Performers even make subtle adjustments to phrasing, articulation, and expression to evoke varied affective responses.

Machines are usually taught to connect music and emotion through explicit tasks like music emotion recognition (MER),27 or implicitly by aligning music with affectively valent textual28 or visual29 correlates. However, these methods are unlikely to fully capture the nuance involved in musical emotion. MER often depends on datasets and prediction targets derived from simplistic taxonomies.30 Implicit learning from textual associations may absorb dataset biases, both because of limitations in how emotion is typically discussed and because rich emotional concepts require more information than language alone can carry.31 Such techniques lack grounding in the human experiences, embodiment, and enculturation that give rise to musical emotional fluency, and may encode skewed music–emotion connections as a result.
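
For context, the supervised MER formulation mentioned above can be sketched in a few lines: each clip is reduced to a fixed-dimensional feature vector and mapped to a label from a small emotion taxonomy. The features and labels below are random placeholders; the point of the sketch is simply how much nuance such a formulation compresses away.

```python
# Sketch of the standard supervised MER setup: fixed-dimensional audio features
# mapped to a handful of categorical emotion labels. Data here is random; real
# systems use spectral/timbral features or learned embeddings plus human
# annotations, inheriting the taxonomy limitations discussed above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
LABELS = ["happy", "sad", "tense", "calm"]  # a (simplistic) emotion taxonomy

X = rng.normal(size=(200, 40))              # placeholder per-clip feature vectors
y = rng.integers(0, len(LABELS), size=200)  # placeholder annotations

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", classifier.score(X_test, y_test))
```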

Progress could require models that learn holistic musical emotion understanding through real-world immersion, or simulations and other experiential strategies. As one application example, imagine a system assisting a film composer in exploring ideas for a score. Generating plausible and compelling ideas requires an implicit understanding of the precise emotional arcs, aligned to scenes. A system could suggest musical ideas with knowledge of this context, and even ideas that thoughtfully deviate from it (for instance, to foreshadow future events in earlier scenes32), supported by a rich model of emotional context. This capacity for emotional insight is key to AI that can meaningfully collaborate in human musical communication, and stretch creators in new affective directions.

Interaction Dynamics

Human musicians communicate through an unspoken language of musical cues.33 In classical ensembles, quiet signaling enables almost inhuman feats of coordination and results in the synergistic performances we are accustomed to as audiences, from ad hoc duos to conducted orchestras. In jazz, players cue solos, accompaniment, and transitions seamlessly, displaying complex decision-making in real-time. Sensitivity to such social signals facilitates participation in music.

However, current AI systems lack awareness of such nuanced musical interaction. As Browning and LeCun note, "social customs and rituals can convey all kinds of skills to the next generation through imitation."34 To collaborate meaningfully with creators, AI must appreciate the social dynamics of music.

Progress in this area may require interactive environments where systems learn subtleties experientially. Evaluation metrics should assess musical-social intelligence beyond technical ability. Musical collaboration relies on tacit knowledge, and so such social competence is critical for AI that aims to enhance creativity through interaction on human terms, rather than replace it through automation.

Adaptivity and Personalized Behavior

In prolonged musical interactions, AI assistants must learn to adapt their contributions to complement individual creators. User-adaptivity is a classic goal in computing systems.35 In language modeling, this goal has been pursued with modern foundation models through techniques like in-context learning and prompt engineering. For instance, OpenAI’s ChatGPT interface allows setting “Custom Instructions”36 that support long-term consistency, and users may prompt within sessions to bias behavior toward personal desires as these change over time or respond to exogenous factors.
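
As a rough sketch of this prompt-based personalization strategy, and not of any particular product's implementation, a persistent profile can simply be prepended to each session's context alongside in-session preferences. The profile fields, message format, and helper function below are illustrative assumptions.

```python
# Sketch of prompt-based personalization for a music assistant: a persistent
# user profile ("custom instructions") is prepended to every session, and
# in-session notes bias behavior as preferences change. All field names and
# the chat-message structure are illustrative assumptions.

user_profile = {
    "instruments": ["tenor saxophone"],
    "styles": ["post-bop", "modal jazz"],
    "goals": ["expand harmonic vocabulary", "practice reharmonization"],
}

def build_messages(session_request: str, session_notes: list[str]) -> list[dict]:
    profile_text = (
        "You are a music practice assistant. The user plays "
        f"{', '.join(user_profile['instruments'])}, works in "
        f"{', '.join(user_profile['styles'])}, and wants to "
        f"{', '.join(user_profile['goals'])}."
    )
    messages = [{"role": "system", "content": profile_text}]
    for note in session_notes:  # short-term preferences for today's session
        messages.append({"role": "user", "content": note})
    messages.append({"role": "user", "content": session_request})
    return messages

messages = build_messages(
    "Suggest a reharmonization exercise for a ballad I am practicing.",
    ["Today I want to focus on upper-structure triads."],
)
```

Whether such shallow conditioning can track a musician's longer-term growth is precisely the question raised by the observations that follow.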

However, as Glassman recently described, human–AI interaction involves complex loops of intent formation, expression, inference, action, verification, and updating;37 in light of this, adaptation to users and goals from simple strategies like prompts may not be straightforward. Additionally, Picard proposed years ago that learning user subjectivity requires establishing shared "common sense” specific to the user, and then observation and learning over time.38

For music, we propose that personalization involves technical capabilities like recognizing preferred rhythms, motifs, emotional tones, and other artistic factors, but also entails detecting strengths, weaknesses, tendencies, and gaps. The goal over time is creative growth through personalized scaffolding, whether by expanding the user’s skillset, broadening their output, or simply keeping track of their musicianship as it changes.

This requires architectures that accumulate rich user models, responsive to both immediate and longitudinal patterns in individual creative expression; akin to what Bickmore and Picard39 once described as relational agents in a more general setting. In this way, AI systems can complement, challenge, and empower human creators while retaining their unique voices.

Cultural Sensitivity

Music poses a profound challenge for cultural understanding in AI. Musical conventions and aesthetics vary dramatically across the world's cultures, each carrying unique symbolic meanings and social significance. Riedl discusses the goal of machine enculturation,40 describing it as “the teaching of sociocultural values to machines,” and proposes a way to accomplish this: through stories, which often implicitly encode values and tacit sociocultural knowledge. Finding strategies that implicitly convey such values in the context of music, and that complement narratives like stories, could help advance this goal.

Another important aspect is training data. Training data often encodes implicit biases,41 so achieving culturally sensitive AI requires intentional efforts to improve representation; for instance, implementing ethical data sourcing and sampling strategies, involving community members for evaluation and feedback, and using technical measures to reduce imbalances where possible. Ultimately, datasets are insufficient without participation from people to instill nuanced comprehension. We expect progress to come through partnerships with cultural communities, where human guidance and validation steer systems away from bias and toward genuine sensitivity.
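
One of the narrow technical measures alluded to above can be sketched simply: inverse-frequency sampling so that tracks from under-represented traditions are not drowned out during training. The category labels and counts below are purely illustrative, and such reweighting is at best a partial mitigation that cannot substitute for community participation.

```python
# Sketch of inverse-frequency sampling weights to reduce representation
# imbalance in a training set. Labels and counts are illustrative only; this
# narrow measure does not by itself confer cultural sensitivity.

from collections import Counter

tracks = ["western_pop"] * 900 + ["hindustani"] * 60 + ["mbira"] * 40
counts = Counter(tracks)

weights = [1.0 / counts[t] for t in tracks]   # rarer categories weigh more
total = sum(weights)
probs = [w / total for w in weights]          # normalized sampling distribution

# Expected share of each category under this sampling scheme: roughly uniform.
expected = {c: sum(p for p, t in zip(probs, tracks) if t == c) for c in counts}
print(expected)  # ~{'western_pop': 0.33, 'hindustani': 0.33, 'mbira': 0.33}
```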

Moreover, granting cultural communities authority over their musical representation is imperative to avoid misinterpretation and appropriation by AI systems. By incorporating these strategies, AI can progress toward genuine cultural sensitivity, understanding cultures as complex, evolving entities rather than static sets of traits.

5. Extrapolating Beyond Today's Sounds

While generative models have achieved impressive results emulating existing styles, moving beyond today's musical horizons presents acute challenges. Definitionally, today’s models are trained on yesterday’s data; this is what makes them so fluent at re-creating the past. Yet relying on imitation alone risks stagnation. How then can we grow new sounds?

Embracing Uncertainty

Modern generative AI models have extraordinary imitative abilities, yet often err in intriguing ways. This unpredictability has parallels to the long tradition of artists finding inspiration in chance processes, such as the aleatoric music of John Cage.42 However, the uncertainty of large language models is not arbitrary randomness; it can be seen as unexpected interpolations and combinations within their domain of imitation: explorations of the latent space learned from their training data. Recent work has shown how diffusion models can encode aspects of human musical expectation and surprisal.43 This suggests that model errors and uncertainties have aesthetic potential if creators can scaffold and direct them meaningfully. At present, however, they arise as serendipitous side effects of imitative processes rather than through scaffolded, meaningful interactions. One possible path forward is to use abstract reasoning processes, such as those at play in large language models, to systematically recognize model errors and scaffold how human creators tap into them as resources for discovery and creativity. In doing so, we propose that generative AI provides an opportunity to advance the artistic legacy of revealing creativity hidden within unpredictability.
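
As a toy illustration of one small piece of this idea, the sketch below computes per-event surprisal (negative log-probability) under a hypothetical next-event distribution and surfaces the most unexpected candidates for a human to audition. This is a far simpler proxy than the diffusion-based expectation modeling cited above, offered only to make the notion of directed unpredictability concrete.

```python
# Sketch of treating model uncertainty as creative raw material: given a
# next-event distribution from some generative model, compute the surprisal
# (-log probability) of candidate continuations and surface the unexpected
# ones for a human to audition. The "model" here is a random stand-in.

import torch
import torch.nn.functional as F

vocab_size = 128
logits = torch.randn(vocab_size)        # stand-in for a model's next-event logits
log_probs = F.log_softmax(logits, dim=-1)

surprisal = -log_probs                  # high value = unexpected under the model
k = 5
unexpected = torch.topk(surprisal, k).indices   # most surprising candidate events
predictable = torch.topk(log_probs, k).indices  # most expected candidate events

print("surprising event ids:", unexpected.tolist())
print("predictable event ids:", predictable.tolist())
```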

Transformational Creativity

Boden famously proposed three forms of creativity—combinational, exploratory, and transformational.44 Combinational creativity involves novel syntheses of familiar ideas. Exploratory refers to generating novel ideas within an established conceptual space. Transformational creativity, however, fundamentally reshapes a domain's possibilities. Though this is an ambiguous notion, Boden cites Schoenberg’s ideas about atonality as a musical example.

Transformational creativity is rare and revolutionary—it is the long tail of creative acts. Even so, it is vital for the future of music; this is how we catalyze periodic upheavals of musical thinking and yield influential new movements, while in turn using music as a catalyst to inspire hope and optimism that positive pathways and solutions to any situation—no matter how intractable—can always be found. With generative systems becoming increasingly capable at combinational and exploratory tasks, there are opportunities to also support this most ambitious form of human creativity.45

Presently, it is hard to see a path to models independently achieving this transformational type of creativity. However, a promising way forward is human–AI collaboration. In this context, our machines need not recast music independently but instead amplify human creativity into unfamiliar and radical new domains. We hope for systems that can scale up cycles of co-creation and feedback to accelerate refinement of transformational ideas, as well as aggregate cross-disciplinary knowledge to make unconventional connections across domains. Evaluations of this should prioritize long-term contribution, wherein AI tools enhance imagination to help sustain music's endless evolution.

Thoughtfully integrating AI has potential to accelerate human musical discovery in multiple ways. At times, embracing uncertainty can spark novel ideas within established conceptual spaces, and help us rapidly explore them. Unexpected permutations can reveal overlooked possibilities, encouraging us to take a closer look. Periodically, transformational leaps enable us to explore uncharted territory. All these forms of discovery are vital for music: the former two nourish thriving ecosystems, while the latter propels enduring reinvention and growth. With human creativity amplified but not displaced by machine collaboration, music can evolve without losing touch with human experience. AI can assist discovery, but music's capacity to speak across eras originates in our shared experiences.

6. Developing New Tools for Human Creativity and Discovery

While past creative tools provide useful starting points and evocative models for promoting musical discovery, fully realizing generative AI’s transformative potential also requires new perspectives. Rather than simply replicating long-standing assumptions and interfaces, we must rethink human–machine interaction to prevent established biases from implicitly constraining the potential of future systems. Here, grounded in past and present research in our group, we outline our vision for future generative tools that encourage musical discovery across a set of exciting applications.

Augmented Ideation

Musical ideation is profoundly shaped by context, from lone composition to ensemble improvisation. These environments present distinct opportunities for AI augmentation, while posing challenges requiring thoughtful sensitivity. For example, composers ideate through cycles of exploration and refinement, necessitating adaptive systems that toggle between divergent idea generation and focused iteration. Meanwhile, improvisers often ideate fluidly from real-time stimuli, implying tools for rapid variation and response.

Our group has long cultivated systems to enhance musical ideation, for instance using computational methods to interface with large audio datasets.46 However, modern generative AI technology offers a fundamentally distinct design material for fueling creativity through its capacity to synthesize novel outputs that derive from existing musical datasets, for instance with text-based semantic guidance. Recently, we developed a sound generation method that introduces semantic guidance to the modular synthesizer paradigm, a historic set of tools that has fueled musical ideation for decades. This method allows users to generate sounds from prompts, but then adjust these sounds and freely explore using a small set of interpretable knobs,47 in contrast to black-box sound generation methods.
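
To give a flavor of the kind of pipeline involved, the sketch below searches over a handful of interpretable synthesizer parameters so that the rendered sound moves toward a text prompt in a joint text–audio embedding space. This is a simplified, hypothetical rendition rather than the cited method itself: embed_text and embed_audio stand in for a pretrained joint embedding model (they are random here so the sketch runs end to end, and thus carry no real semantics), and the synthesizer is a bare toy voice.

```python
# Sketch of text-guided synthesizer programming with interpretable knobs:
# random-search a few human-readable parameters to maximize similarity between
# the rendered audio and a text prompt in a shared embedding space.

import numpy as np

rng = np.random.default_rng(0)
SR = 16000

def render(params, seconds=1.0):
    """Toy FM-style synth voice controlled by a few interpretable knobs."""
    t = np.linspace(0.0, seconds, int(SR * seconds), endpoint=False)
    modulator = params["mod_depth"] * np.sin(2 * np.pi * params["mod_rate_hz"] * t)
    envelope = np.exp(-t / max(params["decay_s"], 1e-3))
    return envelope * np.sin(2 * np.pi * params["pitch_hz"] * t + modulator)

def embed_text(prompt):   # placeholder for a pretrained text encoder
    return rng.normal(size=64)

def embed_audio(audio):   # placeholder for a matching audio encoder
    return rng.normal(size=64)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

target = embed_text("a soft, slowly decaying bell")

best_params, best_score = None, -np.inf
for _ in range(200):      # simple random search; gradient-based search is also possible
    params = {
        "pitch_hz": rng.uniform(100.0, 1000.0),
        "mod_rate_hz": rng.uniform(0.5, 20.0),
        "mod_depth": rng.uniform(0.0, 5.0),
        "decay_s": rng.uniform(0.05, 1.0),
    }
    score = cosine(embed_audio(render(params)), target)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)        # interpretable knobs the user can keep adjusting by hand
```

The design point is that the output is not a fixed audio clip but a set of legible parameters, which the musician can continue to adjust and explore by hand.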

Designing interactive systems also allows nuanced and reciprocal influence. Our group has built an AI ideation system that taps into individual users’ voices to brainstorm and create compositional material.48 For a recent concert, we developed and deployed a real-time generative AI system based on a set of RAVE49 models. This system translated and varied performer gestures into provocative new timbres, prompting the performers into a dialogue with altered versions of their own ideas. This real-time call-and-response resulted in stimulating duets that neither party could have produced alone.50

Augmented Presentation

Historically, music has been commodified and marketed as static, definitive products—fixed recordings and compositions intended for passive consumption. However, our group has previously proposed a more flexible paradigm for musical experiences centered around fluid musical "sound worlds" that users can manipulate and extend indefinitely.51 Artificial.fm is another proof-of-concept system that demonstrates an “AI Radio,” allowing collaborative steering of AI-generated music outputs using participatory curation.52

Generative AI technology could prove integral to realizing this vision of mutable musical ecosystems that break from traditional attachments to predefined songs and recordings. However, this is a nontrivial extension of current paradigms for music generation: composers must retain the ability to endow generative models of their music with certain invariant qualities that establish their aesthetic values. Even so, generative techniques offer promising means to manifest adaptable, personalized sonic experiences that transcend static compositions. Beyond encouraging rediscovery of existing music, we have also held an interest in how adaptive music can be used for affect improvement.53 Broadly, we are interested in harnessing these methods to give people agency in the music that they share and experience, as well as to introduce surprise and delight in hearing well-liked music that reveals new secrets at each listening.

Creative and Adaptive Learning

Influential pedagogical theories like constructionism highlight learning through creating meaningful artifacts, often facilitated by technology.54 Vygotsky's zone of proximal development model55 describes a scaffolded learning process, where guidance nudges students just beyond current competencies. These frameworks underscore the potential of generative AI to contribute to transforming learning beyond passive transmission and toward creative invention. Learners can translate conceptual ideas into personally relevant works to internalize new knowledge.

Prior systems like Hyperscore,56 developed in our lab (shown in Figure 1), exemplify this creative learning by enabling students to draft motifs and develop compositions with coarse-grained sketching behavior and intelligent harmonic controls. It is essential that future tools—such as the extended Hyperscore environment that our group is currently designing for the new Johnson Education Center at the Dallas Symphony Orchestra57—similarly maintain learner agency and engagement to maximize their learning and growth. When preserving this, the immense power of generative techniques to actualize ideas can profoundly enrich learning across skill levels. Students stand to gain deep understanding and identity-forming creative skills as they steer personalized journeys and shape multifarious variations grown from their own seed ideas.

Figure 1

Max Addae of the Media Lab’s Opera of the Future Group mentoring a young composer at a Hyperscore Workshop at the Boston Children’s Museum in May 2023. (Credit: Tod Machover)

Scaling Participation and Collaboration

Beyond empowering individuals, generative AI could also transform music's social fabric by facilitating creativity within and among communities. Recent endeavors like our group's City Symphonies58 invite residents to contribute to musical portraits of urban areas through diverse submissions aggregated into grand-scale experiences. We have developed a range of technologies to support community input into collaborative works,59 but generative techniques present new opportunities for such designs; they could enable community members to contribute and combine a wide variety of expressive ideas, with even greater facility and power than previous tools provided. To nurture communal creativity, systems must maintain individual voices, and enable both personal exploration and constructive dialogue between different contributions. We seek thoughtfully designed technologies that help foster deep belonging and equitable exchange between community members in creative collaboration, and are currently developing such tools for the Wellbeing of the World: A Global Symphony60 project, scheduled to premiere in 2025.

Identifying Limits

While generative AI technology promises rich possibilities for musical discovery, we must also identify boundaries: certain profoundly human qualities and lived experiences of music may remain beyond capture for the foreseeable future. This is, of course, true even more broadly than music: we must probe the conceptual limits of new technologies, meaningfully speculate on their potential consequences, and consider what we need to preserve when bringing automation into human endeavors. Our group explores these tensions through the rich medium of opera, which brings together artistic and technological means to imagine and interrogate such issues.61 Opera can help us tell important, humanistic stories that provoke and ground conversations about future technologies, and our productions integrate such stories with real working systems. Figure 2 shows an example of one such system: a gesture-controlled device that navigates a generative AI environment. These operas enact both dreams and limitations in order to crystallize priorities at the heart of our research—catalyzing creative discovery through machines built first and foremost for expanding human potential rather than simply accelerating industrial progress.

Figure 2

Musical exploration system from Tod Machover’s VALIS, with MIT’s Nina Masuelli performing with “The Jar” to navigate a generative AI environment designed by Media Lab PhD student Manaswi Mishra. (Credit: Maria Baranova)

7. Conclusion

The discovery of new musical ideas, for individuals and across communities, has long progressed through an intricate exchange between human creativity and technological innovation. Generative AI technology now stands to carry this legacy forward—but truly nurturing musical creativity relies on developing transformative new systems guided by this synergistic interaction. Our goal in this article has been to propose musical discovery as an orienting principle, outline key musical common sense capabilities—structural, affective, and sociocultural—that are vital to meaningfully enable this, and showcase possible models for the design of new tools for musical discovery. Despite the impressive accomplishments of present-day musical generative models, we believe the path ahead is rich with challenges that will necessitate insightful solutions both technical and artistic, broad collaboration, and lively community dialogue. Our task is now to formalize these challenges and opportunities, propose and manifest solutions, and collectively progress systems while ensuring that they expand, rather than constrain, the horizons of human musical endeavor.

Acknowledgments

We gratefully acknowledge the contributions of Opera of the Future and Hyperinstruments group members, both past and present, to projects discussed in section 6. Their creativity and expertise have been instrumental in shaping the ideas presented in this paper. We also extend thanks to Manuel Cherep and Emil Droga for their participation in early discussions.

Bibliography

Agostinelli, Andrea, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, et al. “MusicLM: Generating Music from Text.” Preprint submitted January 26, 2023. https://doi.org/10.48550/arXiv.2301.11325

“AIVA, the AI Music Generation Assistant.” Accessed December 11, 2023. https://www.aiva.ai/.

Amabile, Teresa M. Creativity in Context: Update to the Social Psychology of Creativity. New York: Routledge, 2019. https://doi.org/10.4324/9780429501234.

Anglada-Tort, Manuel, Peter M. C. Harrison, Harin Lee, and Nori Jacoby. “Large-Scale Iterated Singing Experiments Reveal Oral Transmission Mechanisms Underlying Music Evolution.” Current Biology 33, no. 8. Published March 22, 2023. https://doi.org/10.1016/j.cub.2023.02.070.

Attali, Jacques. Noise. Minneapolis: University of Minnesota Press, 1977. https://www.upress.umn.edu/book-division/books/noise.

Berliner, Paul F. Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press, 2009.

Bertin-Mahieux, Thierry, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. “The Million Song Dataset,” 591–96. New York: Columbia/Academic Commons, 2011. https://doi.org/10.7916/D8NZ8J07.

Bickmore, Timothy W., and Rosalind W. Picard. “Establishing and Maintaining Long-Term Human-Computer Relationships.” ACM Transactions on Computer-Human Interaction 12, no. 2 (2005): 293–327. https://doi.org/10.1145/1067860.1067867.

Boden, Margaret A. The Creative Mind: Myths & Mechanisms. New York: Basic Books, 1991.

Boehmer, Konrad. “Chance as Ideology.” In John Cage (October Files #12), edited by Julia Robinson, 17–34. Cambridge, MA: MIT Press, 2011. https://mitpressbookstore.mit.edu/book/9780262516303.

Boltz, Marilyn G. “Musical Soundtracks as a Schematic Influence on the Cognitive Processing of Filmed Events.” Music Perception 18, no. 4 (2001): 427–54. https://doi.org/10.1525/mp.2001.18.4.427.

Bolukbasi, Tolga, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T Kalai. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” In Advances in Neural Information Processing Systems 29, edited by D. Lee et al., 4349–57. San Diego, CA: Neural Information Processing Systems Foundation, 2016. https://proceedings.neurips.cc/paper_files/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html.

Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein et al. “On the Opportunities and Risks of Foundation Models.” Preprint submitted July 12, 2022. https://doi.org/10.48550/arXiv.2108.07258.

“Boomy - Make Generative Music with Artificial Intelligence.” Accessed December 11, 2023. https://boomy.com/.

Brain Opera 2.0 (AI Improvisation Clip). Cambridge, MA: MIT Museum, 2022. https://vimeo.com/893055731/a5f6688c58.

Brändström, Sture. “Music Teachers’ Everyday Conceptions of Musicality.” Bulletin of the Council for Research in Music Education, no. 141 (1999): 21–25. https://www.jstor.org/stable/40318978.

Browning, Jacob, and Yann LeCun. “AI and the Limits of Language.” Noema Magazine, August 23, 2022. https://www.noemamag.com/ai-and-the-limits-of-language.

Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 77–91. PMLR 81, 2018. https://proceedings.mlr.press/v81/buolamwini18a.html.

Burton, Richard R., and John Seely Brown. “An Investigation of Computer Coaching for Informal Learning Activities.” International Journal of Man-Machine Studies 11, no. 1 (1979): 5–24. https://doi.org/10.1016/S0020-7373(79)80003-6.

Caillon, Antoine, and Philippe Esling. “RAVE: A Variational Autoencoder for Fast and High-Quality Neural Audio Synthesis.” Preprint submitted December 15, 2021. https://doi.org/10.48550/arXiv.2111.05011.

Caplin, William E. Classical Form: A Theory of Formal Functions for the Instrumental Music of Haydn, Mozart, and Beethoven. New York: Oxford University Press, 1998.

Cherep, Manuel, and Nikhil Singh. “SynthAX: A Fast Modular Synthesizer in JAX.” New York: Audio Engineering Society, 2023. https://www.aes.org/e-lib/inst/browse.cfm?elib=22261.

Choi, Yejin. “The Curious Case of Commonsense Intelligence.” Daedalus 151, no. 2 (2022): 139–55. https://doi.org/10.1162/daed_a_01906.

Cope, David. “Computer Modeling of Musical Intelligence in EMI.” Computer Music Journal 16, no. 2 (1992): 69–83. https://doi.org/10.2307/3680717.

Copet, Jade, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, and Alexandre Défossez. “Simple and Controllable Music Generation.” Preprint submitted November 7, 2023. https://doi.org/10.48550/arXiv.2306.05284.

Csikszentmihalyi, Mihaly. Creativity: Flow and the Psychology of Discovery and Invention. New York: HarperCollins, 1997.

———. Flow: The Psychology of Happiness. New York: Random House, 2013.

“Custom Instructions for ChatGPT.” OpenAI (blog). Accessed December 10, 2023. https://openai.com/blog/custom-instructions-for-chatgpt.

Davis, Ernest, and Gary Marcus. “Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence.” Communications of the ACM 58, no. 9 (2015): 92–103. https://doi.org/10.1145/2701413.

Deahl, Dani. “This Live Stream Plays Endless Death Metal Produced by an AI.” Verge, April 27, 2019. https://www.theverge.com/2019/4/27/18518170/algorithm-ai-death-metal-dadabots-live-stream-youtube-cj-carr-zack-zukowski.

Defferrard, Michaël, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. “FMA: A Dataset for Music Analysis,” 2017. https://doi.org/10.48550/arXiv.1612.01840.

Dhariwal, Prafulla, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever. “Jukebox: A Generative Model for Music.” Preprint submitted April 30, 2020. https://doi.org/10.48550/arXiv.2005.00341.

Elf Tech. “Elf Tech - Grimes AI.” Accessed December 11, 2023. https://elf.tech/connect.

Elizalde, Benjamin, Soham Deshmukh, Mahmoud Al Ismail, and Huaming Wang. “CLAP Learning Audio Concepts from Natural Language Supervision.” In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, June 4–10, 2023, 1–5. New York: IEEE, 2023. https://doi.org/10.1109/ICASSP49357.2023.10095889.

Endel. “Endel - Personalized Soundscapes to Help You Focus, Relax, and Sleep. Backed by Neuroscience.” Accessed December 11, 2023. https://endel.io/.

Engel, Jesse, Lamtharn (Hanoi) Hantrakul, Chenjie Gu, and Adam Roberts. “DDSP: Differentiable Digital Signal Processing.” OpenReview, 2019. https://openreview.net/forum?id=B1x1ma4tDr.

Farbood, M. M., E. Pasztor, and K. Jennings. “Hyperscore: A Graphical Sketchpad for Novice Composers.” IEEE Computer Graphics and Applications 24, no. 1 (2004): 50–54. https://doi.org/10.1109/MCG.2004.1255809.

Fonseca, Eduardo, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. “Freesound Datasets: A Platform for the Creation of Open Audio Datasets,” 486–93. Presented at the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017. https://archives.ismir.net/ismir2017/paper/000161.pdf.

Gemmeke, Jort F., Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 776–780. New York: IEEE, 2017. https://doi.org/10.1109/ICASSP.2017.7952261.

Glassman, Elena L. “Designing Interfaces for Human-Computer Communication: An On-Going Collection of Considerations.” Preprint submitted September 5, 2023. http://arxiv.org/abs/2309.02257.

Gómez-Cañón, Juan Sebastián, Estefanía Cano, Tuomas Eerola, Perfecto Herrera, Xiao Hu, Yi-Hsuan Yang, and Emilia Gómez. “Music Emotion Recognition: Toward New, Robust Standards in Personalized and Context-Sensitive Applications.” IEEE Signal Processing Magazine 38, no. 6 (2021): 106–14. https://doi.org/10.1109/MSP.2021.3106232.

Gordon, Skylar, Robert Mahari, Manaswi Mishra, and Ziv Epstein. “Co-Creation and Ownership for AI Radio.” In Proceedings of the 13th International Conference on Computational Creativity (ICCC), Bozen-Bolzano, Italy, 2022. https://computationalcreativity.net/iccc22/wp-content/uploads/2022/06/ICCC-2022_21S_Gordon-et-al..pdf.

Hiller, L. A., Jr., and L. M. Isaacson. “Musical Composition with a High Speed Digital Computer.” (Paper no. 29.) New York: Audio Engineering Society, 1957. https://www.aes.org/e-lib/browse.cfm?elib=189.

Holbrow, Charles Joseph. “Fluid Music: A New Model for Radically Collaborative Music Production.” (PhD diss., Massachusetts Institute of Technology, 2021). https://web.media.mit.edu/~holbrow/project/fluid-music/Fluid-Music-Charles-Holbrow-PhD-Dissertation.pdf.

Honing, Henkjan, Carel ten Cate, Isabelle Peretz, and Sandra E. Trehub. “Without It No Music: Cognition, Biology and Evolution of Musicality.” Philosophical Transactions of the Royal Society B: Biological Sciences 370, no. 1664 (2015): 20140088. https://doi.org/10.1098/rstb.2014.0088.

Huang, Cheng-Zhi Anna, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai et al. “Music Transformer: Generating Music with Long-Term Structure,” OpenReview, 2018. https://openreview.net/forum?id=rJe4ShAcF7.

Huq, Arefin, Juan Pablo Bello, and Robert Rowe. “Automated Music Emotion Recognition: A Systematic Evaluation.” Journal of New Music Research 39, no. 3 (2010): 227–44. https://doi.org/10.1080/09298215.2010.513733.

INFINITE ALBUM. “Infinite Album: Infinitely Generative AI Music for Gamers.” Accessed December 11, 2023. https://www.infinitealbum.io.

Jessop, Elena, Peter A. Torpey, and Benjamin Bloomberg. “Music and Technology in Death and the Powers,” Proceedings of the International Conference on New Interfaces for Musical Expression, 349–54. Zenodo, June 1, 2011. https://zenodo.org/records/1178051.

Jones, Quincy. 12 Notes: On Life and Creativity. New York: Harry N. Abrams, 2022.

Juslin, Patrik N., and John A. Sloboda, eds. Handbook of Music and Emotion: Theory, Research, Applications. New York: Oxford University Press, 2010. https://doi.org/10.1093/acprof:oso/9780199230143.001.0001.

Keller, Peter. “Ensemble Performance: Interpersonal Alignment of Musical Expression.” In Expressiveness in Music Performance: Empirical Approaches across Styles and Cultures, 260–82. New York: Oxford University Press, 2014. https://doi.org/10.1093/acprof:oso/9780199659647.003.0015.

Keller, Peter E. “Joint Action in Music Performance.” In Enacting Intersubjectivity: A Cognitive and Social Perspective on the Study of Interactions, edited by F. Morganti, A. Carassa, and G. Riva, 205–21. Amsterdam, Netherlands: IOS Press, 2008. https://www.researchgate.net/publication/38137462_Joint_action_in_music_performance

LaBelle, Brandon. Sonic Agency: Sound and Emergent Forms of Resistance, 2020. https://mitpress.mit.edu/9781912685950/sonic-agency/.

Lake, Brenden M., Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. “Building Machines That Learn and Think like People.” Behavioral and Brain Sciences 40 (January 2017): e253. https://doi.org/10.1017/S0140525X16001837.

Lecamwasam, Kimaya, Samantha Gutierrez Arango, Nikhil Singh, Neska Elhaouij, Max Addae, and Rosalind Picard. “Investigating the Physiological and Psychological Effect of an Interactive Musical Interface for Stress and Anxiety Reduction.” In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1–9. CHI EA ’23. New York: Association for Computing Machinery, 2023. https://doi.org/10.1145/3544549.3585778.

Machover, Tod. “City Symphonies.” MIT Media Lab. Accessed December 7, 2023. https://opera.media.mit.edu/projects/city_symphonies/.

———. “Dallas Symphony Orchestra Launches New Jeanne R. Johnson Education Center.” MIT Media Lab. Accessed December 10, 2023. https://www.media.mit.edu/posts/dallas-symphony-orchestra-launches-new-jeanne-r-johnson-education-center/.

———. “Death and the Powers.” MIT Media Lab. Accessed December 7, 2023. https://opera.media.mit.edu/projects/deathandthepowers/.

———. “Repertoire Remix Live Demonstration.” Opera of the Future (blog), August 9, 2013. https://operaofthefuture.com/2013/08/09/repertoire-remix-video/.

Machover, Tod, and Charles Holbrow. “Toward New Musics: What the Future Holds for Sound Creativity.” NPR, July 26, 2019, sec. Editors’ Picks. https://www.npr.org/2019/07/26/745315045/towards-new-musics-what-the-future-holds-for-sound-creativity.

Masclef, Ninon Lizé, and T. Anderson Keller. “Deep Generative Models of Music Expectation.” Preprint submitted October 5, 2023. http://arxiv.org/abs/2310.03500.

Mehri, Soroush, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model,” OpenReview. 2017. https://openreview.net/forum?id=SkxKPDv5xl.

Minsky, Marvin. “Music, Mind, and Meaning.” Computer Music Journal 5, no. 3 (1981): 28–44. https://doi.org/10.2307/3679983.

Mishra, Manaswi. “Living, Singing AI: An Evolving, Intelligent, Scalable, Bespoke Composition System.” Master’s thesis, Massachusetts Institute of Technology, 2021. https://web.media.mit.edu/~manaswim/Thesis_Media/Thesis/manaswi-MAS-2021-Thesis.pdf

MIT News. “Re-Imagining the Opera of the Future,” September 27, 2023. https://news.mit.edu/2023/re-imagining-opera-of-the-future-valis-0927.

Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. “WaveNet: A Generative Model for Raw Audio.” Preprint submitted September 19, 2016. https://doi.org/10.48550/arXiv.1609.03499.

Pachet, François, Pierre Roy, and Benoit Carré. “Assisted Music Creation with Flow Machines: Towards New Categories of New.” In Handbook of Artificial Intelligence for Music: Foundations, Advanced Approaches, and Developments for Creativity, edited by Eduardo Reck Miranda, 485–520. Cham: Springer International Publishing, 2021. https://doi.org/10.1007/978-3-030-72116-9_18.

Papert, Seymour A. Mindstorms: Children, Computers, and Powerful Ideas. New York: Basic Books, 2020.

Picard, Rosalind W. Affective Computing. Cambridge, MA: MIT Press, 2000. https://doi.org/10.7551/mitpress/1140.001.0001.

———. “Computer Learning of Subjectivity.” ACM Computing Surveys 27, no. 4 (1995): 621–23. https://doi.org/10.1145/234782.234805.

Riedl, Mark O. “Computational Narrative Intelligence: A Human-Centered Goal for Artificial Intelligence.” Preprint submitted February 20, 2016. http://arxiv.org/abs/1602.06484.

Rohrmeier, Martin. “On Creativity, Music’s AI Completeness, and Four Challenges for Artificial Musical Creativity.” Transactions of the International Society for Music Information Retrieval 5, no. 1 (2022): 50–66. https://doi.org/10.5334/tismir.104.

Rohrmeier, Martin, and Patrick Rebuschat. “Implicit Learning and Acquisition of Music.” Topics in Cognitive Science 4, no. 4 (2012): 525–53. https://doi.org/10.1111/j.1756-8765.2012.01223.x.

Rosen, Charles. Sonata Forms. New York: W. W. Norton, 1988. https://www.penguinbookshop.com/book/9780393302196.

Savage, Patrick E. “Cultural Evolution of Music.” Palgrave Communications 5, no. 1 (2019): 1–12. https://doi.org/10.1057/s41599-019-0221-1.

Schoenberg, Arnold. Structural Functions of Harmony. Rev. ed. New York: W. W. Norton & Company, 1969.

Singh, Nikhil. “The Sound Sketchpad: Expressively Combining Large and Diverse Audio Collections.” In Proceedings of the IUI ’21: 26th International Conference on Intelligent User Interfaces, 297–301. New York: Association for Computing Machinery, 2021. https://doi.org/10.1145/3397481.3450688.

Singh, Nikhil, Manuel Cherep, and Jessica Shand. “Creative Text-to-Audio Generation via Synthesizer Programming.” Paper presented at the Machine Learning for Audio Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, December 10–16, 2023. https://mlforaudioworkshop.com/CreativeTextToAudio.pdf.

Sounds.Studio. “Sounds.Studio: A Modern Music Production Platform, Powered by Machine Learning.” Accessed December 11, 2023. https://sounds.studio.

Su, David, Rosalind W. Picard, and Yan Liu. “AMAI: Adaptive Music for Affect Improvement.” In Proceedings of the 44th International Computer Music Conference (ICMC), Daegu, Korea, August 3–10 2018. https://usdivad.com/amai/.

Surís, Dídac, Carl Vondrick, Bryan Russell, and Justin Salamon. “It’s Time for Artistic Correspondence in Music and Video.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10564–74. Open Access, Computer Vision Foundation, 2022. https://openaccess.thecvf.com/content/CVPR2022/html/Suris_Its_Time_for_Artistic_Correspondence_in_Music_and_Video_CVPR_2022_paper.html.

Suzuki, Shunryū. Zen Mind, Beginner’s Mind. New York: Weatherhill, 1972.

Tan, Siu-Lan, Matthew P. Spackman, and Matthew A. Bezdek. “Viewers’ Interpretations of Film Characters’ Emotions: Effects of Presenting Film Music Before or After a Character Is Shown.” Music Perception 25, no. 2 (2007): 135–52. https://doi.org/10.1525/mp.2007.25.2.135.

The Wellbeing Project. “Wellbeing of the World: A Global Symphony,” July 26, 2023. https://wellbeing-project.org/wellbeing-of-the-world/.

Trehub, Sandra E. “The Developmental Origins of Musicality.” Nature Neuroscience 6, no. 7 (2003): 669–73. https://doi.org/10.1038/nn1084.

Troyer, Akito van. “Constellation: A Tool for Creative Dialog between Audience and Composer.” In Proceedings of the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR): Sound, Music, and Motion, Marseille, France, October 15–18, 2013. https://vantroyer.com/lib/doc/Constellation/Constellation.pdf.

———. “Repertoire Remix in the Context of Festival City.” In Ubiquitous Music, edited by Damián Keller, Victor Lazzarini, and Marcelo S. Pimenta, 51–63. Computational Music Science. Cham: Springer, 2014. https://doi.org/10.1007/978-3-319-11152-0_3.

Vitouch, Oliver. “When Your Ear Sets the Stage: Musical Context Effects in Film Perception.” Psychology of Music 29, no. 1 (2001): 70–83. https://doi.org/10.1177/0305735601291005.

Vygotsky, Lev. “Interaction between Learning and Development.” In Mind and Society, edited by Vera Jolm-Steiner, Michael Cole, Ellen Souberman and Sylvia Scribner, 79–91. Cambridge, MA: Harvard University Press, 1978.

Wiener, Anna. “Holly Herndon’s Infinite Art.” New Yorker, November 13, 2023. https://www.newyorker.com/magazine/2023/11/20/holly-herndons-infinite-art.

Wu, Shih-Lun, Chris Donahue, Shinji Watanabe, and Nicholas J. Bryan. “Music ControlNet: Multiple Time-Varying Controls for Music Generation.” Preprint submitted November 12, 2023. https://doi.org/10.48550/arXiv.2311.07069.

Wu, Yusong, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. “Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation.” In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, June 4–10, 2023, 1–5. New York: IEEE, 2023. https://doi.org/10.1109/ICASSP49357.2023.10095969.
