
Visual Artists, Technological Shock, and Generative AI

Generative AI programs have had a great impact on the visual arts. Examining earlier moments of technological shock with once radical media such as photography, the authors suggest that training sets and models must become transparent and inclusive to become a public good.

Published on Aug 28, 2024

Abstract

The impact of generative AI (GenAI) programs on visual art is comparable to earlier historical moments of technological shock, when literary and visual artists grappled with unprecedented reproductive tools such as the printing press, photography, and cinema. Metabolizing the shock of those once radical inventions eventually yielded great bursts of artistic innovation. Yet, unlike those prior revolutions, the current one presents a deeper threat to artistic innovation by smoothing its source material into endless variants of seamless pastiche. By definition, the corpus of imagery currently being scraped for training already exists—it is overwhelmingly photographic, representational, and Western-hemispheric. As a result, algorithmic aesthetics visually echo the hundred-year-old art movement of Surrealism at its most banal. GenAI thus jeopardizes a singular function of visual artists in contemporary culture: to continuously innovate never-before-seen forms, artistic movements, styles, cognitive concepts, and theories of representation. Moreover, GenAI is a cultural technology. Since generative programs make secondary and tertiary materials by inputting their own outputs, they both intensify the bias found in the corpus and bury ever deeper the historical sources of that bias, neglecting significant future markets and constituencies who could be welcomed in to build richer archives with better metadata. We argue that more inclusive and transparent training sets, permeable models, and significant investment in what we call “public intelligence” can better shape the potential of GenAI tools, confronting technological shock in ways more likely to encourage rather than dampen artistic innovation for the public good.

Keywords: visual art; generative art; combinatorial art; technoshock; bias; archive; public good

1. Introduction

Figure 1

Mechanical Letter Magic.

Illustration of “the Engine,” from the first edition of Gulliver’s Travels (full title Travels into Several Remote Nations of The World) by “Lemuel Gulliver” / Jonathan Swift, printed by Benjamin Motte in 1726, p. 74. Photograph taken by the authors from the first edition in Houghton Library, Harvard University. [[full credit tk/request submitted 08-5-24]]

Humans have been experiencing an acute technological shock since the eruption of new image technologies using generative artificial intelligence, or GenAI.1 While contemporary entrepreneurs may read this moment of technoshock as disruption of business as usual, and investors might try to anticipate the unpredictable path of such disruption through “technology forecasting” (e.g., horizon scanning, scenario planning, and quantitative forecasting), this paper sets aside entrepreneurial and economic perspectives to probe the current cultural turmoil around GenAI.

AI is a cultural technology. It deals with human languages, human images, and the priors of human perception. Thus, artists and art historians (as well as literary theorists, poets, and writers) bring a unique perspective to this technology.2 Visual culture—the sum of human-made images and architectural settings that we experience every day—shapes (and in some cases forces) understandings of the world. This can happen subconsciously in reception, building on biases that themselves can be intentional or unconscious. Generated images have entered visual culture with a bang, unsettling deeply held beliefs about the value of human creativity. Such shocks can be salutary, offering society the opportunity to make intentional cultural shifts in the face of technological challenges we should not ignore. Confronting these, our team of humanists (led by two historians and a visual artist) asks: Based on historical examples of how artists metabolized challenging technologies in the past, how will the visual arts integrate and cope with this current moment of technoshock? And how can a critical examination of contemporary artistic, legal, and policy strategies in this sector address bias and the banal, low-quality outputs of these programs to shape image-generating tools for the broadest public good?

2. Technoshock!

Technoshock epochs are highly salient inflection points in histories of art, literature, and music. Over the centuries, artistic discourses have ranged from techno-utopianism to techno-pessimism, echoed today by the popular phrase coupling the “promise and peril” of GenAI. But such views are not exclusive to our present moment. The genealogy of probability propositions fueling visual AI programs can be traced back millennia, to early divination manuals and philosophical experiments that alternately advanced and critiqued such ‘combinatorial arts’ in a quest for higher truth, beauty, predictive power, or knowledge.3 Particularly salient were medieval texts exploring the Arabic algorithmic process of Zairja (‘letter magic’), informing Ramon Llull’s fourteenth-century Ars Magna (‘Ultimate General Art’) and Gottfried Leibniz’s seventeenth-century Dissertatio de arte combinatoria (‘Dissertation on Combinatorial Art’). British writer and satirist Jonathan Swift was aware of these precedents, and his 1726 work of fiction, Gulliver’s Travels, creates an encounter with a knowledge ‘engine’ that combines sets of symbols, phrases, and numerals—conceptual precursors to today’s image training sets.

Swift’s engine (Figure 1) has been taken up by computer scientists as a foundational instance of a combinatorial ‘creative machine.’ Unremarked in that genealogy is the status of Swift’s narrative as satire; the fictional engine is pointedly unwanted, exaggerating what was upending the lives of authors such as himself: the technoshock of mass production via the printing press. In the story of the “Engine,” 36 “pupils” would turn the numerous cranks around the machine to generate supposedly novel combinations, then dictate these phrases to four scribes. The illustrator of the first edition of Gulliver’s Travels gave the engine the raw materials of arabesque and floriated symbols, harkening back to those earlier combinatorial forms: letter magic, ancient stenography, and perhaps even Kabbalah. The engine is an entirely dismal machine, producing an infinite quantity of literature that no one would read, because the apparatus could only recombine existing formulae, frozen chunks of human imagination.4

Echoing current debates around GenAI and authorship, Swift’s narrative addressed the rampant production of plagiarized and altered texts issuing from the monopoly that the British crown had granted to the medieval Stationers’ Guild.5 The Guild censored, copied, and generated texts without permission or quality control from human writers. Swift and, a century before him, John Milton argued that such centralized control hampered the possibilities of new and unfettered speech.6 Swift’s engine thus responded to technological shock with sharp satire, attempting to shape a better future for moveable type and the printing apparatus, which, following his and Milton’s interventions, came to be regulated through concepts of copyright that could reward creative writers without limiting speech or fair use.7

Today, GenAI raises similar questions in the field of visual arts (as well as literary and musical arts not addressed here).8 In our research, responses to GenAI have varied by generation and privilege, ranging from the apocalyptic to the sublime (such ambivalence is typical of technological shock more generally). Historically, technoshock only becomes culturally productive through phases of encounter, critical pushback, adaptation, and modification. Early adopters thus play a crucial role in the cultural metabolization of technologies, as they point to changes or new practices necessary for new tools to better contribute to society. Such adopters often expressly aim to revolutionize ways of seeing and thinking through new technology, generating, consuming, or circulating new media in order to produce receptive publics.

Figure 2

A new medium for democracy.

Abolitionist Frederick Douglass was among the earliest American adopters of photography in its first patented form, the daguerreotype, a process in which unique images are developed directly on the metal plate and are best seen at an angle in raking light. Here is a documented instance from 1848, given by Douglass to social reformer Susan B. Anthony shortly prior to the Seneca Falls Convention (which he attended). Albert Cook Myers Collection, Chester County Historical Society, Pennsylvania. Image courtesy of the University of Rochester, https://www.rochester.edu/newscenter/early-douglass-daguerreotype-on-display-131312/.

2.2. Photography, Access to Portraiture, and Social Change

The development of photography ushered in a period of technoshock that would span more than a century. Its emergence in the mid-nineteenth century raised broad questions: Was it an art, a science, a spirit medium? Was it good for industrial reproduction, or a new tool of surveillance and policing? It was such a strange medium that no one could precisely place it in culture or definitively ascertain its future. However, early adopters secured at least one future for the technology: photography would function above all as a new form of portraiture that was accessible to all. This ‘mass medium’ was decried by contemporaneous art critics such as French poet Charles Baudelaire, who termed it “art’s mortal enemy”—a “useless and tedious medium” that reproduced reality in banal terms.9 It is unclear if Baudelaire would have changed his mind had he lived to see how photography evolved as an art form, within twentieth-century modes of “mechanical reproduction.”10

At photography’s inception, Frederick Douglass—American orator, abolitionist, and statesman—paid little heed to French aesthetics as he crafted his own theory of “Pictures and Progress” in the new medium. The most photographed figure of the nineteenth century (see Figure 2), he put into practice how the new technology could make a visual culture for all, arguing: “The humbled servant girl whose income is but a few shillings per week may now possess a more perfect likeness of herself than noble ladies and court royalty.”11 Access to photography for Douglass meant claiming and democratizing the artistic genre of portraiture, which could reveal both new forms of political celebrity and true images of Black Americans that countered the violence of racist stereotypes and cartoons. Douglass’s use of photography is a well-documented instance of the power that can be gained by new media adopters shaping the future of a new cultural technology.

Photography itself was of course agnostic, offering both ‘truth’ and ‘magic’ (e.g., spirit photographs aiming to show nonvisible emanations from beyond the grave).12 As the twentieth century dawned, artists themselves became ‘disruptors’ of photography’s truth-claims. Artists took up the tool in an age of abstraction, achieving yet another response to the decades of photographic technoshock. The powerful transitivity of the twentieth-century photograph as itself photographable and reproducible was revealed through replication, ‘cutting,’ and recombination in collage, as Dada artists radically rearranged print photographs to conjure hypothetical futures or provide acid commentary on the present (see Figure 3).

Figure 3

Dada artists respond.

In an art movement claiming to offer childlike nonsense as truth, Dadaists responded to the flood of photographic images in the early twentieth century with collage and photomontage: detail from Hannah Höch, Cut with the Kitchen Knife through the Last Epoch of Weimar Beer-Belly Culture in Germany (English translation; original title Schnitt mit dem Küchenmesser durch die letzte Weimarer Bierbauchkulturepoche Deutschlands), 1919. Collection Nationalgalerie, Staatliche Museen zu Berlin, Germany. © 2006 Bildarchiv Preussischer Kulturbesitz, Berlin, © 2006 Hannah Höch / Artists Rights Society (ARS), New York / VG Bild-Kunst, Bonn, photo: Jörg P. Anders, Berlin.

Photography had already yielded the even newer medium of cinema, subjected to more cutting, with rearrangements now called “montage.”13 The waves of early adoption and transformation in photography and film both characterize and produce our present moment, as the visual training sets of GenAI are constituted primarily from digitized photographs and videos, whether historic or born digital. In theory, as in earlier historical moments, art made with the new technologies of GenAI can help the public understand how generative machine learning works in relation to their own inputs and data, and, again in theory, GenAI art could help us see how ‘reality’ is dominated by conventional views that need to be “cut with a kitchen knife” and disrupted.14 But in these early days, such understandings have been hampered by the way these tools have been developed, how they are organized, and how their outputs are produced. And by virtue of their very ‘seamlessness’ and ‘smoothing,’ the programs yielding GenAI imagery have so far reinforced banal image conventions and resisted critical artistic intervention.

Technological shock can be a stimulus for desired cultural transformations. What are the glimmerings of that process today, and how can policy help? We examine two broad domains and challenges of GenAI’s promise and peril in the next two sections: How are contemporary artists using, adapting, or confronting GenAI tools in their creative practice? How might artists and designers address known issues of bias? We argue that GenAI must be shaped to yield a greater public good. It is time to examine more of the costs (including unacknowledged drain on the electrical grid) that undergird the benefits of this technology. It is also time to curb unsustainable and undesirable patterns before they become embedded and entrenched. What we recommend in the final section of this paper are policies to keep GenAI from the worst of its perils (dampened creativity, bias, copyright infringement, uncompensated labor, banal cultural results, and unsustainable computation loads), while increasing the likelihood that innovative art will emerge to stimulate reflection, produce desired cultural outcomes, and expand the capacities of these tools.

3. Peril, Promise, and the Public Good

Figure 4

Pastiche aesthetic of GenAI.

DALL-E generated image used for publicity for “ChatMIT,” a May 2, 2024 conversation (live webinar) between MIT President Sally Kornbluth and Sam Altman, cofounder of OpenAI. Prompt: “robot in conversation with a beaver with tail wearing a white t-shirt by a fireplace at night, indoors, both sitting in chairs. Impressionism.” The beaver is MIT’s mascot; we surmise that the robot was entered into the prompt to cast the role of Altman in this particular production. https://web.mit.edu/webcast/engineering/s24/openai/0524/03/

3.1. Innovation: new visual vocabularies and broader aesthetic possibilities

The illustration in Figure 4, crafted for MIT’s School of Engineering, humorously captures fears of ‘human replacement’ via a GenAI prompt to pair a robot (superior to the human?) and an animal (inferior to our tool-using species?) in the artistic style of Impressionism.15 Cultural responses to earlier waves of automation include auto worker and newspaper editor Charles Denby, who declared in 1960, “We don’t use the machine; the machine uses us.”16 Yet, through decades of contestation, debate, and adaptation, twentieth-century workers were trained to work in concert with assembly-line robots. GenAI appears to pose a different, more existential challenge because it threatens not only the manual, but also the mental work that humans do through the charged title of “intelligence” itself.17 Putting aside the question of whether mens et manus can be so easily segregated, it is noteworthy that tech workers are among the industry critics warning of a heightened risk of societal takeover by machines that can do “conceptual work” and (hyperbolically) will cause human extinction.18 While this paper does not directly engage with these calls to “stop AI” altogether, it does focus on a narrower task—assessing GenAI impacts on humans’ discerning gifts for visual thinking. These range from pattern recognition to judgments of color balance, design intelligence, composition, inventive motifs, uniquely spatial forms of memory, inference correction, and, above all, innovation in concepts that yield the dynamism we want for the visual arts.19

Innovation describes leaps in human culture as well as technology. Some art historians argue that the commitment to abstraction in the twentieth century, across many artistic fields, was an epistemic revolution accelerated by recording technologies of all kinds. Notably, photography unleashed painters from the burden of representation, even as wax inscriptions and playback of sound produced entirely new possibilities for music and a materialized acoustic past. Historians of technology generally recognize that the intertwining of preconditions and tipping points is characteristic of innovations. The significance of these histories in our account comes when artists are challenged to rededicate themselves to what art is good for in their societies. The turbulent technoshock of our own moment, introduced by waves of AI tools, is a challenge worth meeting with cultural innovation. It falls to all of us to shape the future for GenAI in a way that allows ‘tinkering’ by artists, fostering conceptual innovations and new understandings by critics and art historians who might be encouraged to become adept at the intricacies of machine learning.

Regarding the technoshock of photography discussed above, the early analog medium is often cited by computer scientists as a precedent for the hoped-for uptake of the computed images generated by machine learning. We point out that photography is itself foundational to machine vision protocols via the crucial, decades-in-the-making engineering of digital photography and the even earlier digital scanning capacities designed for “operational images.”20 By some accounts, the 1958 “perceptron” (machine ‘perception’ protocols coded into punch-cards) yielded forms of nonsemantic image shuffling that are still revolutionizing contemporary machine-vision and machine-learning image “recognition” programs.21 All these innovations take human vision into account and lean on the learned conventions of photography for everything from aspect ratio to edges, contrasts, shadows, and highlights. Purely computational processes opened photography (as it continues to be called) to pixel-by-pixel editing and fabulation—yet another phase of technoshock that demanded setting new standards and ethics for scientific, journalistic, and legal forms of illustration, witnessing, and evidence. The internet and World Wide Web magnified all these impacts, with cascading waves of technoshock pouring from the putty-colored hub of a desktop computer.22 Binary, nonvisual protocols of scanning and mathematical siting in a space of computation are thus operations alien to human visual cognition yet configured for our perceptual comprehension—ever-richer fields of pixels engineered to appear seamlessly ‘realistic’ (i.e., photographic) to the human eye.

How can we shape GenAI to encourage desired cultural innovation? We seek as educators to help develop “creative, curious, caring, collaborative human beings,” and we value art’s broad role in encouraging such ways of being in society as a whole.23 Are GenAI programs opening up computation to public understanding or further concealing it from innovative entry? From the experience of generative programs thus far, what is immediately obvious to educators (for example) is the dampening of students’ creativity in problem-solving, humanistic analysis, critical thinking, and visual inventiveness. This has raised an urgent question of pedagogy: Are students who use a tool such as ChatGPT or MidJourney off-loading calculation, writing, and visual imagination to rehash pre-existing prose, equation solving, or conventional imagery? Is AI merely a tool (like a calculator or typewriter) freeing the mind to be creative, or a device that delays learning how to think, postpones finding your voice (in writing), and squelches individual style (in art)? This first acknowledged peril comes down to the risk of dulling human visual inventiveness, averaging production for populations of learners who might otherwise discover their creativity and contribute to new paths for the visual arts.24

From an aesthetic point of view, there is a related problem evident in the current state of imagery generated by the probability propositions of GenAI programs. Derivative, repetitive, and weak, the generated images can be characterized as “pastiche,” in which multiple styles from different epochs, regions, or authors are blended.25 This is a lowest-common-denominator approach that avoids offense yet achieves only the most banal effects. For example, GenAI uses stochastic gaming interfaces to determine what could be meant by the word “Impressionist” in the prompt in Figure 4. Yet nothing remains of the revolutionary brushstrokes, accumulation of thick paint, phenomenology, and perceptually-tuned color innovations that characterized Impressionism in its time, becoming a driver for aesthetic theories of modernism. As supporters of diverse art worlds, we take issue with GenAI’s tendency to flatten the digital corpus from which it draws through the probabilistic structure of its underlying algorithms.

We do see the power in the combinatorial novelty of the visual forms of GenAI, but more in the way that they reveal the conventional and superficial quality of most human imagery rather than any ability to contribute innovatively to art. Parsing the reason for such consistent banality in generated images across multiple platforms may be useful, explaining why most GenAI images look like “bad Surrealism” from the 1930s.26 Surrealism had two historical forms: one relying on conventions of reproductive representation (judged by art historians as no longer particularly innovative), and the other creatively building on Cubism’s abstracted, less-representational pictorial surface. The first mode became so popular that it was incorporated into funhouses at Coney Island, appeared in mainstream advertisements, populated abstract animated film with brooms and buckets (in Disney’s Fantasia), and became the visual vocabulary for Hollywood dream sequences. The second, however, kept the abstract commitments of Cubism and flowed into an artistic avant-garde that by some accounts propelled the true cognitive revolutions of the twentieth century.27

If GenAI is yielding images that resemble mostly the first mode, then we have reason to be concerned. Compare the resulting pastiche with the idea of edginess that true innovation brings. If we desire art forms that can open up new cognitive domains through shifts in visual practice, we should use the technoshock around GenAI (and disappointments in its banal results) to push for changes in the platforms to encourage human intervention and invention. The troika of training sets, models, and interfaces must be opened to public use and radical research, whereby diverse communities have access and can make contributions to an expanding public good.

We acknowledge as historians that this aesthetic recommendation aligns with the modernist modes of ‘assemblage,’ ‘collage,’ and ‘montage’ over pastiche.28 The former approaches navigate shock while preserving its edges. The different forms interact but retain elements of their distinct origins, prompting reflective thinking about their novel combination. What is needed, then, is the computational equivalent of a ‘scissors and glue’ editing toolkit (as in the contribution of CRISPR-Cas9 to gene editing) without removing the possibility for smoothing that characterizes pastiche. The two modes of pastiche and edge-revealing assemblage are not mutually exclusive, and the ability to work freely with both often characterizes innovation in multiple artists’ careers.

Opening the toolkit will also help rebalance the inferences used in GenAI, often toxic with accumulated bias. Recommended approaches should foster artistic innovation and cultural correction by allowing artists and communities to enter GenAI’s ‘back end.’29 This means being able to see, transparently, the sources of visual training and pretraining datasets. It means allowing shared access to tweak the code. It means allowing contributions of metadata from unique, community-based sources of corrective knowledge. All such recommendations aim for more experimental art forms that will ignite human creativity.

While Dadaists and Surrealists distilled critical and curative visions emerging from the trauma of the first World War in their manipulations of mass media, GenAI sifts images more haplessly. Utilizing visual material that is always already digital, machine learning is dominated by photographic conventions built into its protocols (e.g., single-point perspective, coherent shadows and highlights, conventional depth relations, hyperarticulated edges, rectangular landscape or portrait formats). Designed for accuracy, the generated results necessarily echo earlier analog models. They look photographic even when intended to be abstract.30 While these datasets and machine learning protocols have been produced by a great number of people, we believe that further layers of public intelligence and humanistic information must be brought in to refine these models beyond the limited purview of ‘fine-tuning.’ This has already been the experience of seasoned artistic practitioners adept in generative tools: they report extensive back-and-forth iteration between the creative human and the machine via tailoring training sets, tuning models, and prompting.31 To work successfully with GenAI, artists report having to “dance” with the algorithm or “plant seeds” via prompts.32 Without engaging the machine-as-prosthesis in such active ways, artists report disappointing results: retreads of older art movements (e.g., century-old Surrealism) or conventional visual forms.

While capable of being scaled up and digitally printed or scanned into 3D software and industrially milled, most GenAI artworks so far remain 2D images on screens, isolated from thriving artistic practices such as social practice art; performance art; experimentations with organic materials, as in bio-art; acoustic interventions, such as sound art; spatial interrogations, such as installation art; and/or site-specific public works. For the vast proportion of creative artists, then, GenAI programs are unhelpful or uninteresting. Artists taking up the gauntlet of GenAI often do so specifically to “lean in” to a twenty-first century technological challenge to their authorship, aiming to beat the tool at its own game or claim its aberrations as conducive to an aesthetic relation to shadowy memories, lost historical moments, or bizarre futures.33 They engage the algorithms by “correcting” results—an iterative, time-based, engaged, and intensive process.34 This correcting can also use nonalgorithmic methods, such as painting on images, embroidering on them, or critiquing them with text, shaping iterations for more desirable artistic results.35 Perhaps the most ambitious established artists work to craft their own unique training sets, write new code, or otherwise approach the algorithmic system as something to interrogate, shape, and craft.36 These are established artists with capacities to intervene in proprietary systems—how much more innovative art could be produced if the datasets, models, and interfaces let a broader public “tinker under the hood”?

Recommendations to enhance artistic innovation:

  • Produce publicly accessible resources to facilitate artistic access and creative engagement with existing models: large language models (LLMs), generative adversarial network (GAN) models, and diffusion types, along the lines of the public consortium known as the National Deep Inference Fabric.37

  • Create pathways to bring artists and humanities scholars into the designing of tools and models, leaving them open to further public participation.

  • Fund the development of public standards, protocols, and accessibility options to improve transparency and encourage broader public use and a more creative, dialogical relationship to these tools.

  • Make “opt out” functions widely available to tag posted material as unavailable for future training sets, if so desired.38

Compiling visual material for GenAI datasets, designing and training models, fine-tuning programs, maintaining servers, conducting data audits and oversight, and monitoring outputs are all extremely labor- and energy-intensive activities. Millions of human hours have been spent already in generating digital images, scanning existing photographic, archival, and artistic materials, creating tags, and adding metadata. Humans assist machines in making ‘right choices’ from sets of pixels in relation to language and ambiguity. For example, people are needed to train algorithms to attribute and tag an elbow in an existing image, then must train the algorithm to assign only two elbows per human in generated imagery. Such tasks stem from an intuitive human relation to visual reality based on body knowledge that the machine does not have—it has only pixels and semantic tokens.
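The annotation labor described above can be made concrete with a small sketch. The record below follows the widely used COCO keypoint convention, in which human annotators mark named body joints (including elbows) as (x, y, visibility) triples; the image ID and coordinates here are invented for illustration, not drawn from any real dataset.

```python
# A minimal, COCO-style keypoint annotation for one person in one image.
# Human annotators supply the (x, y, visibility) triple for each named joint;
# the model only ever sees these numbers, never the body knowledge behind them.
# The image ID and coordinates below are hypothetical.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

annotation = {
    "image_id": 12345,
    "category": "person",
    # Flat list: x1, y1, v1, x2, y2, v2, ... (v: 0 = unlabeled, 1 = hidden, 2 = visible)
    "keypoints": [0, 0, 0] * 7 + [210, 340, 2] + [412, 338, 2] + [0, 0, 0] * 8,
}

def count_visible(annotation, joint_names, target):
    """Count the keypoints whose name contains `target` and were marked visible."""
    kp = annotation["keypoints"]
    triples = [kp[i:i + 3] for i in range(0, len(kp), 3)]
    return sum(
        1 for name, (x, y, v) in zip(joint_names, triples)
        if target in name and v == 2
    )

# A well-formed annotation marks exactly two elbows for a single person.
print(count_visible(annotation, KEYPOINT_NAMES, "elbow"))  # 2
```

The point of the sketch is the asymmetry it exposes: the constraint that a human has two elbows lives in the annotator's head, not in the data structure, which would happily store three.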

Training datasets (e.g., LAION 5B, Common Crawl, ImageNet, Flickr30k) for GenAI image models have been scraped from a seemingly vast corpus of personal photo and video posts, high-resolution museum object photos, historical archives, professional photojournalism, gaming animations, picture repositories, shopping websites, and artistic works created by humans, to name only the more obvious sources of privately produced and sometimes copyrighted or even illegal material.39 While we respect fair use and the benefit of having this freely circulating visual material widely available, there are significant challenges with unregulated, unacknowledged, and uncompensated uses of these images for commercially-controlled machine learning image tools.

Patent and copyright laws have evolved significantly in the common law tradition since Swift and Milton’s days. Today, there are greater attempts to regulate the consolidation and monopolization of artistic material by corporate entities while still protecting the fair access to visual images that feed parody, satire, creative appropriation, and transformative use.40 The policies and infrastructures we recommend navigate this delicate balance between recognizing the rights of creators to benefit from, and be recognized for, their artworks while encouraging the development of new and innovative visual artistic forms by communities newly gaining access to GenAI tools.

Copyright tussles over GenAI have been well-documented in the press and court dockets.41 Less acknowledged but acutely felt by many is the intensification of bias in generated images. Marginalized populations all over the world see the effects of an image corpus that has originated primarily in the Northern hemisphere, is dominated by English-language tagging, and stems from historical epochs that may have skewed and dehumanized colonial subjects through words and pictures. “The Orient” was a geography vaguely associated with the East; for nineteenth-century European tourists and explorers it became imbued with “Orientalist” tropes centered on the Islamic Middle East and North Africa (today known as the MENA region). AI has now converted “Orientalism” to a desirable pictorial ‘style’ based on compulsively exotic and erotic French academic paintings from the nineteenth century (Figure 5). Not only are the images one-sided, but data as such can be false or simply missing. One of the present authors, Gupta, argues in her paper “The Library of Missing Metadata” that humanities scholars are uniquely equipped to address this challenge. Cultural, historical, economic, social, and political information is either missing or has been stripped from the billions of ‘digital assets’ scraped into machine learning datasets.42 Metadata can provide transparency about sources, as well as enable creative interventions and scholarly supplementation of these one-sided histories of the past.

Figure 5

Orientalist AI: “Orientalism” as a Midjourney 5.2 ‘style.’

“All content in the Midlibrary catalog is generated by the Midlibrary team using Midjourney AI. We do not feature real artists’ images, artworks, or any copyrighted material in our catalog. The samples provided by Midlibrary are intended for educational and illustrative purposes only and are not representative of real artists’ works or real-world prototypes. Midlibrary is a non-profit initiative, not affiliated with real artists or authors, aiming to educate and inspire through the demonstration of the technology’s potential in creative explorations.” https://www.artvy.ai/ai-art-style/orientalism

Well-known compendia of ‘metadata’ include library card catalogs, organized in the United States via the flexible Dewey decimal system, which revealed how different kinds of knowledge could be sorted into categories yet remain accessible through a matrix of alphanumeric codes and descriptive information. Yet these systems themselves bore traces of colonizing forces in history. Metadata for digital assets, even when present, inherit the inadequate patterns of these ways of knowing. Disentangling and rewriting the metadata of knowledge and knowledge-production has long been the task of historians engaged with archival theory.43
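To make concrete what richer, scholar-supplemented metadata for a scraped ‘digital asset’ might look like, here is a minimal illustrative sketch. The field names and the example record are our own assumptions for illustration, not an existing metadata standard; a real schema would draw on established vocabularies such as Dublin Core.

```python
# Illustrative sketch of an enriched metadata record for a scraped image.
# Field names are hypothetical, not drawn from any existing standard.

from dataclasses import dataclass, field

@dataclass
class AssetMetadata:
    asset_id: str
    title: str
    source_archive: str                 # where the image was digitized from
    original_language: str              # language of the original caption/tagging
    date_created: str                   # as recorded, possibly approximate
    provenance_notes: list = field(default_factory=list)        # chain of custody
    community_annotations: list = field(default_factory=list)   # scholarly supplements

# Hypothetical example: supplementing a nineteenth-century photograph with context
record = AssetMetadata(
    asset_id="example-0001",
    title="Street scene, Cairo",
    source_archive="(hypothetical) colonial survey collection",
    original_language="fr",
    date_created="ca. 1880",
)
record.provenance_notes.append(
    "Digitized 2011; original caption by expedition photographer.")
record.community_annotations.append(
    "Staged composition; see scholarship on Orientalist studio photography.")
```

The point of the sketch is the layered structure: the original (possibly biased) caption data is preserved, while provenance and community annotations accumulate alongside it rather than overwriting it.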

What gets archived and what does not is shaped by state, colonial, corporate, or imperial institutions. In prompts engaging the MENA region, this embedded bias is readily revealed, since visual documentation and representation of these regions disproportionately reflect colonial pedagogies, religious missionary (Biblical) imaginaries, and historical military expeditions. “Fine-tuning” cannot fully address such imbalance, and promises of supplemental or synthetic data are likely to exacerbate the issue.44 Many alternative stories are simply missing, and scholars and students of Islamic art who input prompts set in the Islamic world into programs such as Midjourney or Stable Diffusion can readily identify how results are not just bad Surrealism; their pastiche forms recycle inflammatory tropes of the Orientalist kind.45 Such fabulated and stereotypical images worked then, and work now, to foreclose the imagination of past, present, and future life in the Islamic world. The bias is compounded by thousands of digitized colonial photographs and more contemporary Islamophobic media representations that convert citizens of the region into caricature. Constructions of future image-worlds are fiercely constrained by sampling bias, labeling bias, design bias, and inferred prompts that reiterate the problem of digital colonialism and perhaps, as has been suggested, the “algorithmic gaze.”46

The MENA region is only one example. Narrative constructions of underdevelopment, poverty, barbarity, and lack of civilization appear not only in prompts set in the Islamic world but also in those set in East Asia, Africa, Latin America, and other parts of the globe subjected to colonial powers, especially during the powerful image-generating nineteenth century. Such biased datasets feed encyclopedic, seemingly comprehensive projects of categorizing the human and nonhuman alike, but ultimately serve to perpetuate patterns of control, hierarchies, and structures of domination from long ago. Constructions of the present and future cannot hope to escape our constructions of the past in GenAI, which relies on prior imagery in recursive, generative processes. Humanities scholars will be critical in both surfacing and addressing the implicit biases in GenAI datasets, models, and interface design.

Moreover, the requisite skills for writing prompts in English obviously limit poetic nuance within the world’s diverse languages and their relations to visual forms. Results reveal how the recursive logic of GenAI probabilistically sifts existing colonial, racial, linguistic, and gendered material; even contemporary materials are concentrated into subsets that are neither nuanced nor historically accurate.47 Fortified by “homophily,” or preference algorithms known to segregate users and datasets across the internet based on originary racist criteria,48 these recursions produce edge effects whose results are nonetheless not ‘edgy’ in their aesthetic forms. Human prompt engineering, post hoc, is not sufficient to correct these biases. Creating the best future for visual GenAI therefore demands acknowledging bias and transforming existing datasets into richly annotated and pluralized community resources. Improvements in public access and interface design should give communities of diverse users new ways to shape and share visual training sets without segregating the results or inviting extraction, appropriation, and stereotype.

Finally, the archive must be expanded. A public archive developed for the Library of Congress (or more regional resources, such as the New York Public Library) could be built, engaging humanities scholars to supply metadata and identifying resources to be placed in trust for specific training-set needs of communities that participate. Existing entities (National Endowment for the Arts and National Endowment for the Humanities) should be given amplified funding for the new investment in digital humanities for GenAI, with artists and software engineers also invited (and compensated) for their public service. Because there will be one public repository (rather than scores of competing computational companies), energy use will be constrained and sustainable.49 Archiving must include the GenAI training sets themselves, already historical from 2023, as well as the periodically ‘updated’ and abandoned models and programs; future histories of this technological epoch and its cultural and political outcomes demand this care.

Recommendations for datasets and archives:

  • Existing datasets should be made transparent (at minimum in historical forms) as a publicly accessible archive.

  • Fund artists, humanists, public-service computer scientists, and communities to engage with the dataset, amplify metadata, and create engaging art whose ‘edges’ teach viewers how the programs work and how to critically understand their outputs.

  • As public cultural infrastructure,50 fund the development of a publicly owned, widely accessible, transparent, clearly labeled, and editable image dataset to allow the fullest scope of research potential for the public LLM inference tool kit (see the first set of recommendations above).

  • Ensure that the public dataset would be developed in collaboration with the creators of all existing datasets so their current work could be standardized and preserved as part of the public cultural domain.

  • This unified database would also allow copyright to be centralized and enforced through a digital equivalent of the ISBN system.
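The ISBN analogy in the final recommendation can be made concrete. ISBN-13 validates each identifier with a simple weighted check digit, and a centralized registry of image identifiers could adopt the same arithmetic. The sketch below is illustrative: the registry itself is hypothetical, but the check-digit computation is the actual ISBN-13 algorithm.

```python
# Sketch of ISBN-13-style check-digit validation, as might anchor a
# hypothetical centralized registry of image identifiers.

def check_digit(digits12: str) -> int:
    """Compute the ISBN-13 check digit for a 12-digit prefix.

    Digits are weighted alternately 1 and 3; the check digit brings
    the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits12))
    return (10 - total % 10) % 10

def is_valid(identifier: str) -> bool:
    """Validate a full 13-digit identifier against its check digit."""
    return (len(identifier) == 13 and identifier.isdigit()
            and int(identifier[-1]) == check_digit(identifier[:12]))

# A real ISBN-13, for illustration: 978-0-306-40615-7
print(is_valid("9780306406157"))  # True
```

A check digit of this kind catches single-digit typos and most transpositions, which is what makes a central registry enforceable in practice: invalid identifiers can be rejected before any rights lookup occurs.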

Figure 6

The Engine (version 12).

Screenshot from text-to-image video generated using Runway ML (duration 4 seconds). Matthew Ritchie Studio, 2024. The text prompt was adapted from Jonathan Swift’s fictional description of “the Engine” from 1726: “A single giant wooden machine, made of many small moving cubes of wood on wires, every cube has strange writing on it, with many students turning iron handles along the edges of the machine, in a large Neoclassical room.” During this four-second sequence, the ‘students’ seem to become smaller, mobile components of the machine. Available at: https://vimeo.com/950859836/.

4. Conclusion: Fostering Public Intelligence

The platforms and media used to make GenAI art differ substantially from previous tools for art. Built on the World Wide Web, GenAI is potentially globally accessible. It is a cultural technology whose unprecedented scale (potentially millions of users), wide form of distribution (the internet), mode of engagement (machine cocreation, based on the most recent technology of video games), and multiplicity of outcomes (with multiple variants of each iterative prompting) are historically unique. The earlier technoshocks of printing and photography were metabolized into culture over a significant period of time, in lengthy reciprocal processes of making, critiquing, circulating, publishing, and editing that reshaped our literary and visual perspectives. We are already transitioning from a ‘published’ visual culture of material forms (painting, printing and photography, sculpture), grounded in diverse regional and cultural practices as well as chemical and material experiments, to a global, digitally based visual culture. GenAI contributes to these new modalities via access to instantaneous mass computing, homogenizing gaming software, and screen-based images that are as yet barely theorized and rife with problems.

The vast scale of the GenAI challenge—its feverish speed of development and distribution, its rampant consumption of resources, and its monopolistic trends51—makes the need for intervention to support more public intelligence urgent. What do we mean by “public intelligence”?52 We mean collecting more human (and humane) inputs in collaborative correctives to “artificial stupidity” and banal art.53 We mean increasing people’s understanding of how the tools work by inviting them ‘under the hood.’ We mean incentivizing artists and humanists to engage with the underlying language of this new visual culture through tinkering: asking edgy questions that trouble banal averaging and yield the artistic innovations we desire.

Our review of the perils and promise of generative technologies in the visual arts can be summarized in three domains that follow the ‘stacked’ logic of neural nets and machine learning protocols (Table 1). Our recommendations focus on how to shape these powerful computational tools for the public good, touching on issues of artistic labor, aesthetic banality, and entrenched bias. None of these problems can be fully addressed without confronting the larger structural challenges of AI today, which include strains on the electrical grid and the allocation of massive computational power to the corporations marketing GenAI, power as yet unavailable to US national research centers, universities, and even government agencies, much less the artists addressed here. Given the tremendous environmental impact54 and the cost of running overlapping datasets by each of the competitive businesses in the AI industry, it is imperative that a nonprofit public resource be developed through public-private partnerships and consortia that can mindfully steward national and natural resources for future operations.

Our recommendations envision a holistically designed, publicly owned, and accessible GenAI system with publicly available datasets, publicly accessible inferential models, and a universally accessible public interface that can serve as an exemplary global standard. Archives of prior training sets and models, as well as abandoned versions of the software, can open operations to tinkering and reinvention as well as thoughtful histories of technoshock—all prerequisites for truly innovative visual art. Creating a single, globally accessible, energy efficient system will ultimately surpass and replace multiple privatized datasets. Ideally, this new protocol would be developed through public-private collaborations with the producers of existing datasets, standardized and preserved as part of the public cultural domain. The result? Enriched archives, enhanced metadata, fair use, and innovative art.

Table 1
Generative AI in the visual arts: problems, perils, and partial solutions.

Structures | Problems | Elements of a Solution

Dataset | Bias from colonial deposits, entrenched by recursive iterations, appropriated images, and algorithmic edge effects | Opt-outs, public transparency, archives of past training sets, and community inputs

Model | Concealed weighting, prompt engineering, proprietary concealment | Give public access “under the hood,” educate via projects

Public Interface | Uneven access, computational strain on the grid, environmental impact, costly subscription business model | Public tools, repositories, archives, small (and pooled) datasets, funding to improve access, artistic innovation, and public intelligence

Acknowledgments

The authors wish to thank project advisors Dr. Ziv Epstein (Stanford Human-Centered AI) and Professor Albert-László Barabási (Robert Gray Dodge Professor of Network Science, Northeastern University), as well as participants in the March 2024 workshop on this topic. Interviews were granted to the authors by a wide range of critical thinkers and generative makers; their insights and recommendations are cited in the footnotes.
