Artificial intelligence (AI) deployed for customer relationship management (CRM), digital rights management (DRM), content recommendation, and content generation challenges longstanding truths about listening to and making music. CRM uses music to surveil audiences, removes decision-making responsibilities from consumers, and alters relationships among listeners, artists, and music. DRM overprotects copyrighted content by subverting Fair Use Doctrine and privatizing the Public Domain, thereby restricting human creativity. Generative AI, often trained on music misappropriated by developers, renders novel music that seemingly represents neither the artistry present in the training data nor the handiwork of the AI’s user. AI music, as such, appears to be produced through AI cognition, resulting in what some have called “machine folk” and contributing to a “culture in code.” A philosophical analysis of these relationships is required to fully understand how AI impacts music, artists, and audiences. Using metasynthesis and grounded theory, this study considers physical reductionism, metaphysical nihilism, existentialism, and modernity to describe the quiddity of AI’s role in the music ecosystem. Concluding thoughts call researchers and educators to act on philosophical and ethical discussions of AI and promote continued research, public education, and democratic/lay intervention to ensure ethical outcomes in the AI music space.
For nearly seventy years, research and development teams have pursued automatic computer generation of music. In 1956, mathematicians Martin Klein and Douglas Bolitho designed a computer system that could write Tin Pan Alley-style pop songs far more quickly than any human composer-lyricist. Their program for the Datatron digital computer produced four thousand songs in an hour, while a human songwriter’s output was thought to be roughly one song per hour. The most notable of Datatron’s musical output was “Push Button Bertha,” which was rejected by the US Copyright Office when it was submitted for registration. The Office’s reasoning was based on the belief that authors of creative works needed to be human (Bridy, 2016).
Additional computer-synthesized music would premiere the following year. According to Briot et al. (2020), Newman Guttman used Bell Laboratories’ sound synthesis program Music I to write a 17-second melody that he titled “The Silver Scale.” Also, Lejaren A. Hiller and Leonard M. Isaacson used the University of Illinois at Urbana-Champaign’s ILLIAC I computer to write a string quartet titled The Illiac Suite, the first example of notated music generated by a computer (Briot et al., 2020). Since these first attempts, several approaches to automatic music generation have been explored, such as grammars, symbolic knowledge-based systems, Markov chains, artificial neural networks and deep learning, and self-similarity and cellular automata (Micchi et al., 2021).
According to Teuber (2007, p. 11), “An artificial heart is a heart if it does a heart’s job. Whether it be made out of plastic or organic matter, a heart is as a heart does… A similar point might be made about machines. A machine is thinking if it can perform the job that thinkers do.” Following Teuber’s logic, if it writes music, it is a composer, and artificial intelligence (AI) most certainly composes with commercial success. Clancy (2021) offers a comprehensive and extensive discussion of the financial and legal implications for corporate use of generative AI in the production of music. In his review of commercially viable generative AI programs, he lists the following companies and products: Iamus, Mubert, AIVA, Endel, Amper, AlgoTunes, Popgun, Amadeus Code, Boomy, Humtap, Ludwig, HumOn, WaveAI, Melodrive, LifeScore, AI Music, Xhail, and Jukedeck.
Melomics Media calls its Iamus system a “computer composer,” and Aiva Technologies likewise refers to its AIVA program as an “AI composer.” RIAA executive David Hughes says, “[AI] can create everything from scratch to finish for a sound recording. It might not be good, but the AI is there, and it’s already happening,” (Library of Congress, US Copyright Office, 2020, p. 185). University of Oxford professor Marcus du Sautoy similarly claims, “[AI] might well put second-tier composers out of jobs, those who are making their way by writing music for advertising, corporate videos, computer games. We’re already seeing music being written by AI which is ‘good enough,’” (Forbes Insights, 2019).
It stands to reason that musicking is no longer a quality belonging to humanity alone. It is a trait now shared with generative AI, causing rippling effects across the music recording industry and challenging the design and enforcement of copyright law. For example, a great deal of attention has been given to what Avdeeff (2019) calls the audio uncanny valley: a paradoxical and unsettling sense of surrealism instigated by the listener’s simultaneous perception of the human and the artificial. Avdeeff (2019, p. 137) claims that it “blurs the boundary between human and machine production, at times provoking wider fears about the future of human-technology relationships.” As such, researchers, technologists, and legislators have struggled to reconcile the coexistence of human and AI creativity. Clancy (2021), Drott (2021), and Sturm et al. (2019), for example, provide thorough discussions regarding the issue of copyrighting AI music, and the US Senate Judiciary Committee hearing, “Oversight of A.I.: Rules for Artificial Intelligence,” dealt extensively with the matter of transparency in instances where generative AI is utilized for content creation (CBS News, 2023).
Christians (2020) suggests that humanity possesses an underlying, primal fear of automation and emergent technology. Christians (2020, p. 3) argues, “The challenge technological societies face is not that from machines and products per se, but from a technological pervasiveness that erodes our language about [being human].” Given these concerns, it seems appropriate to conduct a philosophical inquiry into music technologies and the role AI plays in the music industry. Pulling from theories conceived by multiple philosophers such as Rene Descartes, Arthur Schopenhauer, Friedrich Nietzsche, Martin Heidegger, Jacques Ellul, and others, this article uses metasynthesis and grounded theory to probe how technology shapes the essence of music and how AI alters the landscape of the music ecosystem.
Philosophy and Artistry
Art and philosophy often find themselves as counterparts in the meaningful discussion of human nature. Devaraja (1959, p. 323) defines philosophy as “the instrument of the qualitative improvement of man as he expresses himself in his higher cultural activities,” and Priest (2006, p. 202) describes it as “that intellectual inquiry in which anything is open to critical challenge and scrutiny.” Meanwhile, Christians (2020, p. 16), commenting on Martin Heidegger’s Poetry, Language, and Thought (1971), states, “The arts illuminate the mystery of being. The artistic genre ‘opens new ways of saying [humanness],’” and Radnoti (1999, p. 45), quoting Novalis, calls art “the memory of humanity.”
Taubes (1993) asserts that art and philosophy are the means by which humanity explores itself, yearning to comprehend its ties to the world. Put otherwise, philosophy depicts the quality of humanness and humanity’s correlation with the environment, while art offers impressions of their conditions. For example, one may ponder and attempt to explain a person’s connection with nature, but art and music convey the profound emotions that a person feels for nature. Musical examples include Antonio Vivaldi’s The Four Seasons, Ludwig van Beethoven’s Pastoral Symphony, and Bedrich Smetana’s “The Moldau.” Each of these works transmits the composer’s subjective impressions while philosophical assessments aim for objective reasoning. Taubes further clarifies this relationship saying:
“Philosophy represents a body of knowledge that attempts to arrive at a valid explanation for the way things are… Art is something entirely different. It occurs. It communicates… Regardless of the message, all art has its own measure of truth to the degree of its communicability,” (Taubes, 1993, pp. 11–12).
To identify the substance of music, this inquiry considers Aristotle’s four causes, which together explain the essence of all things. They are as follows: material, formal, efficient, and final. The Stanford Encyclopedia of Philosophy respectively defines each as: the material(s) used to create, the shape the fabrication takes, the source which elicits change, and the purpose for which it was fashioned (Falcon, 2019). Additional philosophical concepts considered in this discussion include episteme, techne, and poiesis. Each is respectively defined in the Stanford Encyclopedia of Philosophy as theoretical knowledge and understanding, practical or applied knowledge through craft, and a work resulting from “a rational, goal-directed activity of making,” (Parry, 2020; Paul & Stokes, 2023).
When interpreting audible music, the material is sound; the formal is the chosen instrumentation and combination of pitches and rhythms; the efficient is the performer; and the final is the reason for playing music, whether it be entertainment, ceremony, therapy, and so forth. A musical composition would follow the same process. The material is ink and staff paper; the formal is likewise the instrumentation and combination of pitches and rhythms; the efficient is the composer; and the final is the reason for which the music is composed, usually determined by either the composer or the patron commissioning the work. Understanding how music is made and used constitutes episteme, while the skill in performing with an instrument or writing a song represents techne. The performance or composition itself amounts to poiesis.
Reductionism
Each Aristotelian cause contributes to a higher order of representation; however, modern uses for music require practitioners and researchers to engage in reductionism. According to Rene Descartes (1701/1998), complex systems can only be fully understood by studying their basic components. He asserts:
“The whole method consists in the order and arrangement of the things on which the vision of the mind has to be focused in order that we might discover any truth. And yet we shall be following this method exactly if, step by step, we reduce complicated and obscure propositions to simpler ones, and we then try to ascend, through the same steps, from the intuition of the simplest ones of all to a knowledge of all the others,” (Descartes, 1701/1998, p. 99).
In alignment with Descartes’ method, Edgard Varese, who called his own music “organized sound,” explains, “Indeed, to stubbornly conditioned ears, anything new in music has always been called noise. But after all what is music but organized noises?” (Varese & Wen-Chung, 1966, p. 18). Immanuel Kant similarly describes music as “vibrations of the aether,” (Erraught, 2018, p. 17). They suggest that audible music can be physically reduced first to strategically ordered “noises,” then to reverberating air columns, and finally to oscillating gaseous particles. Spectrographs analyze these vibrations in terms of amplitude, frequency, and duration, rendering them as spectrograms that visualize the data. Varese predicted that music would advance, insofar as melodic, harmonic, and rhythmic innovations, to the point where staff notation would be insufficient. He claimed, “The new notation will probably be seismographic,” (Varese & Wen-Chung, 1966, p. 12).
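To make this physical reduction concrete, the following minimal Python sketch renders a waveform as the amplitude-frequency-duration grid that a spectrogram visualizes. It is an illustrative example only, not drawn from any study cited here: the two-tone “composition,” sampling rate, and analysis window are contrived, and the sketch assumes the widely available numpy and scipy libraries.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 44_100                        # sampling rate in samples per second
t = np.arange(0, 2.0, 1 / fs)      # two seconds of time points

# A contrived "composition": a 440 Hz tone (A4) joined halfway through
# by a 660 Hz tone (E5), i.e., strategically ordered vibrations.
signal = np.sin(2 * np.pi * 440 * t)
signal[fs:] += 0.5 * np.sin(2 * np.pi * 660 * t[fs:])

# The short-time Fourier transform reduces the waveform to a grid of
# frequencies (rows), times (columns), and intensities (cell values).
freqs, times, intensities = spectrogram(signal, fs=fs, nperseg=2048)

dominant = freqs[intensities.mean(axis=1).argmax()]
print(f"Approximate dominant frequency: {dominant:.0f} Hz")
```

The printed value lands near 440 Hz, the tone sustained throughout the excerpt; everything the listener hears has been reduced to a matrix of numbers.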
Varese was likely referring to the waveform renderings of a spectrograph, and such representations of music have been leveraged in copyright lawsuits. According to Leo (2021), inquiries aided by music technologies have been part of copyright infringement cases for years, but lawsuits concerning fair use and de minimis, like those involving music sampling, especially depend on waveform and spectrogram analysis. She points to the following examples: Newton v. Diamond, Bridgeport Music v. 11C Music, Bridgeport Music v. Dimension Films, and Bridgeport Music v. Justin Combs Publications. Leo attests:
“Using specialized software… experts can reduce the speed of a recording to hear minute details, and they can use various techniques to isolate, manipulate, and even in some cases recreate, sections of recordings for the purposes of comparison… These analyses can also appear to provide forensic quantification and replicability that might simultaneously seem to mystify and to instill a sense of objectivity in factfinders, especially when introduced alongside, or in opposition to, notated corollaries,” (Leo, 2021, p. 121).
Spectrograms are also used in music emotion research. Studies conducted by Plut et al. (2022) and Grekow (2018) utilize graphing methods to plot specific emotion datapoints according to valence, arousal, and tension. Both studies surveyed people listening to music in controlled environments in order to measure emotional responses to each sample. Such findings can then be used to tag spectrograms according to reported emotional reactions. Grekow points to how combining music information retrieval with music emotion research improves sorting and discovery by including emotion as a search parameter in online music databases (Grekow, 2018). Plut et al. discuss how their study benefits the development of music generative software (Plut et al., 2022). For example, programs like AIVA require the user to input a variety of parameters, including an emotion descriptor, to produce music (Aiva Technologies SARL, 2018). By extension, this is also the basis for text-to-music generative AI like Mubert and Google’s MusicLM. According to Agostinelli et al. (2023), these programs analyze user-generated text prompts, looking for keywords to match with tagged spectrograms. Then, they select and order the closest matches for each keyword and combine the spectrograms accordingly. Audio is then generated from the newly assembled spectrogram (Agostinelli et al., 2023).
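The matching step described above can be sketched in a few lines of Python. This is a deliberately naive, hypothetical illustration: systems like MusicLM operate on learned embeddings rather than literal keyword lookup, and the tagged clip library below is invented for the example.

```python
# Hypothetical library mapping emotion tags to tagged audio clips.
TAGGED_CLIPS = {
    "calm":      ["pad_in_c.wav", "slow_strings.wav"],
    "energetic": ["synth_arp.wav", "drum_loop.wav"],
    "sad":       ["minor_piano.wav"],
}

def match_prompt(prompt: str) -> list[str]:
    """Collect clips whose emotion tags appear in the text prompt."""
    words = prompt.lower().split()
    matches = []
    for tag, clips in TAGGED_CLIPS.items():
        if tag in words:
            matches.extend(clips)
    return matches

print(match_prompt("a calm but energetic morning track"))
# ['pad_in_c.wav', 'slow_strings.wav', 'synth_arp.wav', 'drum_loop.wav']
```

In a production system, the selected and ordered matches would then be combined and resynthesized into audio, as Agostinelli et al. (2023) describe.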
Music data can also be rendered as machine code for computer instructions and AI training. Many systems in the field of music information retrieval (MIR), such as music notation software, rely on music as code. Dubbed computer systems for expressive music performance (CSEMP) by Kirke and Miranda (2009), these programs approximate genuine instrument sounds and human performance from coded instructions such as MusicXML and MIDI files. Their study encompasses twenty-five years of CSEMP advancement and surveys a multitude of software developers and products, including well-known programs like Finale, Sibelius, and Notion.
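As a small illustration of music rendered as coded instructions, the following Python sketch encodes a four-note melody as MIDI messages that any CSEMP could render as sound. It assumes the third-party mido library; the pitches, velocity, and file name are arbitrary choices for the example.

```python
import mido

# Create an empty MIDI file with a standard timing resolution.
midi_file = mido.MidiFile(ticks_per_beat=480)
track = mido.MidiTrack()
midi_file.tracks.append(track)

# Four quarter notes (C4, D4, E4, C4) as note-on/note-off instructions.
for pitch in (60, 62, 64, 60):
    track.append(mido.Message("note_on", note=pitch, velocity=64, time=0))
    track.append(mido.Message("note_off", note=pitch, velocity=64, time=480))

midi_file.save("four_notes.mid")
```

The melody exists here only as timed numeric instructions, underscoring how thoroughly symbolic formats reduce music to code.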
Peter et al. (2023) explain the difficult and time-consuming nature of annotating note alignment between music scores and computer-synthesized or human performances. They explain, “Note alignments match individual notes in a performance to those on a score. Unlike alignments in audio files, which typically map each time point to at least one reference time point, note alignments match symbolic elements and can feature unaligned elements,” (Peter et al., 2023, p. 30). Precisely coordinating what is heard during performance with what is seen in a musical score requires highly specialized skills, but large datasets consisting of this kind of information are needed for training a variety of AI music programs. Peter et al. (2023) go on to demonstrate how they have developed a system that automatically conducts these kinds of note alignment tasks and use their model to create a high-quality note-aligned dataset comprising nearly one hundred hours of solo piano music.
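A toy version of the alignment task conveys what is being automated. The following greedy Python sketch is not Peter et al.’s (2023) method: it merely matches each score note to the nearest unmatched performance note of the same pitch within an arbitrary timing tolerance, and any leftover entries correspond to the “unaligned elements” the authors describe. All note data are invented.

```python
# (beat, MIDI pitch) pairs for the score; (seconds, MIDI pitch) for the
# performance, assuming a tempo of one beat per second for simplicity.
score = [(0.0, 60), (1.0, 62), (2.0, 64)]
performance = [(0.02, 60), (1.10, 62), (2.31, 65)]

TOLERANCE = 0.25  # maximum timing deviation, an arbitrary threshold

alignment, used = [], set()
for s_time, s_pitch in score:
    candidates = [
        (abs(p_time - s_time), i)
        for i, (p_time, p_pitch) in enumerate(performance)
        if p_pitch == s_pitch and i not in used
        and abs(p_time - s_time) <= TOLERANCE
    ]
    if candidates:
        _, i = min(candidates)          # closest same-pitch performance note
        used.add(i)
        alignment.append(((s_time, s_pitch), performance[i]))
    else:
        alignment.append(((s_time, s_pitch), None))  # unaligned element

print(alignment)
```

Here the final score note finds no match because the performer played a different pitch, exactly the sort of discrepancy that makes manual annotation so laborious at scale.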
AI can also compose and arrange music as both written notation and audio. As previously mentioned, there are many commercial products that automatically generate music. Of the many programs listed, Melomics Media’s Iamus proves to be one of the most versatile as it produces novel music in mp3, PDF, MIDI, and MusicXML file formats, which not only makes audio and print versions of the music available but also enables users to manipulate the music using CSEMPs (Quintana et al., 2013). Mairn (2023) demonstrates a novel use of ChatGPT to create coded music in the C programming language. Here, Mairn directs ChatGPT to write a computer-coded piece of music in the style of Stockhausen and then generates the audio in a sound/music program called Csound (Mairn, 2023). It is also worth noting that ChatGPT can generate short melodies coded in MusicXML. Cuesta and Gomez (2022) utilize artificial neural networks equipped with deep learning to analyze music audio tracks and automatically convert them to four-part a cappella. Bjare et al. (2022) demonstrate how their Markov chain model analyzes prewritten music to project the next sequence of notation, functioning as a kind of autocomplete or predictive text feature for musical composition.
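A toy first-order Markov model shows the “autocomplete” idea in miniature. The Python sketch below is far simpler than the model of Bjare et al. (2022), and its training melody is invented; it merely counts transitions between consecutive notes and samples from them.

```python
import random
from collections import defaultdict

# A tiny invented "corpus": one training melody as note names.
melody = ["C", "D", "E", "C", "D", "G", "E", "C", "D", "E"]

# Count observed transitions between consecutive notes.
transitions = defaultdict(list)
for current, nxt in zip(melody, melody[1:]):
    transitions[current].append(nxt)

def continue_melody(seed: str, length: int = 8) -> list[str]:
    """Extend a melody by sampling each next note from observed transitions."""
    notes = [seed]
    for _ in range(length):
        options = transitions.get(notes[-1])
        if not options:
            break                       # dead end: no observed continuation
        notes.append(random.choice(options))
    return notes

print(continue_melody("C"))  # e.g., ['C', 'D', 'E', 'C', 'D', 'G', ...]
```

Each run proposes a different continuation, which is precisely the predictive-text behavior described above.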
AI has also been used to automate music analysis and song identification. Yust et al. (2022) demonstrate how their model, which relies on discrete Fourier transform and hierarchical clustering, automatically performs melodic and harmonic analysis. Their system examined seventeen movements from Mozart piano sonatas. Plaja-Roglans et al. (2023) describe the lack of considerations given in MIR research towards music of non-Western origins and develop an AI model that effectively aligns and extracts vocal notation in Indian Carnatic music. Berkowitz (2023) reports on how AI is used to detect piracy in videos uploaded to social media platforms. His study discusses the way automated copyright enforcement systems convert copyrighted music to strings of hash data for storage in a proprietary reference library that is used for comparison against user-generated content. His research includes legal analyses, surveys, and experimentation in his assessment of YouTube’s Content ID and Facebook’s Rights Manager.
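A schematic sketch can clarify the hash-and-compare mechanism Berkowitz describes. The Python below is illustrative only: the exact SHA-256 chunk hashes used here for brevity would miss any re-recording or re-encoding of the same music, and the reference library is invented.

```python
import hashlib

def chunk_hashes(audio_bytes: bytes, chunk_size: int = 4096) -> set[str]:
    """Hash fixed-size chunks of raw audio into a comparable set of strings."""
    return {
        hashlib.sha256(audio_bytes[i : i + chunk_size]).hexdigest()
        for i in range(0, len(audio_bytes), chunk_size)
    }

# A hypothetical reference library of copyrighted recordings.
reference_library = {"protected_song": chunk_hashes(b"\x01\x02" * 8192)}

def screen_upload(upload: bytes) -> list[str]:
    """Flag any reference work that shares hashed chunks with the upload."""
    upload_hashes = chunk_hashes(upload)
    return [
        title
        for title, hashes in reference_library.items()
        if hashes & upload_hashes       # any overlap triggers a claim
    ]

print(screen_upload(b"\x01\x02" * 8192))  # ['protected_song']
```

Real systems instead use perceptual fingerprints that tolerate acoustic variation; that tolerance plausibly contributes to the misidentification of independent Public Domain renditions that Berkowitz documents.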
Summarizing the Metaphysics of Commercial Music
Cleveland (1835, pp. 59–60) asserts, “Music begins where language ends; it expresses thoughts and emotions, to which speech can give no utterance; it clothes words with a power which language cannot impart.” Consideration is also given to Gordon (2022) who argues:
“Language is an edge technology that is interposed between man’s thought and the world… Language communicates information, commands and synchronizes operations between a multitude of agents… To be effective, the orders must be as unequivocal as possible… Such usage of language aims to reduce the agents’ internal space… Machine language is an operational language that utterly nullifies ambiguity and demands no interpretation. However, this is a reduction of language. Man’s language is essentially open and non-conclusive… Language presupposes that the speakers have different inner spaces and that they do not understand it in an identical way,” (Gordon, 2022, pp. 43–44).
If music represents a higher order of human expression and if machine language is a reduction of human language, then to what greater extent is music reduced when rendered as data and code? Sternberg (2007) encountered a similar unsettling notion concerning human consciousness. He questions:
“Thoughts must be products of the brain just as bile is of the liver and saliva of glands in the mouth… If my thoughts are reducible to physical processes in my brain, what would distinguish me from a robot built to process information and behave as I do? If not in my mind, where does my humanity lie?” (Sternberg, 2007, pp. 23–24).
Sternberg concerns himself with the higher value of human ponderance. Descartes, before him, declared, “I think, therefore I am,” in his Discourse on Method (1637), claiming that one’s own internal monologue is evidence of consciousness (The Editors of Encyclopedia Britannica, 2022). Ryle (1949/2009) takes this notion one step further, suggesting that the brain’s physical processes produce sentience. He uses a university as an example, explaining that the individual colleges and departments along with their inner workings are what comprise a university; university as a concept is a label for the framework of the unified system. Consciousness is, therefore, what emerges from one’s physical complexities and functions (Ryle, 1949/2009). David Chalmers and Ray Kurzweil similarly believe that the brain’s organization and operations bring forth higher cognition (Sternberg, 2007). It follows from their conclusions that making significant modifications to one’s physicality would also impact one’s consciousness.
Similar explanations for the metaphysical quiddity of music have also been considered. Erraught (2018) poses the following possibilities: music must exist in a higher metaphysical state because it stirs the listener’s thoughts, emotions, and spirit; music is ineffable by nature and thus its qualities are unknowable; music has no meaning beyond its physical characteristics. Continued discussion assumes the first of these possibilities to be true.
According to Bonds (2006), art enthusiasts of the 19th century subscribed to the ideology of art religion as a way to commune with the divine, and composers at the time were revered almost like deities for their talents. Due to its intangible nature, instrumental music was especially held in high regard as the ideal medium for representing the immaterial and the indefinite. Bonds (p. 390) writes, “The public… began to see great composers as divinely inspired high priests of art who could provide glimpses into a loftier, more spiritual world.” Franz Liszt, unlike Varese, considered his own music far grander than mere “organized sound.” He described his music as “the embodied and intelligible essence of feeling, capable of being apprehended by [the] senses. It permeates them like a dart, like a ray, like a mist, like a spirit, and fills [the] soul,” (Grout & Palisca, 2001, p. 543). It is plainly evident that music embodies a higher order of meaning. It is a method by which humanity expresses itself, much like the dialectic in human thinking.
Schopenhauer (1818/1896) argues that music should not be restricted in the same way language is, asserting that music is by nature indefinite. He says, “If music is too closely united to the words, it is striving to speak a language which is not its own,” (Schopenhauer, 1818/1896, p. 338). His concerns, however, were ignored by companies developing autonomous composition programs that rely on textual inputs like Mubert and MusicLM. If melody and harmony are contorted in some way by conflating them with human language, then music in the form of machine language must greatly distort its metaphysical and ontological qualities in the same way Ryle, Chalmers, and Kurzweil draw a causal link between physicality and sentience.
Erraught (2018, p. 16) states, “Music has no—or virtually no—substantial materiality; while it may be preserved in material form, music itself disappears into each successive moment… The thing is that music is, in a distinct sense, constituted mentally. There is no remainder.” However, societal constructs like intellectual property law and, more specifically, copyright law force material qualities on music. According to Rae (2021, p. 1), “Copyright is a form of property, though unlike real property… it is ‘intangible,’ meaning that it isn’t something you can grasp in your hands.” Elaborating on this concept, Arditi (2020) explains:
“When a musician writes a song, the piece possesses no monetary value for which it can be exchanged – music in this sense is not a commodity… The song has no monetary value until it is considered property… However, property laws do not apply to the ownership of ideas contained in music because people cannot physically take a song. Rather, copyright intervenes to place a restriction on the reproduction and performance of a copyrighted song, thus allowing its commodification,” (Arditi, 2020, p. 50).
Taubes (1993) asserts that art and money are metaphysical opposites. The former is a representation of the indefinite while the latter is materialism incarnate. He goes on to say, “It takes no patience to understand the dollar value of art. It is simple, cold, calculable; it even provides a numerical scale. True aesthetic value takes a great deal of patience to understand,” (Taubes, 1993, p. 72). Composing and performing (i.e., transmission) emphasize the immateriality of music and, before copyright laws, were the only forms that could be monetized. It should be noted that composing, in this case, refers specifically to services rendered to a patron who commissions a piece of music. It should not be confused with, and is entirely different from, writing compositions serially for commercial sale. The difference here is that the composer is paid by the patron to write music; the patron is not paying the composer for the sale of a composition.
The materialization of music, regulated by copyright law, has led to the creation of an industry predicated upon selling physical iterations of music, in both print and audio formats. Stiegler (2001/2011) describes such iterations of music as “tertiary retention,” defined as “the prosthesis of consciousness without which there could be no mind, no recall, no memory of a past that one has not personally lived, no culture.” He offers the phonogram as an example, saying, “It makes it obvious that, as the recording of a track on a material object, in this case an analog recording, tertiary memory inherently overdetermines the articulation of primary and secondary retentions, [respectively described as that which is perceived and that which is remembered],” (Stiegler, 2001/2011, p. 39). His analysis would suggest that audio recording supplants original performance as authentic rendition due to its permanence and accessibility.
Dewar (2016) provides examples, which follow, for how this has manifested in the music ecosystem. He reports that the music recording industry is committed to developing technologies that precisely replicate the original performance in music recordings. The earliest such example paired an Edison phonograph with live vocalists on a stage, where each would respectively play and sing the same song. It was an attempt at convincing audiences that they would not be able to distinguish one sound from the other. In a 1913 Victor Records advertisement, the company asserted that their opera recordings were so high in quality that there is no difference between listening to a record and attending a performance at the Metropolitan Opera House. In the mid-1970s, a similar advertisement by Memorex, promoting its new tape cassette formulation, asked the audience, “Is it live or is it Memorex?” The 2010s saw the same sort of advertisements featuring Ella Fitzgerald, Beyonce Knowles, and Ciara Harris in commercials promoting the audiovisual acuity of Vizio televisions and LG mobile phones (Dewar, 2016).
Erraught (2018, p. 51), however, argues that music copyright is “the crucial ingredient… necessary to support the autonomous composer and allow him his creative freedom.” While copyright may have granted musicians their freedom from laboring in accordance with aristocratic tastes and preferences, it hardly grants creative autonomy. Arditi (2020) provides thorough explanations for how music copyright and the music recording industry have in fact strictly limited artists’ creative autonomy across the board.
Generally, though, copyright makes it so that composers trade a handful of aristocratic patrons for the public, and exchanging one for the other allows composers to feel as though they work for themselves. This can be explained by Emmanuel Levinas’ philosophy on face-to-face encounters. Levinas (1972/2003) claims that when people meet face-to-face, each person becomes responsible for the other. Therefore, it stands to reason that composers who write music for aristocrats feel responsible for them because they have met and have come to know them personally. The same cannot be said of the masses that make up audiences. As such, a sense of responsibility for the “other” is lifted, and in its place, composers feel responsible for only themselves, which presents the illusion of creative autonomy. In reality, as listeners prioritize their own aesthetic interests and values, market pressures would dictate what composers should write in order to maintain viability—as it is with all commodities.
Nihilism
According to Meyboti (2016), Schopenhauer believes that a person’s sense of individuality and will is the source of misery. To him, it was as though people were fated to live empty lives lacking any sort of fulfillment. His proposed solution for this deep sense of dissatisfaction was to engage in nihilism, a method by which one reduces and even negates all meaning and consequences pertaining to existence. Schopenhauer found his means of escape in music due to its abstract nature, trusting it to relieve the burdens of listeners (Meyboti, 2016). According to Grout and Palisca (2001, p. 543), he “believed that music was the incarnation of the innermost reality, the immediate expression of universal feelings and impulses.”
While Schopenhauer found relief in music, Gertz (2018) shows how the hypnotic effects of technology offer humanity a variety of options for reducing their awareness of the world and, by proxy, any meaning in reality and life. Nihilism of this sort is twice as potent because users are completely unaware that it is technology itself that causes this reduction in awareness (Gertz, 2018). Muller (2016) seems to share this perspective, saying that the way machines are utilized gradually displaces people’s involvement in human activities, which encourages indifference towards the machines themselves and causes humanity to ignore its limitations. Similar traits reveal themselves within listeners, performers, and even music itself when music is reduced to data and code. The intrinsic values of music are distorted, and both musicians and audiences exhibit nihilistic behaviors.
Music inherits value-laden data when commodified. Streaming services and online stores, for example, track commercial circulation, customer demand, and distribution and sales metrics to gauge monetary value. According to Negus (2019):
“Musicians have found themselves redefined as content providers rather than creative producers… [thus] changing [the] artistic and economic value of recorded music… This debate [is] about the market and moral worth of music by exploring how digital recordings have acquired value as data, rather than as a commercial form of artistic expression,” (Negus, 2019, p. 367).
To track these data, corporations rely on AI programs called customer relationship management systems (CRM). According to Burkart and McCourt (2006), streaming companies such as Spotify and Deezer use CRM to follow how often songs are played through, skipped, repeated, liked, rated, and added to playlists, and online stores like Apple’s iTunes and Amazon Music similarly utilize CRM to track customer purchases, ratings, and reviews. These records enable companies to surveil customer behaviors and established habits, giving them intimate knowledge about users’ daily routines (Burkart & McCourt, 2006).
Drott (2018) reports that music streaming platforms use CRM to deduce the location, time, duration, frequency, and nature of routine activities by comparing and assessing users’ music selections, platform engagement, and user profile information. Drott uses showering as an example, saying that listening to music in the shower is a popular activity, and in recognition of this, Spotify hosts approximately 39,000 showering playlists on its platform. When customers listen to any of these playlists, they are unaware that they have indicated to Spotify that they are showering, and as this is a regular activity, Spotify tracks where, when, how often, and for how long all its customers shower (Drott, 2018). The same can be said for any routine activity (e.g., studying, sleeping, commuting to work, etc.). Unbeknownst to streaming music consumers, the music they listen to feeds the company’s surveillance system.
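The kind of inference Drott describes can be illustrated with a short Python sketch. Everything below is hypothetical: the event fields, playlist name, and log entries are invented, and an actual platform would perform such aggregation at vastly larger scale.

```python
# Invented play-log events for one user of a streaming platform.
play_log = [
    {"user": "u1", "playlist": "Shower Songs", "start": "07:02", "minutes": 14},
    {"user": "u1", "playlist": "Shower Songs", "start": "07:05", "minutes": 15},
    {"user": "u1", "playlist": "Shower Songs", "start": "06:58", "minutes": 13},
]

def infer_routine(log: list[dict], activity_keyword: str) -> dict:
    """Estimate when, how often, and how long a routine activity occurs."""
    events = [e for e in log if activity_keyword.lower() in e["playlist"].lower()]
    return {
        "occurrences": len(events),
        "typical_start": sorted(e["start"] for e in events)[len(events) // 2],
        "avg_duration_min": sum(e["minutes"] for e in events) / len(events),
    }

print(infer_routine(play_log, "shower"))
# {'occurrences': 3, 'typical_start': '07:02', 'avg_duration_min': 14.0}
```

Three mundane playlist selections suffice to estimate when and for how long the listener showers each morning, which is the surveillance byproduct Drott identifies.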
Users who are made aware of the ongoing dataveillance actively engage in nihilism. They devalue their human rights by exchanging privacy, and perhaps agency, for access to a cost-free service. The fact that they are being surveilled quickly moves to the back of their minds and fades away. Gunther Anders asserts, “The artificiality of human beings increases in the course of history, because humans become the product of their own products,” (Muller, 2016, p. 100). Anders observed how consumers spend more time and resources on the maintenance of products than on the use of products; however, evidence of human artificiality can also be seen in how streaming companies sell their CRM data to third-party advertisers invested in improving market penetration (Burkart & McCourt, 2006). The chain of events presented here shows listeners forfeiting their privacy and agency to maintain access to streaming music, thereby reducing themselves to products sold to third parties.
Burkart and McCourt (2006) also discuss how streaming companies use CRM data to create individualized experiences for users by making music recommendations based on their past selections. Typically, music recommendation systems consider commercial success, popularity among users, and aesthetics (e.g., style, genre, mood, etc.) resembling documented user selections (Burkart & McCourt, 2006). This potentially poses consequences for the development of social ethics. To reiterate Taubes’s point about the patience required to appreciate aesthetic value, Erraught (2018) states:
“While personal taste does solicit agreement, it does so within a framework that also maintains that taste should remain distinctive; if you were simply to like everything I like, because I like it, even though I have ‘good taste,’ this would not indicate the same on your part because you had not put in the effort to develop it. Taste reflects a process, a confidence that comes from deep and long-standing engagement with a type of music,” (Erraught, 2018, p. 21).
Cultivating the ability to appreciate certain aesthetics requires that listeners educate themselves on what they listen to in order to develop well-informed opinions. This demands significant time and effort, and therefore, it is not surprising that most consumers dismiss this pursuit by outsourcing decision-making to CRM-driven recommendation systems. Arditi (2020) believes that the current trajectory of streaming music, which appears to increasingly base music recommendations on the real-time dispositions of listeners, will lead to services that remove decision-making responsibilities from users entirely. He goes on to say:
“If the data [i.e., geolocation, biometrics, and physical activity] cross-lists with social media, this information can be combined to judge our moods. I think we are on the cusp of deploying smartphones in a way that utilizes information to make music suggestions. In the future, instead of selecting mood playlists, Spotify or Apple Music will mine our data and our music tastes to guess our mood and suggest music. The attempt will be to offer us music before we know what we want,” or that we want it (Arditi, 2020, p. 160).
By demoting active engagement to a tedious chore, listeners opt for a passive relationship with their music. Doing so, however, also delegates decisions concerning human culture and ethics to AI systems. Erraught (2018, p. 48) defines culture as “the social cultivation of taste.” It follows that if society allows AI to determine aesthetic value, then AI will inevitably work towards shaping human culture. Similar effects would also be seen in how humanity prioritizes ethical values as aesthetics finds its origins in ethics. According to Taubes (1993, p. 23), “When primitive humans carved their idols and performed their rituals… they were taking a world they found inadequate… and reshaping it according to how it ought to be.” It, therefore, stands to reason that the design of any object, system, or service directly reflects the values of the designer. If AI is left to determine aesthetic values on behalf of users, then it also has the potential to determine ethical values for humanity. This point was a matter of substantial concern for those present at the “Oversight of A.I.: Rules for Artificial Intelligence” US Senate Judiciary Committee hearing as legislators pressed expert witnesses on AI’s role in propagating misinformation and disinformation (CBS News, 2023).
Drott (2020) discusses how the internet would supposedly enable all musicians to find their audiences and democratize the music industry; however, the online music market is oversaturated with droves of artists flooding digital platforms. The resulting surplus constitutes the music long tail (Drott, 2020). The combination of the music long tail and music echo chambers poses significant challenges to new artists trying to generate a following on music streaming platforms and social media sites. Senator Marsha Blackburn, during the US Senate Judiciary Committee hearing “Oversight of A.I.: Principles for Regulation,” mentioned how female country artists are marginalized on Spotify. She shared a conversation between her and country musician Martina McBride, saying:
“Martina McBride, who is no stranger to country music, went into Spotify… to build a country music playlist… She had to refresh [Spotify] thirteen times before a song by a female artist came up – thirteen times! So, you look at the power of AI to shape what people are hearing,” (CNBC Television, 2023a, 1:10:38).
Musicians experiencing difficulty in finding their ways onto recommendation lists generally wait for their popularity to increase as they continue to release new music, but some force their way onto recommendation lists by employing clickfarms to inflate their songs’ streaming metrics. These “digital sweatshops” take advantage of cheap labor present in local surplus populations in emergent countries and coerce clickworkers to engage in repetitive tasks such as “creating social media accounts, moderating content for platforms, clicking online ads, liking or rating items, and, of course, generating plays on streaming services,” (Drott, 2020, pp. 168–169). Senator Josh Hawley, during the aforementioned committee hearing, commented on this very issue. He points to a Wall Street Journal article reporting on OpenAI’s exploitation and abuse of foreign labor in Kenya to aid in moderating ChatGPT’s content generation. It is worthwhile to quote Hawley’s entire criticism, which follows:
“We’re talking about a thousand or more workers outsourced overseas. We’re talking about exploitation of those workers; they work around the clock. The material they’re exposed to is incredible, and I’m sure extremely damaging. And, that constitutes an issue of lawsuits that they’re now bringing. Here’s another interesting tidbit: the workers on the project were paid an average between $1.46 an hour and $3.74 an hour. Now, OpenAI says, ‘Oh, we thought that they were being paid over $12.00 an hour.’ So, we have the classic-classic corporate outsource maneuver where a company outsources jobs… exploits foreign workers… and then says, ‘Oh, we don’t know anything about it.’ We’re asking them to engage in this psychologically harmful activity, we’re probably overworking them… and we’re not paying them… My question is: how widespread is this in the AI industry? It strikes me that we’re told that AI is new, and it’s a whole new kind of industry, and it’s glittery, and it’s almost magical. Yet, it looks like it depends, in critical respects, on very old-fashioned, disgusting, immoral labor exploitation,” (CNBC Television, 2023a, 1:32:23).
Following Friedrich Nietzsche’s philosophy, it is suggested here that employing clickfarms in such a way is an act of nihilism. Gertz (2018, p. 130) claims, “Cruelty was a right of the ‘masters’ according to Nietzsche, and thus even though the masters no longer exist, we still yearn to experience the power of the masters, even if it means being cruel not only to others but to ourselves.” Despite the situation, these musicians leverage their fiscal powers to exploit impoverished populations with the aim of increasing the popularity and value of their work. By doing so, they weave an expression of poverty-wage labor into their music while simultaneously establishing themselves as “masters” over clickworkers. Their nihilism leaves them blind to the severity of supporting clickfarms, which leads them to believe that the behaviors of the privileged are beyond reproach and/or the lives of the disadvantaged are insignificant.
Streaming platforms also incorporate digital rights management systems (DRM), and some employ automated copyright enforcement. As mentioned before, Berkowitz (2023) reports that YouTube and Facebook maintain proprietary reference libraries of copyrighted music and use these databases to automatically screen user-uploaded files for protected material. Any detection of unauthorized use of content is reported to copyright owners, and the flagged upload is usually automatically monetized, generating ad revenue for both the copyright owner and the social media company (Berkowitz, 2023). Berkowitz’s experiment, in which he uploaded computer-synthesized versions of Beethoven’s piano sonatas to YouTube and Facebook, shows how these antipiracy systems are unreliable at differentiating between copyrighted and freelance renditions of Public Domain music. He argues that YouTube and Facebook purposely allow for systematic allegations of copyright infringement, leveraging a technocracy which subverts Fair Use Doctrine and privatizes the Public Domain. In doing so, social media companies and corporate copyright owners capitalize on free labor (Berkowitz, 2023). In this case, corporations are seen utilizing music in a way which privately regulates and restricts human creativity and production of digital culture for the sake of profit.
Conditioning AI to automatically produce novel music also displaces humanity as the sole proprietor of artistic expression, showing that AI may also be artfully expressive. Morreale (2021) equates AI-generated music to modern-day digital colonialism. He argues, “The cultural capital of individual musicians and communities is thus exploited by capitalist firms…” operating under the misconception that “everything can be possessed, exploited, occupied, invaded, and commodified,” (Morreale, 2021, p. 108). Developers, therefore, avoid transparency regarding the corpus of training data they use to build their generative music systems, and as a result, musicians are oblivious to the fact that their music has been used to train these programs, which leaves them uncredited and uncompensated for their work.
During the US Senate Judiciary Committee hearing, “Artificial Intelligence and Intellectual Property – Part II: Copyright,” Karla Ortiz, an artist employed in the film industry, exemplifies Morreale's (2021) point during her testimony. She claims that in preparing for the hearing, she learned that nearly the entirety of her work and the work of her colleagues have been “taken without consent, credit, or compensation.” She goes on to say, “These works were stolen and used to train for-profit technologies…” (CNBC Television, 2023b, 33:05). She laments the emergence of generative AI as she sees it as a threat to her longevity as an artist. In her words, “[AI] is a technology that uniquely consumes and exploits the hard work, creativity, and innovation of others. No other tool is like this,” (CNBC Television, 2023b, 32:37).
Ortiz states that she and her colleagues are due rights to consent, credit, and/or compensation. Rae (2021) simplifies the concept of copyright in the following example:
“Let's say you purchased a limited edition vinyl LP by your favorite band as part of Record Store Day. As the owner of a physical item containing music, you only have rights to the container, not the recordings and the compositions within. The music itself belongs to its respective authors and copyright owners…” (Rae, 2021, p. 1).
Copyright law imposes limitations on use as determined by the rights holder, and while a case for Fair Use Doctrine can be made in defense of using content as training data, it is a defense that, as Leo (2021) points out, has met with inconsistent success and depends heavily upon the case’s circumstances and the court hearing it. Courts in the United States are currently being tested on these matters in Getty Images, Inc. v. Stability AI, Inc., a lawsuit in which Getty Images alleges that Stability AI violated copyright by using Getty’s library of images to train its generative model (Smith & Nicol-Schwarz, 2023).
Ortiz’s fears are exacerbated by the way in which generative AI is able to vastly outpace the productivity of human artists. As mentioned before, Klein and Bolitho’s Datatron computer program was able to produce four thousand songs in a single hour (Bridy, 2016). Collins (2018), drawing on accounts of Klein and Bolitho’s algorithm for “Push Button Bertha,” re-coded their project in SuperCollider in order to dramatically hasten melodic generation. With this new model, he was able to produce one billion songs in just over an hour (Collins, 2018).
Anders (1962) characterizes modern humans as “inverted utopians.” He explains, “While ordinary utopians are unable to actually produce what they are able to visualize, we are unable to visualize what we are actually producing,” (Anders, 1962, p. 496). Humanity is, therefore, inventing without knowing the fullest extent and impacts of its creations. This accurately describes how one composes using AI. As mentioned previously, composing music is a mental activity. Erraught (2018), aiming to align Theodor Adorno’s and Kant’s theories of aesthetics, expands on this notion. He asserts, “The ‘silent’ listener, the person who can ‘hear’ music without it being audible, merely through reading the score… has access to that which is truly enigmatic about music,” (Erraught, 2018, p. 68).
Human composers, while they may go through drafts and revisions of their work, generally have a preconceived notion of the music they wish to create. Mozart, for example, “mentions works he has already composed in his head but ‘not yet written down’… And, unlike Beethoven a generation later, he did not labor through multiple drafts of a single idea,” (Bonds, 2006, p. 366). Although Beethoven would constantly revise his works, he could still hear his music even while going deaf. Keaton (2013) explains:
“For any musician, the real world of music exists not so much in the ears as in the mind, in the imagination. Musicians are trained to be able to look at a score and have some idea what it sounds like; Beethoven, one of the most supremely gifted figures of all time, had perfected this ability. So, his loss of hearing did not mean a loss of musical ability…” (Keaton, 2013, p. 133).
Although composers of the Second Viennese School and the American Ultra Modernists experimented with serial and aleatory composition as well as Schoenberg’s twelve-tone matrix, these were still methods based in human effort. While AI composers facilitate human control over music production, those relying on such systems do not know what music their prompted inputs will produce until after the fact. As such, AI use necessitates repeated adjustments to parameters in a process of trial and error until the program renders a satisfactory product.
During a panel discussion at the 2020 AI Music Creativity online conference, Paudie O’Connor, a judge from the 2020 AI Music Generation Challenge, which will be discussed later, mentioned that the listener or performer can usually recognize Irish folk tunes as representations of their indigenous regions and/or populations; however, this is not the case when considering music produced by AI. He is quoted here saying:
“I think that music almost needs to kind of connect with the people or where people come from, and you can feel that when you play a tune. You can sometimes play a tune without even knowing where it came from… You know by the style of the tune or the way that the tune pulls you into a certain style of playing. I found with an awful lot of the submissions I wasn’t getting pulled one way or the other. I found that I just couldn’t connect with the land or the people… You could tell that they were written by a machine. There was just some lack of connectivity, or I couldn’t feel what the composer felt when they wrote the tune… Every now and again, somebody writes a really good new tune, and you feel something. You often can make a fair stab at where that tune came from, or what style of music that particular musician plays. I found it very hard to make that connection with an awful lot of the tunes,” (AI Music Creativity 2020, 2020, 27:57).
Given this assessment of AI composition, it follows that AI music does not represent the talents of the musicians included in the training data, nor does it represent the musicianship of the program’s user. It is as though AI music does not represent human artistry at all. Perhaps this constitutes a form of music only made possible by the emergence of AI. Some researchers call this “machine folk” (Huang et al., 2023, p. 53; Sturm et al., 2019, p. 8). Crowdy (2022, p. 61) claims, “Important aspects of the culture around a sound lie in the code that sustains it. There is culture in code.” Perhaps what he alludes to is machine folk and its future iterations.
Existentialism and Modernity
Marx and Engels (1996) argue that industrialization has created numerous degrees of separation between a craftsperson and their end product. This is because multiple laborers are now involved in the act of making, and the tools and materials used no longer belong to laborers but rather to the business owner. Items of this sort, engendered through automation and mass production, are made not for use but for commerce, changing the very nature of the products themselves (Marx and Engels, 1996). Anders (1956/2016) likewise attests that most people are mystified and perhaps troubled by modern inventions because they cannot recognize human involvement in their construction. He states, “Production processes are today made up of so many single and separate steps that workers who actually build the world of products and machines cannot see these fabrications as the fruits of ‘their own’ labor,” (Anders, 1956/2016, p. 33).
Heidegger (1954/1977) asserts that automation drastically alters the essence of invention and creation. He reconsiders Aristotle’s four causes and indicates that too much importance is placed on the efficient (practitioner). Items are no longer crafted; they are manufactured. Automation supplants the specialist by democratizing skilled labor and delegating these tasks to an assembly line of workers and machines. As a result, this process deemphasizes the craftsperson and elevates the final (purpose) in determining causality. He goes on to say that technology reveals a deeper, hidden meaning in whatever it is set upon due to its significant role in defining the final cause. This is because technology no longer works with nature, but rather, it challenges nature. He offers a dam with a hydroelectric generator as an example. The river, which the dam challenges, was once a picturesque formation punctuating the landscape. The hydroelectric generator, though, turns the river into a source of power, and optimizing that power source for energy production and storage is the true meaning of the river’s essence. Heidegger calls this process enframing, the challenging of nature by optimizing extraction from a resource for the sake of excess, and the excess he calls standing reserve. In his conclusion, he sees a future that is likely not only defined by but dependent upon systems of enframing and standing reserve (Heidegger, 1954/1977).
Following Heidegger’s example, recognition is given to how automation has altered the essence of music through its physical and metaphysical reduction. Music’s material, now data and code, is rendered as strings of alphanumeric characters that constitute its formal. The only efficient capable of interpreting and manipulating this iteration of music is a software engineer, and the final aims to train and instruct a digital system. Music can also take on the material and formal of spectrograms on a digital display. A content provider, a “master” over clickworkers, or perhaps even an AI program stands in as the efficient. The final proves to be something far less artful, serving instead to surveil audiences, train automated systems, enforce copyright, and/or generate revenue. As it mines human creativity to further its purpose, AI is like Heidegger’s dam, extracting excess quantities of music for the sake of profit.
Ellul (1954/1964) makes similar assertions regarding industrialization’s influences on knowledge and creation by drawing attention to the autonomous nature of human innovation. Ellul argues that industrial societies favor capitalistic values, such as efficiency, effectiveness, and pragmatism, over social values, like camaraderie and spirituality, which leads to self-defeating or evermore problematic outcomes. He calls this phenomenon “la technique” (Ellul, 1954/1964). Morelli (2021) describes “la technique” as a never-ending cycle in which knowledge of the way of things (episteme) leads to discovering more efficient and cost-effective methods of production (techne), which results in the excess of goods (poiesis) to be sold to the public at the lowest possible price. The resulting poiesis adds to the existing episteme, which advances techne, restarting the cycle and leading to a new poiesis. Evidence of Ellul’s “la technique” can be seen in the evolution of the music industry when considering changes in social behaviors, legal mechanisms, and technologies.
Arditi (2020) discusses what he calls the album replacement cycle, a phenomenon showing how people repeatedly purchase music that they already own. He describes it as a process by which consumers are convinced that the latest technological advancement produces improved products in anticipation of a superior music-listening experience. Dewar (2016), as previously mentioned, discusses a similar trend in music industry marketing tactics.
The following is a depiction of Arditi’s (2020) album replacement cycle. People who owned an album on vinyl record were persuaded to buy the same album on tape cassette because cassettes are portable and facilitate mobility. Those same albums on cassettes were bought again as CDs because they offered the additional advantages of uninterrupted listening (i.e., no more flipping to Side B), more reliable playback, improved product longevity, and higher quality audio. CD albums were then repurchased as digital files from online music stores so that they could be played on mp3 players, playback devices which hold many times more songs.
Consumers currently listen to those same albums via ad-supported and subscription-based music streaming services that alleviate the need for large digital storage on mobile devices; however, this completely separates the content from its point of access and likewise separates the consumer from the content. Digital media services and platforms cannot guarantee access to the content for which consumers pay. For example, Sony recently announced the dissolution of its licensing agreement with Discovery, meaning that all Discovery digital media content purchased by PlayStation users via the online PlayStation store would no longer be available for viewing, and although Sony did soon after agree to a new licensing deal, “the whole situation is a reminder that, in many cases, buying things digitally essentially means that you’re paying for a long-term rental,” (Peters, 2023, para. 3).
The advent of AI has also led to the development of automated copyright enforcement systems, such as Alphabet’s Content ID and Meta’s Rights Manager, which autonomously identify and respond to infringing content uploaded by users. As Berkowitz (2023) points out, these systems are unable to account for original productions of Public Domain music nor cases invoking Fair Use Doctrine. US copyright laws, according to Rowland (2020), were initially set in place by Congress to encourage creativity and production of culture, but Berkowitz indicates that enacting the Digital Millennium Copyright Act and deploying automated copyright enforcement systems led to a self-contradicting outcome in which copyright laws restrict creative expression.
Morelli’s (2021) rationalization of Ellul’s “la technique” naturally calls attention to how the current use of AI throughout the music ecosystem (the current cycle’s poiesis) may impact episteme and techne in the following cycle. Berkowitz (2023) alleges one possibility in his conclusion. He points to an example in which Spotify hired composers to write soundscapes, sonically generic music meant to be listened to by consumers engaged in routine tasks (e.g., studying, sleeping, exercising, etc.). These composers and their music were promoted and popularized under pseudonymous artist profiles that Goldschmitt (2020), Drott (2021), and Morreale (2021) call “fake artists.” In doing so, Spotify demoted unaffiliated artists, reducing their streaming metrics and royalty payouts, while boosting its own ad revenue and generating more CRM data as more listeners streamed the company’s proprietary soundscapes (Drott, 2021; Goldschmitt, 2020; Morreale, 2021). This knowledge represents episteme.
Berkowitz (2023) also reports on how AI antipiracy systems identify copyrighted music in infringing uploads and respond by monetizing those uploads on behalf of copyright owners. In addition, he notes that AI can easily generate sonically generic music and outpace the productivity of human songwriters. Pointing to Amazon’s Endel, Berkowitz explains that this program continuously autogenerates soundscapes in real time based on a listener’s biometric and geospatial data. These technologies represent techne.
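Although Endel’s algorithms are proprietary, the general shape of such a system can be sketched. The fragment below is a hypothetical illustration, not Endel’s method: it maps a listener’s biometric and geospatial data to parameters that a generator would use to render the next block of audio, and every mapping rule, name, and parameter range is invented for this example.

from dataclasses import dataclass

@dataclass
class ListenerState:
    heart_rate_bpm: float  # biometric input
    local_hour: int        # temporal/geospatial input (0-23)
    moving: bool           # e.g., inferred from an accelerometer

def soundscape_parameters(state: ListenerState) -> dict:
    """Map listener data to generation parameters for the next audio block."""
    tempo = 50 + 0.5 * state.heart_rate_bpm  # loosely track the pulse
    density = 0.8 if state.moving else 0.3   # sparser textures at rest
    night = state.local_hour >= 22 or state.local_hour < 6
    brightness = 0.2 if night else 0.6       # darker timbres at night
    return {"tempo_bpm": tempo, "layer_density": density, "brightness": brightness}

# Each fresh sensor reading re-parameterizes the generator, so the
# soundscape never repeats and never ends.
print(soundscape_parameters(ListenerState(heart_rate_bpm=72.0, local_hour=23, moving=False)))

Because the parameters update continuously, the output is endless and nonrepeating, which is what allows such systems to outpace human songwriters at negligible marginal cost.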
Berkowitz’s theory concerning a technocratic paradigm aimed at controlling the production of digital culture represents poiesis. He suggests:
“Corporations like Meta and Alphabet operate social media platforms, employ automated copyright enforcement systems, and invest in generative AI programs. This combination of technology makes it possible for them to flood their websites with autogenerated music under the guise of fake artists. Furthermore, their ability to autonomously enforce copyright via their Rights Manager and Content ID systems may tempt them to store samples of autogenerated music in their antipiracy reference libraries. Doing so would allow Meta and Alphabet to leverage Rights Manager’s and Content ID’s monetization features to automatically appropriate ad revenue from content creators by claiming uploaded content similar to their own autogenerated samples,” (Berkowitz, 2023, p. 9).
A Summary of AI Music Competitions
Today’s technological paradigm has led to methods by which AI is used to generate music, as heard on commercial albums such as Taryn Southern’s I AM AI (2018), SKYGGE’s Hello World (2018), and Holly Herndon’s Proto (2019). In addition, Popgun’s Stephen Philips claims that Roblox and Minecraft, popular web-based videogames, already feature AI-produced theme music (Dredge, 2017). For a more thorough review of AI applications in music composition, see Briot et al. (2020).
To explore the limits of and encourage innovation in the generative-AI music space, the KTH Royal Institute of Technology and the Dutch public broadcasting organizations VPRO and NPO arranged and facilitated the AI Music Generation Challenge and the AI Song Contest, respectively. Both are international AI composition contests, the former specializing in folk music and the latter in pop music. The following is a summary of the creative efforts of the research teams who participated in the last three years of each competition.
The AI Music Generation Challenge
The KTH Royal Institute of Technology hosts the European Union-funded MUSAiC Project which aims to address concerns relevant to AI ethics in musicking (EU Publications Office, 2022). The project also includes funding for the AI Music Generation Challenge, an annual contest featuring AI-generated folk music. In 2020, the challenge theme was based on double jigs published in The Dance Music of Ireland: O’Neill’s 1001 (1907). The purpose of this contest was to “promote meaningful approaches to evaluating music; see how music AI research can benefit from considering traditional music, and how traditional music might benefit from music AI research; and facilitate discussions about the ethics of music AI research applied to traditional music practices,” (Sturm & Maruri-Aguilar, 2021, p. 2). The rules for the competition were listed as follows:
“Build a music AI that generates music. You can train your AI on anything, but remember that the results will be judged against the 365 double jigs in O’Neill’s ‘1001’. To facilitate judging, the music you submit must be rendered either as ABC notation, staff notation, MIDI, or mp3-compressed audio files. Have your AI generate 10,000 tunes. Write a brief technical document describing how you built your system, presenting some of its features and outcomes, and linking to your code and models for reproducibility,” (KTH Royal Institute of Technology, 2020, p. 1).
All submissions underwent a random selection process in which five different songs per submission were assigned to each judge. Each set of songs was then evaluated according to a first set of criteria, which screened for plagiarism as well as uncharacteristic rhythms, pitch ranges, and modes or accidentals, and then a second set of criteria, which included melody, structure, playability on a traditional Irish instrument, memorability, and distinctness (KTH Royal Institute of Technology, 2020, p. 2).
Excluding the benchmark model, there were six entrants in total for the 2020 challenge; however, only five of the participants revealed details of their approaches. Three of the models were trained on data originating from folk-rnn, a publicly available web application that autogenerates folk tunes. Each model’s description reads as follows:
“LSTM trained on folk-rnn data (B. L. Sturm, Santos, Ben Tal, & Korshunova, 2016), fine-tuned on double jigs in O’Neill’s 1001; folk-rnn (v2) with beam search, and ‘artificial critic’ (B. L. T. Sturm, 2021); Markov modelling in MusicXML, trained on a subset of O’Neill’s 1001; LSTM trained on thesession.org data, fine-tuned on double jigs in O’Neill’s 1001; LSTM trained on encoded MIDI; folk-rnn (v2) (B. L. Sturm et al., 2016) seeded with the start token and 6/8 meter token,” (Sturm & Maruri-Aguilar, 2021, p. 4).
The benchmark model (i.e., folk-rnn seeded with the start token and 6/8 meter token) and the Connacht research team’s model (i.e., folk-rnn with beam search and “artificial critic”) were awarded prizes for their AI-generated double jigs.
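The Markov-modelling entry above admits an especially compact illustration. The following sketch trains a first-order Markov chain on tokenized tunes and generates a new token sequence by walking the transition table; the two training fragments are invented placeholders rather than double jigs from O’Neill’s 1001, and actual entries would tokenize full ABC or MusicXML transcriptions.

import random
from collections import defaultdict

def train_markov(tunes: list[list[str]]) -> dict:
    """Count token-to-token transitions across a corpus of tokenized tunes."""
    transitions = defaultdict(list)
    for tune in tunes:
        for current, following in zip(tune, tune[1:]):
            transitions[current].append(following)
    return transitions

def generate(transitions: dict, start: str, length: int = 32) -> list[str]:
    """Walk the transition table to emit a new token sequence."""
    tokens, current = [start], start
    for _ in range(length - 1):
        options = transitions.get(current)
        if not options:
            break
        current = random.choice(options)
        tokens.append(current)
    return tokens

corpus = [
    "E F G A B c d e".split(),  # placeholder melodic fragments
    "G A B c B A G F".split(),
]
model = train_markov(corpus)
print(" ".join(generate(model, start="G")))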
The 2021 AI Music Generation Challenge featured Scandinavian slangpolskas. The majority of the rules and judging criteria remained the same as in the 2020 Challenge; however, one notable change was that only four tunes would be selected at random by the judges, with the fifth tune chosen by the entrants, likely showcasing their best tune (Sturm, 2023a). For the 2021 challenge, there were five entrants, plus the benchmark. The descriptions of their models are as follows:
“Folk-rnn fine-tuned on Swedish traditional music (Hallström et al., 2019) seeded with the start token and 3/4 meter token; Markov chain then genetic algorithm with fitness function based on music structure; Transformer architecture with templates derived from existing Swedish tunes and rejection sampling; Iterative elaboration of a template guided by principles of music theory; folk-rnn with beam search and ‘artificial critic’; Transformer architecture trained with Irish data, fine-tuned with Swedish data, and using rejection sampling (Casini and Sturm, 2022),” (Sturm, 2022, p. 3).
Three songs were selected to receive first prize and two were selected for second prize. Four of these entries were produced by the Vaxjo research team (i.e., Transformer architecture trained with Irish data, fine-tuned with Swedish data, and using rejection sampling), with the fifth submitted by the Smaland research team (i.e., folk-rnn with beam search and ‘artificial critic’) (Sturm, 2023a). While Vaxjo’s model proved very successful, it is worth noting that Smaland’s model was also used by the Connacht research team, which won the previous year.
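The rejection sampling named in Vaxjo’s description is conceptually simple: candidate tunes are sampled from a trained generator, and any that fail stylistic checks are discarded. The sketch below illustrates the idea under invented assumptions; the random generator stub stands in for the team’s Transformer, and both acceptance criteria are hypothetical rather than the judges’ actual rubric.

import random

PITCHES = list(range(55, 84))  # MIDI G3-B5, an illustrative melodic range

def sample_tune(length: int = 24) -> list[int]:
    """Stand-in for a trained model emitting one candidate tune."""
    return [random.choice(PITCHES) for _ in range(length)]

def acceptable(tune: list[int]) -> bool:
    """Reject tunes with an uncharacteristic range or no repeated material."""
    if max(tune) - min(tune) > 24:  # wider than two octaves
        return False
    half = len(tune) // 2
    shared = len(set(tune[:half]) & set(tune[half:]))
    return shared >= 3              # crude proxy for repeated (AABB) material

def rejection_sample(max_tries: int = 10000) -> list[int]:
    """Keep sampling until a candidate passes every check."""
    for _ in range(max_tries):
        tune = sample_tune()
        if acceptable(tune):
            return tune
    raise RuntimeError("no acceptable tune within the sampling budget")

print(rejection_sample())

The appeal of the design is that the generator itself never needs retraining; stylistic control is imposed entirely at sampling time.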
The 2022 AI Music Generation Challenge was designed to be significantly different from the previous two years’ challenges. Three competitions were presented as part of the challenge, and entrants could take part in any one or combination of the three. The first sub-challenge was to develop an AI that could generate Irish reels in the style of O’Neill’s 1001, much like the 2020 Challenge dealing with double jigs. The second sub-challenge was to design an artificial judge that could be used to assess the submitted reels from the first sub-challenge and that must be able to “detect plagiarism, analyze the appropriateness of rhythm, mode, and accidentals, and grade structure and melody,” (Sturm, 2023c, para. 3). The third sub-challenge was to design an AI system that could generate titles for the reels from the first sub-challenge (Sturm, 2023c).
Given that this paper focuses on AI-generated music, results from the first sub-challenge on AI-produced Irish reels will be discussed going forward. Please see Sturm (2023b) for an overview of the artificial judge and autogenerated titles sub-challenges. The rules for the AI Irish reels category of the 2022 Challenge remained the same as in the previous year. Four research teams, plus the benchmark, participated, and the descriptions for their models read as follows:
“Folk-rnn (v2), seed with start and 4/4 meter tokens, filter by structure and pitch range; GRU-based language model, filter by structure; Variational autoencoder, filter by structure; Statistically informed recombination of material in O’Neill’s, as in EMI; folk-rnn with beam search (n = 2),” (Sturm, 2023b, p. 5).
Three prizes were awarded to research team Clare (i.e., GRU-based language model, filter by structure), team Limerick (i.e., folk-rnn with beam search, n = 2), and team Kerry (i.e., Variational autoencoder, filter by structure) (Sturm, 2023c). The 2023 AI Music Generation Challenge, which is currently in progress, seeks to curate songs for an AI music tradition, a goal that points directly to theories like “machine folk” and “culture in code.” It is worth quoting the entire description here:
“One can see this challenge as a call for work to be considered for a future festival. The judges are ‘curators’, who are looking to create a compelling program of ‘music traditions’ generated entirely by, or with the assistance of, artificial intelligence. This future festival aims to delve deep in theoretical and practical questions of the application of artificial intelligence to culture, raising awareness of the many issues and dilemmas involved, from the economic and political to the technological and (post)humanistic. The curators seek to programme works showcasing a diversity of approaches and outcomes, and are especially interested in multi-layered work crossing material boundaries, all the while using artificial intelligence in some way or another. The curators are not necessarily looking for finished or complete work, but instead work that has a clear connection to the theme of the festival, showing evidence of deep reflection on the associated issues, and that can contribute to engaging and productive discussion,” (Sturm, 2023d, para. 6).
The AI Song Contest
The first AI Song Contest was announced at the 2019 International Society for Music Information Retrieval Conference and took place in 2020. Participants would be required to produce a radio-friendly (i.e., under three minutes) song in audio format using any AI systems trained on the Eurovision dataset provided by the competition organizers (Huang et al., 2020; Micchi et al., 2021). AI-generated songs would be assessed as follows:
“How was the provided dataset used? Has the song an interesting structure? To what extent have the melody, harmony, lyrics, and audio rendering been generated? The more elements are created with AI, the more points you will earn from the AI panel. Human interventions are allowed but this will cost you points from the AI-panel,” (Micchi et al., 2021, p. 265).
Sixty-one researchers, comprising thirteen teams from eight countries, participated in this inaugural competition. Teams generally took a modular approach, building songs section by section and layer by layer while utilizing multiple AI models. Some teams attempted an end-to-end method, creating an entire song at once with programs like SampleRNN, but they quickly realized how difficult it was to steer song generation (Huang et al., 2020).
A multitude of different AI models were used depending on what segment of the song was being developed. For example, teams used GPT-2, LSTM, and Transformer models for writing lyrics. CharRNN, SampleRNN, LSTM combined with CNN, LSTM combined with WaveNet, GAN, and Markov models were used to produce melodies. LSTM, RNN autoencoder, GAN, and Markov models assisted in generating harmonies. LSTM combined with CNN, LSTM combined with WaveNet, and GAN models helped write bass lines. For producing drums and percussion parts, teams utilized DrumRNN, Neural Drum Machine, SampleRNN, and Markov models. Finally, for vocal and instrumental synthesis, teams used WaveNet, SampleRNN, Vocaloid, Sinsy, Mellotron, Emvoice, WaveGAN, and DDSP models (Bolcato, 2020; Huang et al., 2020; Micchi et al., 2021).
How research teams combined the use of these models also varied. Those with advanced musical knowledge and skill, like Micchi et al. (2021), who earned fourth place, took musical content generated by the AI models and manually assembled it, much like a DJ combining samples to create remixes and mashups. Some took a pipeline approach in which the material produced by one model was fed into another model to produce an appropriately accompanying layer. For example, one team used LSTM-generated rhythms as input data for a CNN to create a melody. Others attempted joint modeling by utilizing more than one AI system that could autogenerate multiple parts at a time. Bolcato (2020), who earned tenth place, successfully designed a conditional RNN making use of several LSTM layers to concurrently produce lyrics and melodies; however, most teams found it difficult to align the outputs of such systems (Bolcato, 2020; Huang et al., 2020; Micchi et al., 2021).
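A schematic sketch of the pipeline approach appears below. Each stage consumes the previous stage’s output as conditioning input, mirroring the team that fed LSTM-generated rhythms into a CNN melody model; the stub functions here are invented stand-ins for trained models, not any team’s code.

import random

def generate_rhythm(bars: int = 4) -> list[float]:
    """Stage 1 stub (an LSTM in the cited example): note durations in beats."""
    durations = []
    while sum(durations) < bars * 4:
        durations.append(random.choice([0.5, 1.0, 2.0]))
    return durations

def generate_melody(rhythm: list[float]) -> list[tuple[int, float]]:
    """Stage 2 stub (a CNN in the cited example): one pitch per duration."""
    scale = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, illustrative
    return [(random.choice(scale), duration) for duration in rhythm]

def generate_harmony(melody: list[tuple[int, float]]) -> list[int]:
    """Stage 3 stub: a supporting pitch beneath each longer melody note."""
    return [pitch - 4 for pitch, duration in melody if duration >= 1.0]

rhythm = generate_rhythm()
melody = generate_melody(rhythm)
harmony = generate_harmony(melody)
print(melody)
print(harmony)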
Further still, there were different approaches to how musical content was selected. Some teams opted for mass generation of AI content, sometimes amounting to hundreds or thousands of samples, and from the amassed collection, they would pick what sounded most appealing. Others were more selective in their curation process, examining each generated sample case by case before producing more content; the idea here was to use an appealing sample as the prompt for other parts of the song. For a minority of teams, AI curation served as an alternative to human curation. In this case, a separate AI system was trained to recognize agreeableness between musical elements in samples, and those with the highest degree of aesthetic appeal were then saved for a human researcher to review (Huang et al., 2020; Micchi et al., 2021).
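The mass-generation-and-curation workflow can likewise be sketched in a few lines. Here a hand-written scoring function stands in for the separate AI system trained to recognize agreeableness; both the toy generator and the leap-penalizing scorer are hypothetical.

import random

def generate_sample() -> list[int]:
    """Stand-in generator: sixteen random MIDI pitches."""
    return [random.randint(48, 84) for _ in range(16)]

def agreeableness(sample: list[int]) -> float:
    """Stand-in for a learned scorer: penalize large melodic leaps."""
    leaps = [abs(a - b) for a, b in zip(sample, sample[1:])]
    return -float(sum(leap for leap in leaps if leap > 5))

candidates = [generate_sample() for _ in range(1000)]  # mass generation
shortlist = sorted(candidates, key=agreeableness, reverse=True)[:5]
print(shortlist[0])  # inspect the best candidate

Only the handful of top-scoring samples reach a human researcher, which is what makes generating hundreds or thousands of candidates tractable.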
The 2021 AI Song Contest saw thirty-eight research teams from around the globe participate. The rules from the previous year remained mostly the same, with the exception that the time limit for songs was increased to four minutes (Cousins, 2021). The methods and approaches depicted in the 2020 Contest also characterize the work of those who took part in the 2021 Contest.
Deguernel et al. (2022), who earned third place, describe their approach to producing their AI song as a combination of machine learning and rule-based generative algorithms. To generate melodic and harmonic content, they used a factor oracle, which they describe as both a simple machine learning AI and a variable order Markov model. To determine the form and structure of the song, they utilized a constrained, first-order Markov model. Finally, mastering the song required human intervention before the team could apply algorithmic auto-mastering software called LANDR (Deguernel et al., 2022). M.O.G.I.I.7.E.D., who earned first place, utilized a variety of AI models: GPT-2 generated the lyrics for their song, Melody-RNN generated the song’s melody, and a custom AI trained on recordings of the team’s singer synthesized the vocals (Cousins, 2021).
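Deguernel et al.’s (2022) constrained, first-order Markov model for song structure can be approximated with a short sketch. Below, sections are sampled from a transition table and whole forms are resampled until hard constraints are satisfied; the table and the constraints are invented for illustration and do not reproduce the team’s model.

import random

TRANSITIONS = {
    "intro":  ["verse"],
    "verse":  ["chorus", "verse"],
    "chorus": ["verse", "bridge", "outro"],
    "bridge": ["chorus"],
    "outro":  [],  # terminal section
}

def generate_form(max_sections: int = 8) -> list[str]:
    """Sample section by section, enforcing hard constraints on the result."""
    while True:
        form, current = ["intro"], "intro"
        while TRANSITIONS[current] and len(form) < max_sections:
            current = random.choice(TRANSITIONS[current])
            form.append(current)
        # Constraints: end on an outro and contain at least two choruses.
        if form[-1] == "outro" and form.count("chorus") >= 2:
            return form

print(generate_form())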
Unfortunately, research on the 2022 AI Song Contest appears to be scarce, and the 2023 AI Song Contest is currently in progress. However, descriptions of all participants’ creative processes in developing and training their AI models and the methods by which they synergize autogenerated content can be found on VPRO’s AI Song Contest website (https://www.aisongcontest.com/). In addition, the website features recordings of all song entries.
Conclusion
Automated musicking disrupts the music ecosystem and changes the essence of music entirely. AI technologies have altered music’s material, formal, efficient, and final causes, and automation distorts episteme, techne, and poiesis in musicking. Generative AI, a technology predicated upon total novelty, presents the strongest case for music’s mutations. Radnoti (1999) criticizes society’s obsession with originality, saying that art is never produced ex nihilo, or from nothing. He claims that all artworks represent the artistry of the creator and/or those who influenced the styles and techniques used throughout the creative process (Radnoti, 1999). Given that AI music represents neither the artistry present in the training data nor the efforts of the user, perhaps AI cognition is responsible for the creative work. If this is the case, research should turn to expanding on Crowdy’s (2022) “culture in code,” supporting a new field of study encompassing music and human-computer interaction, which he calls “code musicology,” thereby enabling a more thorough understanding of what some have called “machine folk.”
Technology may very well be the combined wills to power and profit (Hallett, 2015). As artistry is further democratized and specialized skills are automated, trajectories that value inexpensive mass production, to which AI is well suited, raise the possibility that humanity will lose the ability to express itself freely through music. Feenberg (2018) recognizes that technology has historically been used to control populations but disagrees that humanity is fated to head toward such a future, enmeshed in Heidegger’s systems of enframing and standing reserve. Instead, he advocates for democratic or laymen intervention by which social activists pressure corporations and lawmakers to find logistical and legislative solutions to existential problems.
Feenberg (2018) points to advancements made in the medical field throughout the late 20th century, especially during the AIDS epidemic in the United States. At the time, there was tremendous distrust of the scientific medical community among patients (Feenberg, 2018). Feenberg explains that participants in clinical drug trials would often take their medications to labs for analysis and subsequently drop out of their programs upon learning that they had been given placebos. He goes on to say that patients objected to the restrictions placed on experimental medications and were dissatisfied with the design of clinical trials.
In order to reorient the medical community around more humane practices, scientists and physicians began consulting patient advocacy groups with positive outcomes (Feenberg, 2018). Feenberg credits democratic intervention for the formal recognition of patient needs as a right which led to renewed trust in physicians, ensured viable and sustainable clinical trials, and resulted in better medicine. Similar efforts can be and are being made toward ensuring responsible development, deployment, and use of AI systems in the music and entertainment industries.
During the “Oversight of A.I.: Rules for Artificial Intelligence” hearing of the US Senate Judiciary Committee, senators discussed how generative AI functions and creates images, music, and news reports, as well as AI regulation and transparency. For example, Senator Marsha Blackburn showed tremendous concern regarding AI’s ability to generate new music and imitate existing musicians, mentioning the sense of apprehension towards generative AI held by executives and artists in the music industry. Senator Amy Klobuchar expressed her uneasiness toward AI’s ability to generate seemingly legitimate news reports. She worried that citizens may take a report generated by ChatGPT at face value without corroborating the story with a legitimate news source, thereby spreading misinformation about a variety of topics, including elections and public healthcare. As one of the expert witnesses, Sam Altman, then CEO of OpenAI, favored regulating transparency of generative AI use in content creation, saying, “I think it’d be a great policy to say [AI] generated images need to be made clear in all contexts that they were generated. And, you know, then we still have the image out there, but… we’re at least requiring people to say, ‘this was a generated image,’” (CBS News, 2023, 1:44:03).
Recently, the Writers Guild of America (WGA) was on strike for nearly five months, from May 2 to September 27, 2023. Their grievances centered on, among other issues, generative AI impacting and even displacing writers in the industry. Many of those striking were convinced that their efforts led to a more sustainable entertainment industry for all creatives in the face of emergent technologies (Alexander, 2023). In addition, the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) was on strike for about four months, from July 14 to November 9, 2023, over several issues, including a lack of safeguards against the use of generative AI, making it the first time since 1960 that both organizations simultaneously staged walkouts (Lawler, 2023). Other labor unions representing the interests of workers in creative media industries, such as the Directors Guild of America and the International Alliance of Theatrical Stage Employees, have publicly voiced their support for the WGA and SAG-AFTRA (KTLA 5, 2023).
Similar concerns over ethical development of AI systems used in the music industry have also been voiced by researchers in the MIR field. Huang et al. (2023) note that MIR research is severely lacking in approaches toward ethnic, racial, and cultural diversity. They present case studies in their argument for a philosophical shift in MIR methodologies. They urge MIR researchers to:
“Reflect on and fearlessly rethink the four ologies [i.e., ontology, epistemology, methodology, and axiology] concerning our discipline, strive for an equitable exchange with other disciplines and stakeholders, and let this process shape novel and epistemologically diverse research questions that go beyond building a technical problem around a dataset and criticizing (without balanced dialogue) the ‘data-poor fields’ for not showing interest in ‘upgrading’ and ‘catching up,’” (Huang et al., 2023, p. 54).
Holzapfel et al. (2018) discuss how MIR research is primarily driven by technical problem-solving, performance efficiency, and cost-benefit analyses rather than ethical outcomes. They consider issues such as bias towards Western societies and art music, the music long tail, and music recognition and copyright. They go on to say:
“Most of our propositions demand a long-term engagement with ethics and require more research, dialogue, and discussion. We believe that such a process will increase the reputation of MIR as a mature scientific field, will lead to a more responsible treatment of the people who have a stake in MIR, and will be more respectful of the total social fact of music,” (Holzapfel et al., 2018, p. 53).
Morreale (2021) reviews the many ways in which AI has harmed humanity through its deployment in the music industry. He includes in his analysis dataveillance conducted by music streaming platforms, the perpetuation of clickfarms, and modern-day digital colonialism through the training of AI music systems. He calls upon fellow researchers and developers in the AI music space to think critically about ethical development and deployment of emergent technologies. He concludes:
“I argue that these discussions cannot be delegated to others as we are accountable for our work, the work we fund, and the recipients of the models we develop. By avoiding a deep engagement with issues like environmental costs, manipulation of listeners, and human redundancy and exploitation, we are implicitly stating that these are not priorities for research in [AI music] with the consequent acceptance of the status quo,” (Morreale, 2021, p. 110).
Libraries also possess the potential to assist the public in acclimating to a music ecosystem steeped in emergent technologies. Libraries have long served at the forefront of exposing the public to new technologies and, at this point, have done so for a century. Libraries have also strongly supported digital literacy and AI literacy by hosting classes and exhibits, forming partnerships, and investing in learning resources; furthermore, library leaders strongly encourage librarians and researchers to make AI a central point in the professional agenda (Garcia-Febo, 2019).
Libraries have also historically advocated for civil liberties. For example, the American Library Association (ALA) strongly opposed the USA PATRIOT Act and especially Section 215 (i.e., the “library provision”) as this legislation allowed for the surveillance of patron activities in libraries, and the ALA showed strong support for net neutrality protections, citing concerns regarding unequal access to information and obstructions to intellectual freedom (Clark, 2019; Fiels, 2003). Also, according to Burkart and McCourt (2006), libraries strongly advocated for public rights concerning copyright limitations. They note the following:
“Consumer rights are not considered in DRM technologies, and public policy here… does little or nothing to prevent potential abuses by transnational corporations… Librarians were among the first professional groups to fight for preserving traditions such as First Sale Doctrine and fair use, which have been ignored by the [Digital Millennium Copyright Act] and eviscerated from public life today,” (Burkart & McCourt, 2006, p. 118).
Given these activities, music researchers and musicians in general should feel empowered to rely on libraries to learn more about AI music technologies. Further to this point, as public demand for AI music technologies rises, libraries may feel compelled to meet this need by partnering with researchers and bringing emergent technologies into library spaces. For instance, MIT’s Lewis Music Library partnered with Andreas Refsgaard, an AI music researcher and educator, to develop and host exhibits featuring interactive music technologies (Fay, 2023). Considering the significant recent attention garnered by generative AI, library organizations like the ALA may find themselves advocating for civil liberties once again.
Researchers and educators are encouraged to continue their engagement with AI systems in an endeavor to increase AI literacy and broaden AI ethics in all fields, but especially in the visual and performing arts. Continued research in philosophies of technology as applied to AI aids advocacy groups in articulating their positions, informs technology developers of ethical oversights in the design of their systems and products, and assists legislators in understanding the technical and ethical underpinnings of complicated subjects such as AI cognition, human-AI co-creation, the uncanny valley, and AI agency and personhood. Improving AI literacy provides safeguards against irresponsible use of AI and manipulative technocratic systems, and educating the public ensures that AI tools benefit society without hindering human activities. AI has the potential to exploit the masses, leveraging technology to overpower the public; however, through continued research, advocacy, and public education, AI can enhance everyday life and allow humanity to flourish.