In 1967, a report commissioned by the US National Research Council to study communication in the behavioral sciences described an ideal information system as the “computer analogue of the available, intelligent, and informed colleague”:
[He] would read widely, have total recall, evaluate what he read […] He could respond to direct questions requesting information about facts and generalizations, data or documents […] He could take the initiative and stimulate the researcher by suggesting new ideas, facts, or literature of interest […] He might even help with the preparation of written reports of research or theory.1
I recall this fantasy not to show the prescience—or gender stereotypes—of earlier generations of information researchers but rather to suggest that dreams of automating the literature search have long been entangled with mechanizing scientific authorship itself. The recent use of large language models such as ChatGPT in scholarly research has simply made even more evident the close relation between the work of finding and summarizing past knowledge claims, on the one hand, and that of generating new ones, on the other.
Large language models may very well present a fundamental challenge to currently dominant cultural norms of authorship and attribution in the sciences. But if we understand them as a kind of literature search technology, we can situate them within the long history of strategies and genres for condensing knowledge claims in order to make them usable for later investigators. From this perspective, they appear less as a revolutionary technology and more as a return to even older technologies and the cultural norms associated with them.
As ChatGPT came to prominence in late 2022, editorial boards of scholarly journals scrambled to clarify their authorship policies. Most of them addressed two questions. First, could a generative AI be an author on scientific papers? There was general agreement that authorship is a label that involves both credit and accountability, and insofar as AI cannot meaningfully be held responsible for its output, it cannot be an author. Second, how much to allow, and how to acknowledge, the use of AI in scholarly work? Editorial boards were loath to ban its use, but most agreed that any use of AI must be disclosed in detail by the human authors.
The swift and consistent tenor of these responses is not surprising: longstanding problems connected with multiple authorship, invisible academic labor, and ghostwriting had produced a robust discourse about the subtleties and limitations of the role of authorship in scholarly life. Notably, the now familiar idea that authorship ought to be replaced by contributorship statements—which, like film credits, allow for the decoupling of attribution and responsibility—has received renewed attention in the wake of ChatGPT.2
But these initial responses to the challenge of generative AI are just the first moves in a much longer game, and thorny questions remain. Key among them is this: what is the responsibility of researchers to determine whether some argument or idea that they have derived from the output of a large language model ought to be attributed to previous—presumably human—researchers? In a sense, we have here the problem of IP laundering that has been raised in a number of other domains—such as literature and music—and that is often presented as a problem connected with legal regimes such as copyright. So far, there is reason to doubt that copyright will be much of a check on the use of large language models in these domains.3
But issues of attribution within scholarly publishing have never been principally about legal frameworks such as copyright anyway. These are cultural norms of scholarly life that vary across discipline, time, and even geography. Moreover, this aspect of attribution is as much about names in footnotes as it is about those in bylines. Beginning around the mid-nineteenth century, with the emergence of something like the modern scholarly journal format, scientific writing has in the main privileged short papers dedicated to presenting novel claims, along with at least some of the evidence for them. Other sorts of content, including summaries of the wider field to which the claim contributes, the history of work that led to the present paper, and acknowledgment of other authors’ contributions, have generally been confined to footnotes. In this system, individual papers formed their own archive, linked together through bibliographical citations.
Concision was the normative ideal that allowed this system to hang together. Footnotes were not only a key means by which concision was achieved; the bibliographical references of which they were often composed were also the basis for many of the most celebrated reference genres of modern science. To give one prominent and ambitious example, the Royal Society of London’s Catalogue of Scientific Papers was intended to gather precise references to all scientific papers published during the century, ordered by author name. Authorship mattered in this regime as an ordering principle, a means of generating epistemic authority, and a vector for navigating the literature.
But the nineteenth-century regime of concision was increasingly shadowed by a fear of too many journals and too many papers, with frequent claims that it was becoming more difficult to learn what was already known on a given matter than to discover it anew.4 Those fears (and most of the proposed solutions) persisted into the twentieth century, although they came to be reinterpreted by the 1940s as a crisis of scientific “information,” borrowing the term from business and state intelligence.
Around mid-century, a different regime of condensation came to the fore: compression. Inspired by the mathematical definition of information, with its focus on efficient signal transfer, those interested in the problem of search began to imagine means of compressing and filtering the scientific record for easier “information retrieval.”5 Among the most consequential proposed solutions to the so-called scientific information explosion was the algorithmic filtering of the scientific literature by impact or quality. Eugene Garfield, chief promoter of the Science Citation Index, was inspired by information theory to argue that citation counts could be used as a quality filter on scientific information. In this way, the “scientific information crisis” could be shown to be largely an illusion. With the right algorithms, most published papers could safely be forgotten and excluded from the scientific literature.6
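To make the logic of such filtering concrete, here is a minimal sketch in Python, my own toy construction rather than Garfield’s actual method or data: count how often each paper is cited, rank papers by that count, and keep only the most-cited fraction.

```python
# A toy sketch of citation-count filtering in the spirit of Garfield's
# argument; the papers and the threshold are hypothetical illustrations,
# not his actual data or algorithm.
from collections import Counter

# Each pair (citing_paper, cited_paper) records one bibliographic reference.
citations = [
    ("A", "C"), ("B", "C"), ("D", "C"),  # paper C is cited three times
    ("A", "B"), ("D", "B"),              # paper B is cited twice
    ("C", "E"),                          # paper E is cited once
]

counts = Counter(cited for _, cited in citations)

def top_fraction(counts, fraction=0.5):
    """Keep only the most-cited fraction of papers; the rest are 'forgotten'."""
    ranked = [paper for paper, _ in counts.most_common()]
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

print(top_fraction(counts))  # ['C'], the filtered 'core' of the literature
```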
The tools used by Garfield and others to measure the impact of papers, people, and even nations by imagining the scientific literature as a network of directed links were precursors to the search algorithms that made the web navigable through lists of pages ranked by relevance and importance. But for all that was changed by reimagining the scientific literature as a database of information, the nineteenth-century focus on authorship and attribution remained largely intact. Compressing the literature in Garfield’s sense not only preserved the genre of the scientific paper but also privileged authorship as the locus of authority, credit, and responsibility in science, not to mention the problems accompanying that concept. So central were these issues to his system that Garfield worked closely with sociologists of science such as Robert K. Merton and Harriet Zuckerman to better understand how names became attached to knowledge claims, and eventually unattached in a process they called “obliteration by incorporation.”7
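The family resemblance can be illustrated with a short sketch of the power-iteration idea behind PageRank-style ranking, applied to a toy citation graph. This is my own illustration under assumed data, not Garfield’s tools or any search engine’s code; the point is simply that a paper’s score depends recursively on the scores of the papers that cite it.

```python
# A minimal power-iteration sketch of PageRank-style ranking on a toy
# citation graph; the graph and the damping factor are illustrative
# assumptions, not any historical implementation.

links = {  # paper -> papers it cites (directed edges)
    "A": ["C", "B"],
    "B": ["C"],
    "C": ["E"],
    "D": ["C", "B"],
    "E": [],  # E cites nothing (a "dangling" node)
}

papers = list(links)
rank = {p: 1.0 / len(papers) for p in papers}
damping = 0.85  # probability of following a citation rather than jumping

for _ in range(50):  # iterate until the scores settle
    new_rank = {p: (1 - damping) / len(papers) for p in papers}
    for p, cited in links.items():
        targets = cited if cited else papers  # dangling score spreads evenly
        for q in targets:
            new_rank[q] += damping * rank[p] / len(targets)
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))  # E and C score highest
```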
Large language models also have roots in the long history of information compression technologies. But instead of filtering and ranking the web to deliver the most relevant and impactful pages to readers, they treat the web—including whatever scholarly and scientific genres they might be able to access, whether legally or not—as a reservoir of text, relatively independent of the genres that contain it. This massive corpus is used to produce the so-called model, far more compact than the corpus itself but for that reason more useful and powerful. By identifying statistical regularities in the corpus, the model can be used to generate not only approximations of the text-based knowledge found in the original corpus (a kind of lossy compression) but also—by filling in gaps based on those statistical regularities—seemingly novel (if not always sensible) claims. Pursuing this analogy, the writer Ted Chiang has called ChatGPT a “blurry jpeg of the web.”8
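Chiang’s analogy can be made tangible with a toy bigram model, a drastic simplification of a real language model and entirely my own illustration: the corpus is reduced to word-transition statistics, which can then be sampled to produce text that resembles, without exactly reproducing, the original.

```python
# A toy bigram model as a stand-in for a language model: the corpus is
# "compressed" into word-transition statistics, which are then sampled
# to generate new text. The corpus and seed word are illustrative.
import random
from collections import defaultdict

corpus = ("the model compresses the corpus and the model generates "
          "new text from the corpus").split()

# The "model": for each word, the words observed to follow it.
transitions = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    transitions[current].append(following)

def generate(seed="the", length=8):
    """Sample a chain of statistically likely next words."""
    words = [seed]
    for _ in range(length - 1):
        followers = transitions.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)

print(generate())  # resembles the corpus without reproducing it exactly
```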
Among the various ways in which this technology presents challenges to scientific life, our old nineteenth-century notions of how to properly attribute ideas and claims are not easily accommodated by its use in research and writing. Genre and attribution have become so closely bound in modern science that any system that runs roughshod over genre is certain to mess with attribution as well. Text generated by ChatGPT might be indebted both to a vast array of previous authors and to no one in particular at the same moment, with no easy way to discern the details. This is obliteration by incorporation at a whole other level.
It is tempting to see this incongruity as a fundamental weakness of large language models when applied to doing—and writing—science. Perhaps. But given what we know about the aporias of scientific authorship, it is worth pausing to ask whether the nineteenth-century legacy of authorship itself might also bear scrutiny. That large language models do not reproduce exact phrases or documents, and that they generate content without consistent attribution, can be seen either as a weakness or as a feature, depending on one’s perspective.
Indeed, if we cast our historical gaze even further back, these aspects of large language models do have precedent. Before concision came to be prized during the nineteenth century, the more common forms of condensation were the abridgement and the excerpt. Take the sixteenth-century European reference genre of the florilegium, or “gathering of flowers”: collections of the best or most important excerpts from previous works. Compilers such as the Swiss physician Theodor Zwinger spent much of their lives assembling such collections (in Zwinger’s case, the Theatrum Humanae Vitae of 1565) to preserve what they took to be worth preserving.
Authorial attribution, if not entirely absent, was much less essential to such collecting activity. Originality itself, in an intellectual culture that prized connections to ancient knowledge, did not play the regulative role that it later came to play. As Ann Blair has shown, these collections—and their more cavalier approach to acknowledgment—were often justified by their creators as making past learning easier and more efficient to use, as well as accessible to more readers.9
Such collections remained important into the early nineteenth century, and they overlapped in aim and format with early encyclopedias. But their tendency to ignore distinctions of genre, and to muddy credit for original claims, eventually diminished their importance once scientific authorship came to be more highly prized. Even the modern review essay, which emerged with gusto during the nineteenth century and bore some similarity to such earlier collections, was prized as much for its citations of authors as for its summaries of research.
By noting this likeness between large language models and past genres of knowledge, I don’t mean to imply that the former represent a welcome return to a past golden age before intellectual property and authorship gummed up the free flow of knowledge. But I do want to suggest that, even in the sciences, cultures of originality and invention have always proven quite malleable. If there is much to be genuinely worried about—not least that these new tools are largely in the control of profit-oriented corporations—it is probably not a bad thing for long-standing assumptions to be put under a new kind of stress. In this way, large language models may prove to be a tipping point in scholarly authorship and publishing. But I think we will find that many of the problems they force us to confront are not new, if only because the concept of scientific authorship has long been in such a shambles. They point us back to past technologies and scientific genres as much as they do to the future. In the sciences, authorship has always been blurry.
Notes
1. Communication Systems and Resources in the Behavioral Sciences: A Report (Washington, DC: National Academy of Sciences, 1967), 46–47.
2. Robert Pennock, “AI and Responsible Authorship,” American Scientist 112, no. 3 (May–June 2024): 148. For an overview of some problems for scientific authorship well before ChatGPT, see Mario Biagioli and Peter Galison, eds., Scientific Authorship: Credit and Intellectual Property in Science (New York: Routledge, 2003).
3. For example, Micaela Mantegna, “ARTificial: Why Copyright Is Not the Right Policy Tool to Deal with Generative AI,” Yale Law Journal Forum 133 (22 April 2024): 1126–74, www.yalelawjournal.org/forum/artificial-why-copyright-is-not-the-right-policy-tool-to-deal-with-generative-ai.
4. For example, Lord Rayleigh, “Presidential Address,” Report of the Fifty-Fourth Meeting of the British Association for…1884 (1885): 3–23, on 20. Later on, the chemist J. D. Bernal regularly made similar observations.
5. On the history of compression in this sense, see Jonathan Sterne, “Compression: A Loose History,” in Signal Traffic: Critical Studies of Media Infrastructures, ed. Lisa Parks and Nicole Starosielski (Urbana: University of Illinois Press, 2015), 31–52.
6. Eugene Garfield, in a promotional film for the SCI titled “Putting Scientific Information to Work” (Philadelphia: ISI, 1967). See also Eugene Garfield, “Citation Analysis as a Tool in Journal Evaluation,” Science, 3 November 1972, 471–79.
7. For an overview of Merton’s notion of obliteration by incorporation, see Eugene Garfield, “The ‘Obliteration Phenomenon’ in Science—and the Advantage of Being Obliterated!,” Current Contents, no. 51/52 (22 December 1975): 5–7.
8. Ted Chiang, “ChatGPT Is a Blurry JPEG of the Web,” NewYorker.com, 9 February 2023, www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web, accessed 1 May 2024. Chiang’s characterization of ChatGPT is controversial, but critiques usually take the form “it isn’t only that.” This uncertainty between compressing already existing content and producing new content is precisely where my interest lies.
9. Ann M. Blair, Too Much to Know: Managing Scholarly Information before the Modern Age (New Haven, CT: Yale University Press, 2010), 173–264.