This is the fifth in a series of blogposts by the members of the Expanding Autonomy project, funded by the Arts and Humanities Research Council.
by Neil Levy
AI is rapidly being adopted across all segments of academia (as it is across much of society). The landscape is rapidly changing, and we haven’t yet settled on the norms that should govern how it’s used. Given how extensive usage already is, and how deeply integrated into every aspect of paper production, one important question concerns whether an AI can play the authorship role. Should AIs be credited, in the same way as humans might be?
While the criteria for authorship are themselves not entirely settled (and differ from discipline to discipline), it’s clear that AIs like ChatGPT are already playing roles that, were they played by humans, would qualify those humans for authorship. In the sciences in particular, knowledge production is distributed across a large number of people, who play different roles. One person might come up with the initial hypothesis (though others might earn co-authorship by refining it), others might be involved in data analysis or data gathering, and yet others might chip in on the interpretation of the data. AIs are already playing some of these roles.
Just for a single example, AIs are currently asked to generate objections or rebuttals to arguments that the (human) authors intend to make, and even to generate rebuttals to these rebuttals. Sometimes they do this task well. If a human were to do these tasks, they would earn co-authorship in some contexts. So why shouldn’t large language models like ChatGPT or Claude count as authors, too?
COPE – the Committee on Publication Ethics – is perhaps the most influential body attempting to promulgate appropriate ethical standards in academic publication. COPE’s position is that AI may be used in academic research leading to publication, so long as the (human) authors are transparent in their manuscript about how the AI was used. But authorship is closed to AIs. “AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work.”
What is it to take responsibility for submitted work? Presumably, COPE has in mind some sort of answerability. To take responsibility for the work is to be able to answer for it, to be in a position to assert its conclusions and to stand behind them as generated by reliable methods. Those who take responsibility for submitted work are those who deserve credit for its findings and its methods, and perhaps blame if it is shoddy. They are also the people who are legally liable if it contains plagiarism, libel, or in some other way falls short of legal standards.
It’s true that AIs can’t take responsibility for submitted work. But in contemporary science, it’s normal for co-authors not to be able to take responsibility for the paper. Typically, there’s a lead author or a small group of leads, and they are able responsibly to assert the conclusions, and they are liable for its failings. But many other authors are not in this position. They may not even have much understanding of the hypothesis being explored: their expertise might be in data analysis.
It gets worse. Even those who are best positioned to take responsibility for the paper may have little idea about some of the analyses. They relied on other authors to conduct that part of the research, and may lack the expertise to assess it for themselves. Sometimes, there’s no one who can really take responsibility for the paper. Quill Kukla has pointed out that sometimes ghostwriters seem no worse positioned to take responsibility for a paper than the real authors might have been.
But if human authors are often not able to take responsibility for a submitted paper, then why should we impose that requirement on AIs? At this stage of their development, it seems appropriate to demand that papers have some human authors, but given that AIs are contributing in the ways that would earn them authorship, we should acknowledge that. At any rate, the capacity to take responsibility is not required for authorship for humans, so it seems arbitrary to require it of machines.
Responsibility appears as only one aspect of the use of presented material within the issue of the use of AI material.
During the original development of AI, extensive use was apparently made of big data to finesse the algorithms. Given that computing, like many areas of development/progress within the human sphere, to a very great extent relies upon feedback loops, it would be strange indeed if the original drive for the use of live big data to create a form of feedback loop were now suddenly dropped as a necessary requirement as things move forward. Do the credits regarding involvement in authorship create a feedback loop, create responsibility, or feed both? Those two (responsibility/feedback loop) use the same data in different ways.
Removing the potential for those feedback loops, or narrowing the possible breadth of their potential would greatly weaken that mechanism for the human audience, leaving more opportunity for unforeseen (by whoever the developers are) consequences. And it would seem the abilities of other authors involved, or any audience, to make properly accurate or informed comments would be significantly reduced.
In the area used in the example (media), how would any appropriately informed feedback loop be maintained by the audience involved, if any indication of involvement by AI in authorship is missing? Lacking indications of authorship, any audience providing feedback to a media outlet could be met with a widely used, ready-made defence on the responsibility element: “we relied upon the expertise of the AI”, “it was the computer”, “the complexity of AI makes it more knowledgeable than us”. Pen names have been used widely by the media and yet no consideration of that method for AIs is mentioned.
Within the media this broad issue (not crediting) appears as completely different to maintaining the integrity of anonymity for sources in vulnerable scenarios, appearing, more as considering something which actually requires further action, to be completed, because it has been entered in (or accessed from) a computer, removing further responsibility from the originator/user, whilst at the same time creating demands for additional computer resources as a means of protecting any avoided feedback loop/responsibility. Looking to the origin (and hence true reasons behind) avoidance issues and there would be increasing needs to track any reasoned decision point or diversion of resources achieving a new direction, before making the more valuable inquiry regarding any truly considered reason why.
A very pertinent point in crediting authorship that is mentioned is the context and depth of use of AI in any presented work. From the description given of current/past practice, that appears to have been often fudged, but it will no doubt require accurately addressing within the credits of claimed work in some way, as moving forwards that (something which on the surface appears quite odd) becomes more pertinent for free societies, not merely to leverage responsibility, but to provide for accurate feedback assuring a broader, accurate, and responsive progress.
Neil. Sorry for the late posting but I have been very busy. Not sure this makes much sense as it is somewhat over-compressed.
The “definitions” of what authors are and the responsibilities they and their publishers carry are not clearly defined in either statute or common law in this country. (I will use “author” to include composers, artists, scientists, journalists, photographers, etc.) There is a good deal of international agreement between the UK, US and other nations that adhere to the economic incentive model of intellectual property rights. However, the moment we cross the Channel we immediately encounter the Romantic concept of the “moral rights of the author” and the further East we go the complexity of these issues grows. Philosophers have played a major role in this process but have for the most part not been so interested in the IPRs of inventive engineers and their machines.
When I approached the question of how the IPRs of machine intelligence (aka AI) developers, owners and users should be defined and regulated, and how MI could alter our concepts and understanding of authorship, publishing, fair use, commons, etc., I soon began to realise that the then stochastic “connectionist” machines that used mass data processing would – when they were commercially realised – make plagiarism an industrial process. Now that generative AI machines like LLMs are operational and being trained on vast amounts of data that is being treated by their developers and owners as a commons or ‘fair use’, it is clear that the legislation that defines authorship, its responsibilities and the IPRs of those that develop and own LLM machines needs to be radically changed.
The notion that a signal processor like an LLM or any foreseeable so-called AI machine could be treated as an author, as Turing realised, ‘is too meaningless to deserve discussion.’ (We could for the sake of brevity work around Foucault’s use of ‘compiler’ in his ‘What is an Author?’ and not its meaning in computing.) At present, LLMs mostly assemble their input data from the commons, ‘fair use’ and IPRs human authors have created by using stochastic iterative techniques that generate output that appears to be reasonably meaningful and similar to the original *situated* human authors’ historical data output. As expected, some of the output from these machines is nonsense and some of it is better than the average output from original authors.
In short, as you appear to accept at the end of your post, LLMs are machines and it should be quite simple to keep them excluded from becoming copyright holders (there are some who believe machines should be holders of such rights). Indeed, it should be relatively easy to exclude machines like computers from copyright law because they are – including the software they operate on – machines which should fall under patent law. Unfortunately, because of some very stupid court rulings and legislation, software has been deemed to have copyright protection on the basis of what amounts to little more than that it is “written” and machines “read” it! This situation has been further complicated by generative AI machines which are trained on a variety of data that are protected by copyright or other IPRs. Developers and owners of these machines want to treat all training data as commons or at least ‘fair use’; but are not so keen to relinquish their rights on the “data” outputs of their own machines. They may be providing it for free now, but, as with much else that was once free on the WWW and elsewhere, it will eventually be “monitored” by Big Tech.
This brings us to responsibilities. The developers and owners of generative AI machines do not want to give up their (future) IPRs on the output of their machines, but they are also not too keen on taking responsibility for the output of their machines. We do not accept the developers/operators/owners of other machines are not responsible for their machines. It is the developers/operators/owners of “computers” that have been legally protected from their failure and errors. (The Fujitsu/Post Office scandal is a recent example of this protection.)
I do not accept your suggestion that just as co-authors do not take responsibility for academic papers, machines should be treated as being “co-authors”. If we use this as an argument that the machine’s developer/operators/owners are not responsible for the output of their machine, it could be used as de facto acceptance that co-authors and indeed authors have no or little responsibility for their input to papers. This in turn could weaken the responsibility of co-authors and authors to check the quality and veracity of any output by a machine they use (this would include the use of all machine output at this level). Given the relatively high level of false and misleading information that signal processing machines like LLMs generate, this arrangement would ultimately have a deleterious effect on the whole system. We should be trying to increase the quality and veracity of papers (i.e., the signals the machines are trained on and fed back into the system), not introducing yet another means by which authors can get substandard and fallacious papers published.
It is, of course, the publishers who are ultimately responsible for what they publish. Again, by strengthening their responsibilities by ensuring, for example, that any machine-generated output they publish is identified or labelled, we might be able to reduce the amount of false information or “noise” generated by these machines (and authors) that is recycled not only in publications but also by Big Tech, i.e., social media and other information not covered by publishing legislation.
Sorry, my phone has overstepped the mark – it could be “monetised” not monitored.