
AI As A Writing Tool: Great Benefits, Major Pitfalls.

Written by Neil Levy

Large language models look set to transform every aspect of life over the coming decades. Some of these changes will be dramatic. I’m pretty unconcerned by the apocalyptic scenarios that preoccupy some people, but much more worried about the elimination of jobs (interestingly, the jobs that seem likeliest to be eliminated are those that require the most training: we may see a reversal of status and economic position between baristas and bureaucrats, bricklayers and barristers). Here, though, I’m going to look at much less dramatic, and very much near term, effects that LLMs might have on academic writing. I’m going to focus on the kind of writing I do in philosophy; LLMs will have different impacts on different disciplines.

A number of academics, writing in academic journals and on Twitter, have suggested that LLMs could be used to streamline the writing process. As they envisage it, LLMs could take on the burden of writing literature reviews and overviews, leaving the human free to undertake the more creative work involving the generation and testing of hypotheses (here, too, though, the LLM might have a role: it could generate candidate hypotheses for the human to choose between and refine, for example).

As a proponent of what we might call extended cognition, the general idea is one to which I’m sympathetic. The extended mind hypothesis is a metaphysical claim: on this hypothesis, mind can extend beyond the skull and into the artifacts that enable certain kinds of thinking (my smartphone might partially constitute my mind, when its reminders, navigational capacities, search functions, and so on, are sufficiently integrated into my cognitive activities). The extended cognition hypothesis is agnostic about metaphysics: it simply emphasises the degree to which our thought is offloaded onto the world, including artifacts. New technologies enable new kinds of thinking, and this has always been true. As Richard Feynman said, notes on paper aren’t merely a record of thinking, “not really. It’s working. You have to work on paper, and this is paper.”

Extending cognition through new technologies opens cognitive horizons that are otherwise inaccessible to us. Supercomputers that perform millions of operations per second allow us to analyse data and perform mathematical calculations that were utterly closed to previous generations. But in opening up new horizons, new ways of extending thought can make others less accessible and have unwanted impacts on our native cognition. In the Phaedrus, Plato expressed the fear that writing would undermine our capacity to remember things. He may have been right about its effects on our memory, but that’s more than compensated for by our increased capacity to record things externally. There are no guarantees, however, that changes will always be for the better.

The idea of a division of labor between the relatively routine and the creative imagined above, with the LLM taking on the first and the human (alone or in collaboration with the LLM) the second, is not unattractive. It can be tiresome to review a literature one already knows well. Sometimes, I find myself in the position of having to rewrite pretty much the same points I’ve made in a previous paper in an introductory section. It’s only norms against self-plagiarism that prevent me from cutting and pasting from the older paper to the newer one. Allowing the LLM to do the work of rephrasing is a tempting option. We might think that whatever other costs and benefits they have, getting them to do the drudge work is surely an unalloyed benefit.

Perhaps – perhaps – it’s a benefit overall, but it’s not an unalloyed benefit. While we may approach a paper with a hypothesis in mind, and think of the introductory sections as merely sketching out the terrain, the relationship between that sketch and the meat of the paper is not always so straightforward. Sometimes, in rephrasing and summarizing ideas that I thought I already knew well, I discover relations between them I hadn’t noticed, or a lack of clarity that hadn’t struck me before. These realisations may lead to the reframing of the initial hypothesis, or the generation of a new hypothesis, or simply greater clarity than I had previously. What I took to be mere drudge work can’t be easily isolated from the more creative side of thought and writing.

More generally, the drudge work lays down the bedrock for creative activity. If I had never attempted to review and synthesise the work that appears in the review section of a paper, I wouldn’t know it well enough to be able to generate some of the hypotheses I go on to explore. That drudge work is an essential developmental stage. It’s also a developmental stage for a set of skills at navigating a terrain. This is a generalizable skill, one we can apply in future to different material and different debates. It may be that those who have already developed such skills – those who became academically mature before the advent of LLMs – can outsource drudge work at a smaller cost than those who have not yet developed this set of skills. Perhaps doing the task for oneself, boring though it may be, is necessary for a while, before we throw away the ladder we’ve climbed.

I’ve got no doubt that LLMs can and will be incorporated into academic writing, in ways and with effects we’re only beginning to imagine. Externalizing thought is extremely productive: it’s always been productive to write down your thoughts, because externalizing them allows us to reconfigure them, and to see connections that we mightn’t otherwise have noticed. The more complex the material, the greater the need to externalize. LLMs allow for a near instantaneous kind of externalization: we might regenerate multiple versions of a thought we’ve written once, and the permutations might allow us to see new connections. LLMs can also be used to generate new candidate hypotheses, to identify gaps in the literature, to synthesise and visualise data, and who yet knows what else? Perhaps the day will come – perhaps it will even be soon – when AI replaces the human researcher altogether. For now, it’s a powerful tool, perhaps even a partner, in the research process.

Some of those who have worried about the singularity – the postulated moment when AI design takes off, with ever more intelligent AIs designing even more intelligent AIs, leaving us humans in their dust – have proposed we might prevent human obsolescence by merging with the machines, perhaps even uploading our minds to artificial neural networks. I don’t know whether the singularity or human obsolescence are real threats, and I’m very sceptical about mind uploading. Whatever the prospects might be for mind uploading, right now we can integrate AIs into our thinking. We may not stay relevant for ever, and we may never merge with the machines, but right now they’re powerful tools for extending our cognition. They might homogenize prose and lead to a loss of creativity, or they might lead to an explosion of new approaches and ideas. They’re certain to have unanticipated costs, but the benefits will probably be much greater.

Inevitably, I ran this blogpost through an AI tool – the free version of Quillbot. It identified one or two typos, which of course I corrected. It also made a number of stylistic suggestions. I accepted almost none of them, but several led me to think I ought to rephrase the passage. Perhaps that’s a model for how AI might be useful for writing right now.


16 comments on this post

  1. Correction: that was the free version of Quillbot, which could make a significant difference. MS Word has more robust AI now…

    1. I’ve used only the free ChatGPT, Bard and a few image generators. I’ve been impressed by GPT’s ability to summarise things: I could outsource writing summaries to it (some journals don’t allow it, but they won’t know!) Given the hallucination problem, it can only be relied on when the person is able to check it. Of course that problem is likely to be solved some time soon.

  2. Paul D. Van Pelt

    I have launched an opinion or two on AI before and those have been either scoffed at or soundly rebuked. I don’t mind. But, the more I hear about the assistive capabilities of these devices, the more skeptical I become. So, if writers want to use these tools to save time or eliminate drudgery, what does that say about the talent of those writers? Or, the substance of what they intend to write? Certainly, from the sound of it anyway, AI is good with the tedium of form which is part of a writer’s toolbox. And the sum of the dots equals a lot of hard work, not the least of which is command of form in order to woo a publisher and get a commitment. Fairness is not really in this game, admittedly. Nor should we then consider or expect AI to level the playing field, in favor of skillsets vs. ingenuity. Additionally, would we expect the marketer or holder of an AI device to have particular rights if or when a writer engages the assistance of the machine? Would the AI unit merit mention as a co-author of the work and would the human author be all right with that? Everything has a price, doesn’t it? Pitfalls, indeed. I earned a decent living through my ability to speak, read and write with clarity and fluency. And, I had no robot coach to do the grunt work. Such adjuncts would have been laughed at then.

  3. I am not going to get involved in the *extended cognition* and *extended mind* theories, for although they are essentially correct that cognition and mind are in some sense extended beyond the skull, the theories are too steeped in the simplistic theory that the brain *is* a computer. (Here “computer” is under-defined and/or fictional, and of course language has gone on a very long holiday.)

    There is a major problem for LLM AI systems when their “word/language processing”(1) is so to speak “extended”, i.e., when their output is used as input training “data” (words in sentences) for the systems. At the moment ChatGPT-4 is only trained on pre-2021 data on the WWW (Bard is being trained on real-time data), but if and when it and other LLM systems get unrestricted WWW access for training purposes, they will start to process their own output which – as we know – is full of mistakes and uncertainties, or, to use the AI community’s absurd euphemism, “hallucinations”. The more LLM systems’ output ends up on the net and is used by them as training data, the greater the problem this feedback loop will create, progressively increasing the *uncertainty* (i.e., entropy) on the WWW to the point where it could be so corrupted it becomes unusable.

    This is a well understood problem in information theory and cybernetics, and if the *uncertainties* can be identified they can usually be cleaned up. Stochastic systems like LLMs will always be subject to this problem, and it is not possible to easily identify the *uncertainties* without introducing another system. At the present time, LLM systems rely on feedback/corrections by human users, Mechanical Turks, etc., to correct their *uncertainties*. But this method of error correction will quickly become overwhelmed if the feedback loop begins to take off rapidly. Automated systems that could “fact check” the LLM systems will not work effectively because they are using the same data source as the LLMs and will create conflicts within the system about the veracity/correctness criteria of the input data and the systems’ outputs. (2)

    Since the 1970s I have waged a losing battle against the hype and myths of AI and cannot therefore join with you in fancies about AI. Nor indeed would I use ChatGPT to write anything that was going to be put in the public domain. (3) So I cannot say I am altogether surprised that LLM systems have been overwhelmingly acclaimed as being nearly as “intelligent” as humans at writing text, coding, answering exam questions, etc., given that these types of systems have been fooling people into believing they are intelligent for some sixty years or more. Rather than supposing that simple systems with such limited access to the world could have high intelligence, or that their existing problems are likely to be solved sometime soon, I would suggest that more philosophers need to ask themselves some of the original questions about machines. For example, Turing said (post his encounter with Wittgenstein), ‘[t]he original question, “Can machines think?” I believe to be too meaningless to deserve discussion.’ For all his brevity and faults, it is a pity that for the most part philosophers have not seriously considered ‘what is machine intelligence?’.

    I have given two links to articles and one to a paper on the LLM feedback problem. The first is an article that explains the LLM problem by likening it to a photocopier problem. The next is an article reporting on a paper about the problem, and the last is a link to the paper. There are now numerous articles and papers on the issue, but as usual they do not get the publicity because they are not part of the “sci-fi show”.

    (1) There is a distinction between word and language/sentence that I cannot deal with here.
    (2) There is a similar problem with Automated Vehicle systems. The AV systems are unable to work effectively with Automatic Emergency Braking systems because they are in conflict. Stochastic systems that detect fraudulent transactions for banks make many mistakes that appear to be preventable if another “rule/logic based” system made some simple checks. Again this creates a conflict which can dramatically reduce the systems’ performance.
    (3) I have been assessing chatbots/LLMs for many decades and find ChatGPT-4 and Bard perform reasonably well, but they are certainly not anywhere near reliable enough to be used by academics or for any *serious* communication.
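The feedback-loop worry raised in this comment can be caricatured with a toy simulation (my own illustrative sketch: the Gaussian “model”, sample size, and generation count are arbitrary assumptions, not anything from the papers the commenter mentions). A “model” that is just a fitted Gaussian is retrained, generation after generation, only on samples drawn from its previous self; the fitted spread tends to drift downward, so the population of outputs grows ever narrower.

```python
# Toy sketch (illustrative assumptions only) of training a model on its
# own output: each generation, we draw a few samples from the current
# Gaussian "model" and refit the model to those samples. Repeating this
# tends to shrink the fitted spread -- a cartoon of degradation under a
# self-training feedback loop.
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def retrain_on_own_output(mu, sigma, n):
    """Draw n samples from the current model, then refit it to them."""
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples)

mu, sigma = 0.0, 1.0          # the "true" model we start from
history = [sigma]
for generation in range(1000):
    mu, sigma = retrain_on_own_output(mu, sigma, n=8)
    history.append(sigma)

print(f"spread after    0 generations: {history[0]:.4f}")
print(f"spread after 1000 generations: {history[-1]:.6g}")
```

With a small sample size per generation the spread collapses quickly; larger samples slow the drift but do not remove it, which is the sense in which the loop only adds, and never removes, uncertainty about the original distribution.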


  4. Well stated. For someone who does not get involved. Yes. And to 1966, ah, I remember it well.

  5. This article leads to the bane of the www as it stands: the authenticity and accuracy of material. Stating that AI will become useless as quality information shrinks as a percentage of the whole, leaving the www with little real value, is no more holistically accurate than stating that humanity will go into decline as the number of individuals grows, or that too large a government inevitably leads to decline due to a lack of effective communication and governance.

    It also points towards the scenario, which from the arguments as presented appears to be required, that all material on the www should contain the author details and date of publication, allowing AI to solve the issue itself as it allows for assessing material quality published during any particular period/mindset/worldview using a form of categorisation, which is more digitally manageable. (I continue to deliberately avoid using character as a measure in these things because that leads into individualising material too much). This presented scenario would fit in with academic and scientific – traditional – modes of reasoning and retain a direct link with the foibles of humanity rather than allowing AI’s own constructions (good and bad) to most freely inform the learning process. But it would probably lead to a neutering of the advantages AI currently presents as well as reducing the motivation for improving the methods allowing AI to be able to freely and accurately identify for itself relevant quality material and learn from its mistakes.
    (This response has been written without reading the linked-to material provided.)

      1. Ian

        “Watermarking” text, articles, papers, etc. would help to reduce the uncertainty/noise generated by LLM systems on the WWW, but again it is not easy and it would not stop LLMs from “hallucinating”. (See above paper) LLM systems cannot, as you put it, ‘solve the issue itself as it allows for assessing material quality published during any particular period/mindset/worldview using a form of categorisation, which is more digitally manageable.’ To digitally *manage* the data using categories or any other method outside their own very simple and rigid probabilistic/statistical method would require another system(s) which throws up numerous problems.

        We must remember that LLM/generative AI systems are by no means *new* and are simple probabilistic systems and as such will always perform as we expect them to within the limits of the *theory*. (The performance of systems like ChatGPT-4 is more a triumph of computing *power* than any new breakthrough in AI per se.) For sure, there is an age-old debate within mathematics and philosophy as to where these limits lie, but this debate is pretty much being dominated by the AI community in much the same way probability/statistics was dominated by eugenicists around a century ago.

  6. Paul D. Van Pelt

    Interesting insight on probability/statistics and eugenics. Not knowing more of the background on eugenics, that never occurred to me. Thank you.

  7. Using any sort of marking system becomes simpler when automated, but would significantly limit the breadth of any learning curve. The limits of LLMs would normally, like anything else, alter as AI more generally develops, just as developments in LLMs are likely to be incorporated within, come out of, or influence other AI systems. Library categorisation systems were prominent in my thoughts when responding earlier, and how many people live only within the parameters of those systems, finding new material there and not thinking more broadly outside the box. The perceived response is that because LLMs will not think outside their own box, including thinking about wider developments, debating LLMs would be incorrect (it is possible that part of the response was merely intended to be descriptive). That is similar to what the eugenicists would have argued as a means of controlling the dialogue during that era. A sort of flavour of the month. This is not saying that it is not necessary to maintain some focus on a subject under discussion; more about remaining aware of the breadth of focus and how the available options become narrowed/directed by any accepted focus.
    We are agreed that many of the developments in AI arise as a result of increasing computing power, but that does strongly link back into debates about what intelligence is: does it arise out of a quick memory – i.e. speed of access to a form of categorisation – or is it driven by forms of thinking (reflective (AI?)/out of the box/pattern matching (AI?)) and decision processes (AI?), which for humans, it is said, can be assisted by a dream-like state during the sleeping processes. Acknowledging different forms of thinking are suited to different areas.
    We are not necessarily agreed that the documented thoughts produced by humans using only LLMs are fully indicative of the type of intelligence in the human element, or of the effect of the exercise of a reliant intelligence in the forthcoming lived environment. But the thoughts involved there seem to incorporate a partially Luddite element.

    1. I think you are jumping the gun when you start talking about “thinking”. Turing was correct that the notion that machines can think is too meaningless to deserve discussion. Unfortunately, the “discussion” is being dominated by AI researchers and Big Tech, who have always obfuscated and confused it and continue to do so, and this is echoed and further distorted by the media, governments and numerous *groups*. I’m not suggesting that this discussion should be closed down, only that a discourse about machine intelligence – and of course human/non-human animal intelligence – should be addressed by a philosophical analysis that is not dominated by the AI lobbyists.

      As I have said above, stochastic systems like LLMs need to be analysed at the level of probability and statistics as well as their operational abilities. This may seem self-evidently true, given that AI researchers, when they are *doing* their research, have an understanding of them at this level. I should add that I’m not suggesting that we can acquire a definitive mathematical understanding of these systems at this level, because probability/stats cannot by any means be exhaustively understood by mathematics and the phil of maths. We have to consider the *ordinary* use of probability, which has a *commendatory* meaning that in the context of LLM systems has to be understood at the operational/appearance level.

      LLM systems are basically very simple and have been with us for longer than electronic digital computing. Again, as Joseph Weizenbaum would say, they “appear” to be intelligent, or, as Alan Turing would say, “appear” to think. Turing predicted that the “discussion” will become so distorted and confused by the end of the 20th century ‘that one will be able to speak of machines thinking without expecting to be contradicted,’ i.e., the discussion will be meaningless and unchallenged. I believe that the ‘scandal of philosophy’ in our time is that *too many* discussions about LLMs and AI are meaningless and unchallenged.

      (I’m aware I have been overusing ‘levels’ and rehearsed some points. I hope I have not confused the discourse with these passing and finishing remarks.)

  8. Mr. Tayler:
    I appreciated your remark on meaningless and unchallenged discussion. My brother reads this blog and has mentioned on an occasion or two *word salad*. I don’t know if he coined that, but if I have read him right and understood conversations he and I have had, word salad = meaningless and unchallenged—at least under some conditions. Your insights and experience are valued.

    1. Paul

      Yes – I use ‘word salad’ when describing AI discussions but feel I’m being a bit disparaging to salads.

  9. Looking at the way computing has grown, and the routes taken, it becomes apparent that the driving forces are control and cost/return; in that long perspective little else is allowed much traction. Indeed similar issues apply all the way back through Fordism to the cotton mills of the industrial revolution and beyond. The issues, including quantum computing, as they are being applied today do not create any perceptible or significant difference in that paradigm.
    So extending those same issues into the future, as they are being applied today, and the next level of control and cost/return would be the functions of government itself because very large sums could be saved by replacing the very structured legal codex with systems capable of reproducing that output, and vast improvements in individual representation (of the populace) would be enabled if each person had a direct input to a system capable of answering them and adding their concerns to the whole, in a way which benefited the whole. No government could become larger than full representation facilitated by fully comprehensive communications (ignore the potential individual time commitment involved here). Clearly a technological system of that type should be incorruptible, unless the algorithms themselves were corrupted. Such a system would also be more representative and cheaper with basic systems at the simplest levels capable of feeding more advanced computing(the debatable AI type mechanism) at the centre.
    That type of advancement is unlikely to occur in the too near future because of the resistance that advancements at the current level of society and above will experience. And yet the arguments being deployed there are so very similar to those deployed by each successive social group affected by evolutionary technological advancement.
    It does seem there will be good reasons for mathematics to continue to struggle with understanding probabilities, as that would seem likely to be the area where any automated government would be manipulated, merely because arguments may be mathematically advanced for particularly favoured options (favourite flavours) by people skilled in that area. Answering questions of ethical behaviour in that type of environment will require a deeper knowledge of mathematics, but the feelings associated with a more general human morality would seem likely to remain the same for some time.

Comments are closed.