Judgebot.exe Has Encountered a Problem

Written by Stephen Rainey

Artificial intelligence (AI) is anticipated by many as having the potential to revolutionise traditional fields of knowledge and expertise. In some quarters, this has led to fears about the future of work, with machines muscling in on otherwise human work. Elon Musk is rattling cages again in this context with his imaginary ‘Teslabot’. Reports on the future of work have included these replacement fears for administrative jobs, service and care roles, manufacturing, medical imaging, and the law.

In the context of legal decision-making, a job well done includes reference to prior cases as well as statute. This is, in part, to ensure continuity and consistency in legal decision-making. The more that relevant cases can be drawn upon in any instance of legal decision-making, the better the possibility of good decision-making. But given the volume of legal documentation and the passage of time, there may be too much for legal practitioners to fully comprehend.

AI applications in general have the capability of assimilating huge amounts of data in various domains. A great deal of research has focussed on this kind of feature, especially in terms of ‘Big Data’. An AI application developed for the domain of law, and legal decision-making in particular, could serve as an extremely useful tool for legal practitioners. Some such applications already exist. There are applications that serve to automate aggregation of relevant cases through statistical methods, or to predict case outcomes, or to find otherwise obscure relationships between legal documents. Some legal applications are already seen to produce problems, not least owing to the replication of bias that can result. For instance, algorithms trained on data from already unfair legal systems have recommended longer sentences for black prisoners in the US, thereby perpetuating that unfairness.

What would represent a truly groundbreaking AI application for law? One that could copy the higher-level cognitive skills apparent in assimilating statute, precedent, and legal reasoning in order to come to an explicable verdict on some specific matter – an artificial judge. This would require an AI that could reproduce patterns of legal-moral reasoning, based in prior legal decision-making. In this scenario, an AI could be used to advise in novel cases and explain its reasons for the advice given. This would require some means of simulating understanding of the law’s operations. This would be no mean feat, though artificial neural networks might appear to offer the most promising way through.

Artificial Neural Networks and Patterns

Artificial neural networks (ANNs) are essentially sets of algorithms that operate on datasets to discern patterns in those sets. All ANNs can do two main things: learn from examples and generalise. This makes them extremely useful for pattern-recognition in particular. Among the more recent developments in language-based AI is GPT-3 (Generative Pre-trained Transformer 3). This is essentially a language model which uses deep neural networks to continue a prompt with text that is, in a specific statistical sense, like that prompt. GPT-3 was trained on an immense body of text, making its model highly effective across a variety of uses, such as summarising longer texts or translating pieces of text between languages.

To be clear, ‘effective’ in this context means ‘strikingly convincing in some respects’, rather than anything that might trouble the Turing Test. The interesting results wrought by the model can appear eerily on point, apparently written with intelligence, intention, and style. But the surface is all there is. As Floridi and Chiriatti put it,

“GPT-3 writes a text continuing the sequence of our words (the prompt), without any understanding. And it keeps doing so, for the length of the text specified, no matter whether the task in itself is easy or difficult, reasonable or unreasonable, meaningful or meaningless”

With GPT-3, the output text may appear cogent, but will bear little scrutiny where probing or important areas are approached. With law, matters require more than a surface similarity with their antecedents, and more than a statistical correlation among productions. Law has standards relating to the types of reasoning from which it is constituted, as well as the material on which that reasoning draws. A legal decision requires a justificatory story that reflects particular normative structures. It isn’t enough that the decision looks right on the page, sounds legal, or is above a statistical threshold of similarity with a range of legal texts. A legal decision must come from the right kinds of sources, via the right sorts of reasoning, and thereby come to a robustly defensible conclusion. This appears to be a more complicated task than emulating a general style of output given a set of inputs.

Simply ‘looking’ or ‘sounding’ legal isn’t enough for legal decision-making. So how can the required nature of legal reasoning be recovered from documented legal decision-making? Setting an ANN to work upon a corpus of prior decisions would likely yield something that matched patterns of word use in legal reasoning, and thereby produce something like the modelling GPT-3 works on – a kind of language model of legalese. This would probably be impressive in one sense, in that it could do that work, but ultimately it would risk practical emptiness. In merely reproducing patterns, it presents the form only, without content. Legal AI, like moral or ethical AI more generally, would require more in terms of content. In such reasoning, what matters is the content, not whether it matches a pattern.
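The point about form without content can be made concrete with a deliberately crude sketch. The toy model below is not GPT-3 and not an ANN; it is a simple bigram (word-pair) model, swapped in here only because it is small enough to read in full. The three-sentence corpus and the names `CORPUS`, `train_bigrams`, and `continue_prompt` are invented for illustration. The model records which word has followed which in its corpus, then continues a prompt by picking from those recorded followers.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus of "legalese", invented for illustration only.
CORPUS = [
    "the court finds that the defendant is liable",
    "the court holds that the claim is dismissed",
    "the tribunal finds that the appeal is allowed",
]


def train_bigrams(sentences):
    """Map each word to the list of words observed to follow it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return dict(followers)


def continue_prompt(followers, prompt, max_words=8, seed=0):
    """Extend the prompt one word at a time using observed word pairs only.

    Nothing here represents statute, precedent, or reasons: each step
    looks only at the previous word and at what has followed it before.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    words = prompt.split()
    for _ in range(max_words):
        options = followers.get(words[-1])
        if not options:
            break  # the last word was never followed by anything in the corpus
        words.append(rng.choice(options))
    return " ".join(words)


MODEL = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(continue_prompt(MODEL, "the court"))
```

Because every adjacent pair of words in the output has been seen before, the result tends to sound legal. Yet the model can just as readily stitch together a ‘finding’ that appears in none of its source decisions, and it has no justificatory story to offer for it. Far larger models are vastly more fluent, but on the argument made here the gap between pattern and reason is of the same kind.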

We can take a broader lesson from this train of thought. We ought to be much more careful about how we evaluate AI applications in general. It’s easy to get swept up by the mysterious ways in which they appear to work. Where there are problems, we might be misled into thinking more, or better, data will solve everything. But in any case of an AI producing something that looks like a judgement, understanding will be missing. Where people are involved, judgements can be non-linear, justifiable by evaluations according to a variety of values, norms, and experience. Reasons rather than data are what count, and AI is data-driven. While varieties of AI applications may be useful for streamlining tasks in a variety of contexts, where judgement is at stake human reasoning can’t easily be replaced.


