
Using AI to Predict Criminal Offending: What Makes it ‘Accurate’, and What Makes it ‘Ethical’.

Jonathan Pugh

Tom Douglas


The Durham Police force plans to use an artificial intelligence system to inform decisions about whether or not to keep a suspect in custody.

Developed using data collected by the force, the Harm Assessment Risk Tool (HART) has already undergone a two-year trial to monitor its accuracy. According to media reports, predictions of low risk were accurate 98% of the time over the trial period, whilst predictions of high risk were accurate 88% of the time. Although HART has not so far been used to inform custody sergeants’ decisions, the police force now plans to take the system live.

Given the high stakes involved in the criminal justice system, and the way in which artificial intelligence is beginning to surpass human decision-making capabilities in a wide array of contexts, it is unsurprising that criminal justice authorities have sought to harness AI. However, the use of algorithmic decision-making in this context also raises ethical issues. In particular, some have been concerned about the potentially discriminatory nature of the algorithms employed by criminal justice authorities.

These issues are not new. In the past, offender risk assessment often relied heavily on psychiatrists’ judgements. However, partly due to concerns about inconsistency and poor accuracy, criminal justice authorities already use algorithmic risk assessment tools. Based on studies of past offenders, these tools use forensic history, mental health diagnoses, demographic variables and other factors to produce a statistical assessment of re-offending risk.

Beyond concerns about discrimination, algorithmic risk assessment tools raise a wide range of ethical questions, as we have discussed with colleagues in the linked paper. Here we address one that is particularly apposite with respect to HART: how should we balance the conflicting moral values at stake in deciding the kind of accuracy we want such tools to prioritise?


Two Kinds of Predictive Failure: False Negatives and False Positives

One dimension of accuracy in risk assessment is the rate of false positives: the proportion of individuals deemed to be high risk who will not go on to offend. A second dimension is the rate of false negatives: the proportion of individuals deemed low risk who will offend.

A recent review of existing risk assessment tools suggests that they typically have a very high false positive rate, over 50%, and a lower, but still substantial, false negative rate of around 9%.
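To make the two definitions concrete, here is a minimal Python sketch that computes both rates as they are defined in this post: each is a proportion of the people given a particular prediction. (Statisticians often call these quantities the false discovery rate and the false omission rate.) The function name and the counts are hypothetical; the counts were chosen only to land near the review's rough figures and are not data from any real tool.

```python
def error_rates(high_offend, high_no_offend, low_offend, low_no_offend):
    """Return (false_positive_rate, false_negative_rate) as defined in the text.

    false positive rate: share of those deemed high risk who do not offend
    false negative rate: share of those deemed low risk who do offend
    """
    deemed_high = high_offend + high_no_offend
    deemed_low = low_offend + low_no_offend
    return high_no_offend / deemed_high, low_offend / deemed_low


# Hypothetical counts for 200 assessed individuals (illustrative only).
fp_rate, fn_rate = error_rates(
    high_offend=40, high_no_offend=60, low_offend=9, low_no_offend=91
)
print(f"false positive rate: {fp_rate:.0%}")  # 60%
print(f"false negative rate: {fn_rate:.0%}")  # 9%
```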

In a perfect world, one would hope to develop a tool that yields no false positives and no false negatives, regardless of the population in which it is implemented. However, in practice, reductions in the rate of false negatives tend to come at the cost of more false positives, and vice versa.
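The trade-off can be illustrated with a small Python sketch. It assumes, purely for illustration, that a tool outputs a numeric risk score and labels someone high risk when the score meets a cut-off; the scores, outcomes and cut-offs below are invented and do not describe how HART or any real tool works.

```python
# Hypothetical (risk score, went on to offend) pairs; illustrative data only.
cases = [
    (0.1, False), (0.2, False), (0.3, True), (0.4, False), (0.5, False),
    (0.6, True), (0.7, False), (0.8, True), (0.9, True),
]


def count_errors(cases, threshold):
    """Count (false positives, false negatives) for a given cut-off score."""
    # False positive: deemed high risk (score at or above the cut-off) but did not offend.
    fp = sum(1 for score, offended in cases if score >= threshold and not offended)
    # False negative: deemed low risk (score below the cut-off) but did offend.
    fn = sum(1 for score, offended in cases if score < threshold and offended)
    return fp, fn


results = {t: count_errors(cases, t) for t in (0.25, 0.55, 0.85)}
for threshold, (fp, fn) in results.items():
    print(f"cut-off {threshold}: {fp} false positives, {fn} false negatives")
# cut-off 0.25: 3 false positives, 0 false negatives
# cut-off 0.55: 1 false positives, 1 false negatives
# cut-off 0.85: 0 false positives, 3 false negatives
```

A cautious (low) cut-off flags more people and so misses fewer future offenders, but wrongly flags more people who would not have offended; a high cut-off does the reverse.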

Which dimension of accuracy should we prioritise? The answer to this question will depend on the weight we attribute to different moral values in criminal justice. Ethical custody decisions have to strike a delicate balance between sufficiently protecting the public from re-offending and minimizing the infringement of the rights of those whose freedom we are restricting.

Notably, the HART system is designed to err on the side of caution, and so is more likely to classify an individual as high or medium risk; this strategy lowers the likelihood of false negatives, conferring greater public protection. This is perhaps likely to meet with public approval in this highly emotive area; however, the same strategy is also likely to lead to more false positives, and thus to greater infringements of the rights of individual detainees.


Different Weights in Different Contexts

Crucially, the optimal balance between false positives and false negatives will depend on the social and political context in which the assessment tool is used. It depends, for example, on the detention practices of the jurisdiction in which the assessment is being carried out: the more harmful or restrictive the detention is likely to be to a detainee, the more important it becomes to avoid false positive assessments of risk.

To illustrate, it could be argued that in countries with inhumane detention practices, reductions in false positives should be prioritised over reductions in false negatives. Conversely, if the assessment is carried out in a high-crime area with humane detention practices, there is a case for prioritising the avoidance of false negatives.
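One way to picture this context-dependence is the following Python sketch, which picks the cut-off score that minimises a weighted count of errors. It rests on strong assumptions that are not part of the argument above: that the moral cost of each kind of error can be expressed as a single number, and that a tool works by thresholding a risk score. The data, weights and function names are all hypothetical.

```python
# Hypothetical (risk score, went on to offend) pairs; illustrative data only.
cases = [
    (0.1, False), (0.2, False), (0.3, True), (0.4, False), (0.5, False),
    (0.6, True), (0.7, False), (0.8, True), (0.9, True),
]
thresholds = (0.25, 0.55, 0.85)


def weighted_cost(cases, threshold, cost_fp, cost_fn):
    """Total cost of errors at a cut-off, given a weight for each error type."""
    fp = sum(1 for score, offended in cases if score >= threshold and not offended)
    fn = sum(1 for score, offended in cases if score < threshold and offended)
    return cost_fp * fp + cost_fn * fn


def best_threshold(cases, cost_fp, cost_fn):
    """Return the candidate cut-off with the lowest weighted cost."""
    return min(thresholds, key=lambda t: weighted_cost(cases, t, cost_fp, cost_fn))


# Harsh detention: a false positive is weighted five times a false negative.
print(best_threshold(cases, cost_fp=5, cost_fn=1))  # 0.85
# Humane detention in a high-crime area: the weights are reversed.
print(best_threshold(cases, cost_fp=1, cost_fn=5))  # 0.25
```

The same scores and outcomes yield different "best" cut-offs once the weights change, which is the sense in which the choice of accuracy measure is an ethical decision rather than a purely technical one.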

The ethical costs of false positives are not simply a function of the nature of the detention itself. Another crucial factor is the stage of the criminal justice process at which the tool is deployed. For instance, risk assessment tools are sometimes used to inform parole decisions. If we assume that the offender in question is currently serving a proportionate sentence, then the ethical cost of a false positive, and thus ‘unnecessary’ refusal of parole, may be relatively small; the prisoner will simply have to serve the remainder of a sentence that is proportionate to the gravity of their crime.

However, if an offender has already served a proportionate sentence, or if, as in the case of suspects and defendants, they have not yet been convicted of a criminal offence, the harm of a false positive assessment becomes more egregious; detention in such a case will amount to an unjust restriction of liberty.

AI systems such as HART may increasingly be used in an effort to enhance the accuracy of risk assessments in criminal justice. However, in pursuing greater accuracy, we should keep at the forefront of our decision-making the ethical question of which values we are prioritising when we emphasize one dimension of accuracy over another, both in how these systems are developed and in how they are deployed.


5 Comment on this post

  1. This use of statistics and probability to identify and prevent criminal behaviour was one of the major driving forces for the development of statistics and probability in the 19th century. Repackaging and hyping it as being ‘AI’ does not make it any more ‘intelligent’ in any meaningful sense. ‘Machine Learning’ systems, which would be a better description of HART, have applications in assisting human decision-making. I am quite prepared, as Adam Smith (nearly) did, to identify a length of string, or indeed, as many AI researchers do, a thermostat as being ‘intelligent’ (I do not accept, as some AI researchers claim, that they are ‘conscious’). However, we must understand this use of the word ‘intelligent’ and not use the term ‘AI’ to describe systems like HART because the police, suspects, magistrates, media and the public may become confused about the machine’s software limitations and capabilities. For sure, as your post outlines, these systems do raise many ethical issues, but, as with the development and use of statistics and probability in the 19th century, we must also address, in all its ramifications, the ‘philosophical’ issues of Machine Learning and Intelligence. Too much concentration on the ethics could help to feed the new wave of AI hype.

  3. Step 1: Create machine learning system for justice department
    Step 2: Feed it with data from past and present cases
    Step 3: Ignore the fact that the past and present cases are influenced by factors such as some races having higher arrest rates and sentences for the same crimes, changes in prison conditions that could affect chances of reoffending, changes in the laws like legalizing marijuana, possession of which would have been a major contributor to rearrests
    Step 4: Treat people who would have been rearrested for having marijuana like they are destined to reoffend even though the ‘crime’ you predicted they would be arrested for is now legal.
    Step 5: When everything goes tits up, pretend the IT people didn’t tell you a thousand times that machine learning isn’t a magic bullet, it amplifies biases and mistakes within its source data and should never be taken as gospel, just like quants told everyone not to run their housing markets on algorithms that could only make use of a few decades’ worth of past data to predict housing markets whose cycles can be 70 years long.

    All this AI talk is, is separating the unethical decision from its source (all the cases fed into the machine learning system) and allowing any nasty outcomes to be blamed on the computer or maybe the IT guy.
