A recent report by Lipsitch and Galvani warns that some virus experiments risk unleashing a global pandemic. In particular, there are the controversial “gain of function” experiments seeking to test how likely bird flu is to go from a form that cannot be transmitted between humans to a form that can – by trying to create such a form. But one can also consider geoengineering experiments: while current experiments are very small-scale and might at most have local effects, any serious attempt to test climate engineering will have to influence the climate measurably, worldwide. When is it acceptable to do research that threatens to cause the very disaster it seeks to limit?
The report notes that although the virus experiments occur in biosafety level 3 or 3+ facilities, accidents happen at a worrying rate: roughly 2 per 1,000 laboratory-years. This means that a research program running for a decade at 10 laboratories has a nearly 20% risk of resulting in at least one laboratory-acquired infection. The chance of such an infection leading to an outbreak has been estimated at at least 10% for influenza.
The number of fatalities of an outbreak would be a skew-distributed random number: most likely small, but with a heavy tail potentially running into tens of millions dead. Even if the outbreak just corresponds to normal flu mortality (around 2 per million), the research program would in expectation cause about 280 deaths globally. However, the research would also – in expectation – reduce flu mortality by some fraction. If that fraction is larger than 2%, the net effect would be an overall improvement despite the risk. This seems fine by the Nuremberg Code: “the degree of risk to be taken should never exceed that determined by the humanitarian importance of the problem to be solved by the experiment.”
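To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch of the numbers above. It assumes a world population of roughly 7 billion (not stated in the text) and treats the 2-per-1,000 accident rate as independent across laboratory-years:

```python
labs, years = 10, 10
accident_rate = 2 / 1000             # laboratory-acquired infections per laboratory-year

lab_years = labs * years
expected_infections = lab_years * accident_rate              # 0.2
p_at_least_one = 1 - (1 - accident_rate) ** lab_years        # ~0.18, the "nearly 20%"

p_outbreak = 0.10                    # estimated chance an infection seeds an outbreak
world_population = 7e9               # assumption, not given in the text
outbreak_deaths = 2e-6 * world_population                    # "normal flu mortality": 14,000

expected_deaths = expected_infections * p_outbreak * outbreak_deaths   # 280
break_even_fraction = expected_deaths / outbreak_deaths                # 0.02, the 2%

print(p_at_least_one, expected_deaths, break_even_fraction)
```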
Except that this is not how we normally reason about risky research. First, there is the long tail problem: if there is a big but unlikely outbreak, that could easily swamp the number of lives saved in normal years. Perhaps we should care less about the expected amount of normal risk than about the expected severity of disastrous events. Second, intentionally risking others’ lives – especially innocents with no say in the experiment – might be problematic from a moral perspective. Third, there is somebody to blame and a policy that can be accepted or not.
The first problem becomes especially acute for existential risks threatening the survival of humanity. However, I think most of us have an intuition that a big disaster is worse than the same number of victims spread out in time and space. The reason is that a correlated disaster has numerous other bad effects: societies are stretched to the breaking point, institutions may crumble, tragedies compound each other. If we accept this, then reducing the probability of extreme tail risks becomes more important than reducing the median number of victims. Research or policies that trade huge disasters for more numerous but smaller tragedies might be the right thing to do.
The second problem is one of knowingly risking the lives of innocent people. This may be unavoidable: introducing a new medication could plausibly harm some users who would otherwise have been fine, even if the medication on its own works well (consider giving aspirin to middle-aged people to reduce cardiovascular disease). In the medication case this is somewhat unproblematic because the aim is the benefit of the users: things are worse if the potential harm is imposed on an external group. However, for research with global applicability, like reducing pandemic risk or helping the climate, there might not be an external group. Even people who do not know they are being benefited by improved medicine or a managed climate would benefit from it. So research that is likely to help everybody at large eventually might avoid the second problem.
There is still a stakeholder problem: if somebody does something affecting me, do I not have a say about it, even if it is halfway around the world? How much can be left to the researchers or local agencies?
Fouchier and Kawaoka criticized the report, claiming “their work had full ethical, safety and security approval, with the risks and benefits taken into account”. This might be true, but other researchers question whether these approvals are correct: there are subtle issues here about whether the flu community is overly optimistic about its research potential and safety record, or whether the rest of the oversight system and the research’s critics actually understand the problem correctly – and how to accurately tell who has the right kind of expertise.
However, the flu research appears to happen within a framework aiming for accountability. It is not perfect, but it can be inspected and, if it causes trouble, controlled or held accountable. The wildcat iron seeding experiment in the Pacific was not done within any framework and seems hard to hold clearly accountable. At the very least, experiments with large-scale effects need to have proportional accountability.
Summing up, it seems that risky experiments can be justified if they look likely to reduce overall risk (especially extreme tail risk), their benefits would accrue to all who are also subjected to risk, and the experiments can be adequately monitored and kept proportionally accountable.
It is interesting to consider these factors for other activities, such as global surveillance. It might be that breeding new pathogens is more ethically justified than NSA espionage.
Thanks for this. Very interesting and very helpful.
A problem with the ‘gain of function’ experiments is that they are not seeking to cure anything, but effectively to create new diseases. In the case of an accidental outbreak resulting from a gain of function experiment, research would have to be done to find a cure for this new disease, and this new research itself would entail further risk. We could plausibly (?) end up in a positive feedback cycle.
Good point. I oversimplified things by simply assuming the gain of function experiments would somehow directly cause some cure. They may still lead to useful risk reductions, of course (and if this chops off enough tail risk they might be worth doing even given the risk of accidental releases).
Generally, there will always be research going on to handle the current flu viruses, so in a sense the feedback is already occurring. It is just that compared to the natural breeding of new strains lab research is microscopic: it gets scary when it is directed towards something we think nature may be bad at evolving.
A crude mathematical model: the research reduces the basic risk by a factor F<1, but carries a probability R<<1 of causing another instance of the basic risk. So the risk changes from P to FP + R. If, in the case of a release, we do secondary research that handles the R problem and it is roughly as effective as the first, then the total risk becomes FP + R(F + R), and so on: in the limit (P + R + R^2 + R^3 + …)F. So the extra terms form a geometric series in R: since R is small, we typically do not have to worry about the higher-order risk. But the real rub, I think, is if the risk the research poses is of an entirely different kind from the original risk. Then there is no guarantee that the damage is done in a way that can be dealt with using research, or that it is of the same magnitude.
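A minimal numerical sketch of this model, with made-up illustrative values of P, F and R (not estimates from the report):

```python
def total_risk(P, F, R, rounds=20):
    """F*(P + R + R**2 + ...): each spawned risk instance is handled by equally effective research."""
    return F * (P + sum(R**k for k in range(1, rounds + 1)))

P, F, R = 0.05, 0.5, 0.01      # illustrative numbers only
print(total_risk(P, F, R))     # ~0.03005, barely above the first-order F*(P + R) = 0.03
print(F * (P + R / (1 - R)))   # closed form of the geometric series, same value
```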
Thank you for highlighting our work on your blog. I agree that you “oversimplified things by simply assuming the gain of function experiments would somehow directly cause some cure.” And I think so in two ways.

First, for reasons we consider in more detail in the original article, the chances of “causing some cure” are extremely remote from this kind of research, as vaccine developers have highlighted, and also because of the problem of epistasis in flu, which means that finding mutations associated with some trait (transmission or one of its biological constituents) in one genetic background might be misleading, and certainly is not reliably predictive, in another genetic background. Flu is an incredibly diverse virus. All this is described in Table 1 and the text.

Second, and more generally, the choice is not to do GoF / PPP research vs. not do it, but rather how to allocate a finite investment in the public health goal of preventing/mitigating catastrophic flu. From a pragmatic perspective, for several reasons we highlight in the article including cost, throughput (how many different flu strains can be tested), generalizability, statistical rigor (also stemming from cost/throughput), and mechanistic detail, alternative allocations of the research portfolio are more likely to lead to “cures” (really, preventive measures) than PPP experiments.

This has ethical implications too. The point from the Nuremberg Code you quote is not the only relevant one. Point 2 states: “The experiment should be such as to yield fruitful results for the good of society, unprocurable by other methods or means of study.” While there is legitimate debate about whether the narrow scientific problem of transmissibility can be fully solved by other types of experiments (and I would debate whether these experiments can solve it either), the existence of alternative approaches to get the same “good of society,” i.e. flu prevention and mitigation, would seem to be an ethical problem when the GoF/PPP experiments are unique in the dangers they pose, but not in the benefits they provide.
This suggests an addendum to my list of conditions for when a risky experiment is ethical: the risk reduction cannot be achieved in a safer way.
If there is an alternative that is safer, then that one should be used. And indeed, the whole point of your paper is to argue this – I feel a bit embarrassed not to have pointed it out in my post 🙂
Of course, there could be tricky risk-benefit cases where there is a cheap, riskier experiment and a more expensive, safer experiment. This is not too different from other risk-benefit situations where we may have to choose between saving lives at different costs. The interesting aspect here is that there is far more tail risk than in (say) traffic safety considerations. Building the cheaper but less safe highway option and spending the rest of the budget on other effective traffic safety measures would in the median lead to a better outcome than if we had spent all the money on a safer highway, but if we are unlucky it might lead to more lost lives due to a single really bad accident. Even so, bad traffic accidents are fairly limited and local disasters. In the case of a heavy-tailed risk, however, the unlucky outcome could be many orders of magnitude worse than the median, which would likely make choosing the riskier option far more problematic. So it seems that cost-effectiveness becomes a less useful heuristic when we compare big heavy-tailed risks to non-heavy-tailed risks.
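A toy simulation of this point, with invented numbers: the cheaper option saves more lives in the median year, but carries a small probability of a heavy-tailed disaster.

```python
import random

random.seed(1)

def cheap_option():
    # Net lives saved per year: better in the typical year, but with a ~0.1% chance
    # of a heavy-tailed (Pareto) disaster costing far more than the option ever saves.
    disaster = random.paretovariate(1.5) * 1000 if random.random() < 0.001 else 0.0
    return 100 - disaster

def safe_option():
    return 80.0   # fewer lives saved, no tail

outcomes = sorted(cheap_option() for _ in range(100_000))
print("median:", outcomes[len(outcomes) // 2])     # 100: beats the safe option
print("mean:  ", sum(outcomes) / len(outcomes))    # typically a few lives below 100, dragged down by the tail
print("worst: ", outcomes[0])                      # can be catastrophically negative
```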
Fascinating & important. It seems to me that both the paper and the blog (tend to – I accept not in all instances) assume that the risks of influenza virus transmissibility relate only to this virus and are targetable as such. On this basis, Lipsitch makes a compelling case that no ‘production’ is necessary, as the same benefits can be more cost-effectively and less riskily obtained by other methods.
To change the balance of the decision, as Anders states, the length of the tail of the risk might be extended, but conversely, the depth of benefits might also be changed to reveal multiple or universal pathways. Secondly, the bulk of the risk is assumed here to attach to the direct (presumably unintended) causation of new outbreaks, rather than to direct (shared or unshared) knowledge of pathways that might enable more outbreaks. There seem to me to be at least two elephants in the room – malign intent & generic/universal research, plus their related intersection of repurposable knowledge. Are the Nuremberg Code and social utility sufficient precautionary principles for future GoF debates, accepting these two possibilities?
Malintent has two components: malintent among people working with the risky virus, and malintent among other people putting the gained knowledge to harmful dual use.
The first component can be ‘folded in’ with accidental release, producing an increased release risk. It is not clear to me how many past releases can be ascribed to different levels of malintent; maybe the current risk estimates actually implicitly contain it. However, it seems typical for organisations to underestimate just how much risk there is from insiders: case studies in nuclear safety and computer security are not reassuring.
The dual use component is trickier to estimate, but might occur far away in time and space from the initial research (the enhanced mousepox results will be with us till the end of time). However, not all risky research lends itself to malicious dual use. Influenza is a poor choice for a terrorist weapon (not lethal enough, hard to control). The worry in the gain of function experiments might actually be more that the methods could be applied elsewhere – but then the risk lies in teaching certain lab techniques, not in the research itself.
I think these components can be folded into a Nuremberg consideration with some effort. It is just that the second requires dealing with some pretty profound uncertainties about future technologies, so the decision will be made in a not very reassuring air of uncertainty.
Here considerations like “the unilateralist’s curse” might apply: we have reasons to be far more conservative than we would individually think it is rational to be, since it is enough that one agent unwisely moves ahead for a risk to materialize.
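A toy illustration of the unilateralist’s curse, with invented numbers: each of N well-meaning agents gets an independent noisy estimate of the true value of a risky experiment (here actually negative), and the experiment happens as soon as any one agent judges it worthwhile.

```python
import random

random.seed(1)

def p_someone_proceeds(n_agents, true_value=-1.0, noise_sd=1.0, trials=50_000):
    """Probability that at least one agent's noisy estimate comes out positive."""
    hits = sum(
        any(random.gauss(true_value, noise_sd) > 0 for _ in range(n_agents))
        for _ in range(trials)
    )
    return hits / trials

for n in (1, 5, 20):
    print(n, p_someone_proceeds(n))   # rises with n: roughly 0.16, 0.58, 0.97
```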
Thanks Anders, you read my question more clearly than I expressed it…and with justified ambivalence over GoF as a metaprocess. Invoking the unilateralist’s curse also points to a further elephant: the realpolitik of geopolitics, affordability, intellectual property and commercial competition all actually have much more bearing here than we may admit…whereby we could certainly call in game theory, health economics and national sovereignty…on all of which ethics/philosophy has a take, I guess…but that would drag us away from your point.
Yes, there are a lot more considerations than just pure harm going into real decisions about risky experiments. But these are complicating factors that it is hard to say anything general about: I think the interesting thing here is figuring out the ethical boundaries due to the experiment being risky in itself. Then we can adjust them depending on other considerations.
Conversely, a good theory of the ethics of big risks might influence the considerations. For example, national sovereignty seems to be a weak consideration when dealing with threats that are naturally transnational.
The trouble with this system is that I can’t press ‘like’. That’s fascinating!