Asking the right questions: big data and civil rights
Alistair Croll has written a thought-provoking article, “Big data is our generation’s civil rights issue, and we don’t know it”. His basic argument is that the new economics of collecting and analyzing data has led to a change in how it is used. Once it was expensive to collect, so only data needed to answer particular questions was collected. Today it is cheap to collect, so it can be collected first and then analyzed – “we collect first and ask questions later”. This means that the questions asked can be very different from the questions the data seem to be about, and in many cases they can be problematic. Race, sexual orientation, health or political views – important for civil rights – can be inferred from apparently innocuous information provided for other purposes – names, soundtracks, word usage, purchases, and search queries.
The problem, as he notes, is that in order to handle this new situation we need to link what the data is with how it can be used. This cannot be done by technology alone, but requires societal norms and regulations. What kinds of ethics do we need to safeguard civil rights in a world of big data?
…governments need to balance reliance on data with checks and balances about how this reliance erodes privacy and creates civil and moral issues we haven’t thought through. It’s something that most of the electorate isn’t thinking about, and yet it affects every purchase they make.
This should be fun.
Note that this is not just the regular privacy debate: this is about what kind of information is allowed to be inferred about us and how different agents are allowed to act on it.
Can we do anything about it?
Of course, one might be skeptical about whether governments (or societies in general) can proactively do this. Maybe, as one historian argues, the causal power is on the side of the technology: just as the printing press changed how society functioned, big data and analytics will do that too. Their presence will change our concepts of privacy, moral and civil rights regardless.
I believe this is true to some extent: we often change our attitudes to things as we integrate them into our lives (just consider the rapid softening of views on IVF as soon as the babies started arriving, not to mention the changing privacy mores in the Facebook era). But the rights and norms we define are not solely dependent on the technology; they interact with it: our views of freedom of expression are coevolving with our media. Some technologies will be slowed or sped up by how they fit with societal views.
Is this a civil rights issue?
Civil rights deal with ensuring free and equal citizenship in a liberal democratic state. This includes being able to adequately participate in public discussion and decision-making, making autonomous choices about how one’s life goes, and avoiding being discriminated against.
That big data analysis infers information about people does not itself affect civil rights: it is at most a privacy issue. It does not affect the moral independence of people. The real issue is how other agents act on this information: we likely do not mind that a computer somewhere knows our innermost secrets if we think it will never act on them or judge us. But if a person (or institution) can react to this information, then we might already experience chilling effects on freedom of thought or speech. And the act itself may be discriminatory in a wrongful way.
Discrimination is however a complex issue. Exactly what constitutes wrongful discrimination is shaped by complex social codes, sometimes wildly inconsistent. Just consider how churches are able to get away with discriminating against non-believing or wrong-sex applicants (and possibly even on the basis of sexual preference) in a way that would be completely impossible for private companies, by claiming that these traits are highly relevant by their views (and hence that the discrimination is not wrongful). Groups like sexual minorities and the disabled have gained protection from discrimination following vigorous debate and cultural change. If it is OK to select partners in person based on racial characteristics, should commercial online dating services provide such criteria or are they abetting racism? And so on. Just as it is not possible to decide beforehand what questions might produce discriminatory answers, it might not be easy to tell what behavior is discriminatory before it has been discussed publicly.
Big data analysis might help various forms of discrimination, but also expose it. No doubt more advocacy groups are going to be mining the activities of companies and states to show the biases inherent in the system.
One regulatory challenge with big data and big analytics is that, unlike what the nicknames suggest, they can be done on a small scale or in a distributed manner: while there are huge amounts of data out there, collection and analysis are not necessarily located at a few easily regulated major players. While Facebook, Google, Acxiom and the NSA might be orders of magnitude more powerful than small businesses or hobby projects, such projects can still harness enough data and ask problematic questions – especially since they can often piggyback on the infrastructure built by the giants.
A second challenge is that asking questions of the data can be done silently and secretly. It can be nearly impossible to tell that an agent has inferred sensitive information and is using it. Sometimes active measures are taken to keep analyzed people in the dark, but in many cases the response to the questions can be invisible – nobody notices offers they do not get. And if these absent opportunities start following certain social patterns (for example, offers not being made to people of certain races, genders or sexual preferences) they can have a deep civil rights effect – just consider the case of education.
This opacity can occur inside the analyzing organization too. For example, training a machine learning algorithm to estimate the desirability of a loan applicant from available data might produce a system that “knows” the race of applicants and uses it to estimate their suitability (something that would be discriminatory if a human did it). The programmers did not tell it to do this and it might not even be transparent from the outside what is going on (conversely, getting an algorithm to not take race into account in order to follow legal restrictions might also be hard to implement: the algorithm will follow the data, not how we want it to “think”).
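To make the loan example concrete, here is a minimal sketch in Python. Everything in it is an invented assumption for illustration: the two groups, the two residential areas, the 90% correlation between group and area, and the biased historical approval rates. The "model" is deliberately crude (it just learns the past approval rate per area), and is not meant to represent how any real lender works. The point is only that the group label is never read during training, yet the decisions still split along group lines because the area acts as a proxy.

```python
import random

def make_applicants(n, seed=0):
    """Synthetic applicants. 'group' is hidden from the model; 'zip_area' is a proxy for it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        # Assumption: 90% of group A live in area1 and 90% of group B in area2.
        home, other = ("area1", "area2") if group == "A" else ("area2", "area1")
        zip_area = home if rng.random() < 0.9 else other
        # Assumption: past human decisions were biased against group B.
        approved = rng.random() < (0.8 if group == "A" else 0.3)
        rows.append({"group": group, "zip_area": zip_area, "approved": approved})
    return rows

def train(rows):
    """Learn the historical approval rate per area. The 'group' field is never read."""
    totals, approvals = {}, {}
    for r in rows:
        area = r["zip_area"]
        totals[area] = totals.get(area, 0) + 1
        approvals[area] = approvals.get(area, 0) + r["approved"]
    return {area: approvals[area] / totals[area] for area in totals}

def predict(model, row):
    """Approve if applicants from this area were approved at least half the time."""
    return model[row["zip_area"]] >= 0.5

def approval_rate_by_group(model, rows):
    """Audit step: compare model decisions across the hidden group label."""
    rates = {}
    for g in ("A", "B"):
        members = [r for r in rows if r["group"] == g]
        rates[g] = sum(predict(model, r) for r in members) / len(members)
    return rates

model = train(make_applicants(5000, seed=0))
rates = approval_rate_by_group(model, make_applicants(5000, seed=1))
print(model)
print(rates)
```

Under these assumptions, members of group A are approved far more often than members of group B, even though nobody programmed the system to consider group membership. Note also that the disparity only becomes visible in the audit step, which needs access to the very attribute the model was not supposed to use.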
A third challenge is that the growth of this infrastructure is not just supported by business interests and government snoops, but by most consumers. We want personalization, even though that means we enter our preferences into various systems. We want ease of use and self-documentation, even though that means we carry smart devices and software that monitor us and our habits. We want self-expression, even though that places our self in the world of data.
The fourth challenge is that which questions are problematic is ill-defined. It is not implausible that there exist groups that might be discriminated against on the basis of data mining that are not known as socially salient groups, or that apparently innocuous questions turn out to reveal sensitive information when investigated. This cannot be predicted beforehand.
These challenges suggest that public regulation will not be able to effectively enforce formal rules. Transgressions can occur silently, anywhere and in ways not covered by the rules.
Where does this leave us?
Croll suggests that we should link our data with how it can be used. While technological solutions to this might sometimes be possible, and some standards like Creative Commons licenses are being spread, he thinks – and I agree – that the real monitoring and enforcement will be social, legal and ethical.
If people could prescribe narrowly what their information could be used for, many useful or important applications would be lost (tracking traffic jams using cellphone movement, epidemiology using online data trails etc.). Would we accept that people could prescribe that only people from certain political, religious or sexual groups would have access to some of their data? Strong rights to our data mean that many social uses can be prevented, sometimes limiting the civil rights of others (for example, consider how this could hinder investigation of claims of discrimination). Just as it is not possible to tell beforehand what data contains information that might become problematic or what questions might give discriminatory answers, it is hard to a priori delineate usages that are entirely OK. Strong data rights put the onus of defining appropriate uses on the data owners, but much of the cost of navigating their wishes on data users, which will especially hurt the smaller users (private people, NGOs etc.) compared to the big ones (Google, NSA et al.).
Weak rights to our data mean that the effort must instead go into monitoring appropriate usage. Now the cost is on the data owner’s side (or with institutions like governments that try to safeguard their rights), and the data users are reaping the benefits. This is largely our current situation.
Are there other ways? One approach would be reciprocity: demanding information back about how data is gathered and processed, allowing proper responses. If the government wants to monitor us, we should demand equal or greater oversight of government. If companies collect information about us we should be allowed to know what they have and what they use it for. Except that governments and companies are quite recalcitrant about doing this. Companies, at least, might plausibly claim they would lose their competitive edge if they disclosed what they were doing.
Strong transparency might allow monitoring of what everybody is up to (at least in principle) regardless of their wishes, and hence allow us to retroactively punish uses that infringe civil rights, hopefully discouraging future misbehavior. This might be the direction we are heading if we do not come up with some way of efficiently controlling the rapid growth of everyone’s data-gathering capability. In the one-way mirror scenario this means some concentrations of power can monitor everybody else: this is fine if they are trustworthy, but I suspect most find it worrisome. The alternative, two-way transparency, has everybody potentially monitoring everybody else. In either case aspects of civil rights will likely change in such a post-privacy society.
I doubt there are any neat principles or solutions to civil rights in a big data analytics world. Civil rights are changeable and will adapt together with technology – sometimes as a response to it, sometimes driving it, sometimes out of sheer rebellion. Big data is so useful that it will evolve in myriad ways, requiring frequent updating of our norms and their application. The policy challenges are profound and unlikely to be resolved by any foreseeable development. But the good news is that we do not need perfect answers to do something useful. Mixtures of approaches can be tried. We might even be able to use big data analytics to study what policies actually work. Our goal – strong protection of civil rights – is in many ways ill-defined and subjective, but like many sociological and ethical issues it can be studied empirically.