Monday, November 15, 2010

Distributed Privacy: A Distraction

I was reading the recent Register article on privacy legislation, and it struck an interesting note with me. It assumes that data collection is inevitable, and that the only way to police it is through "do not collect" or "do not follow" tags.

That's problematic for a whole heap of reasons, some of which the article's author outlines (how do you know not to store someone's details in the future without storing their details?), but it all stems from the same point: data collection is inevitable. Data ends up in a multitude of tables across the internet, and it is impossible to know who knows what about you, especially as people sell that information on to others.

It's a familiar problem to the UK government. Concerns about the NHS databases, ContactPoint, offender registers, etc. have gotten much more traction in the UK media than traditional "Big DB" companies like Experian, which stores and sells your credit report and is pretty much vital to anyone needing credit.

The answer to this is to allow people to store their own information, or in the case of dependents, to have it managed for them. Each person has a dedicated store of their information: demographic, medical, financial, friends, contacts, etc. Each person can choose how much information is given out, to whom, and on what basis. The responsibility for the data becomes that of the person, not of the data collectors. Services then pull this information together however they need to show it.

Take Facebook friends lists as an example.

I have a list of IDs I am friends with, they have lists of people they are friends with, and so on, making a large graph of a chunk of the planet. With the current system, Facebook has all this information: it pulls back the list of my friends, maps those IDs to some people, and shows them to you, whether my friends want this or not.

Instead, let's take a distributed privacy approach. Facebook now requests my list of friends. Immediately I've got control. I've got 400 friends, but I don't currently want anyone to know about 20 of them, so my store returns only the 380 friends I'm allowing people to see.

These are only IDs as well; there's no personally identifiable information in there. Facebook now needs to request the relevant information from each of the IDs to show my friends list. However, of the 380 people I'm showing as friends, 80 don't want to share their details right now. Facebook then shows the 300 friends it knows about (and might add, "and 80 people we don't know about").
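To make the flow concrete, here's a minimal sketch of how the pieces might fit together. Everything here is hypothetical — the class and function names (`PersonalStore`, `render_friends_list`) are made up for illustration, not a real API.

```python
class PersonalStore:
    """Each person's own data store, holding their info and sharing rules."""

    def __init__(self, owner_id, name, friends, hidden_friends=(), share_profile=True):
        self.owner_id = owner_id
        self.name = name
        self._friends = set(friends)
        self._hidden = set(hidden_friends)
        self.share_profile = share_profile

    def list_friends(self):
        # Return only the friend IDs the owner currently allows others to see.
        return sorted(self._friends - self._hidden)

    def get_profile(self):
        # Each friend independently decides whether to release their details.
        return {"id": self.owner_id, "name": self.name} if self.share_profile else None


def render_friends_list(stores, person_id):
    """The service pulls the data together, but owns none of it."""
    visible_ids = stores[person_id].list_friends()
    profiles = [stores[fid].get_profile() for fid in visible_ids]
    shown = [p for p in profiles if p is not None]
    withheld = len(profiles) - len(shown)
    return shown, withheld
```

The point of the split is that both filters happen at the edges: my store decides which IDs leave it, and each friend's store decides whether their profile leaves theirs. The service only ever composes what it was given.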

That's my privacy respected, and your privacy respected, right?

That works from the FB perspective; now let's take a look at something like the massive ContactPoint database, which was to store lots of information about the children of Britain.

There is plenty of sensitive data involved with children, and it's always an emotive subject. The key worries were over what data was stored, how to change that information, and who had access to it (the last one being the big media "worry"). The list of people with access to the DB went from a massive number pre-outcry to such a small number post-outcry that it was not worth running the DB in the first place.

Let's concentrate on the last part: who has access to it. How can we allow GPs, social services, the police, and clinical psychologists all to access your child's record, without allowing access to your next door neighbor who works at the local health center?

With the data distributed and owned by the person, we can also introduce the idea of "sensitive access". Any data deemed sensitive requires the person to authorize its release from their data store. What counts as sensitive could differ from person to person: demographic information for someone fleeing an abusive spouse, for example, or the medical record of a child.

It would work pretty simply. The digital signature of the person making the request is sent along with the request. If the data being asked for is flagged as sensitive, a confirmation for its release is sent to the person. If it's the GP requesting access, you grant it; the next door neighbor is denied (and potentially reported).
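A hedged sketch of that flow, assuming a per-store list of sensitive fields and a confirmation callback. The names (`SENSITIVE_FIELDS`, `confirm_release`) are invented for illustration, and the digital-signature check is stubbed out as a simple identity lookup rather than real cryptographic verification.

```python
SENSITIVE_FIELDS = {"medical_record", "home_address"}

def handle_request(record, field, requester_id, trusted_ids, confirm_release):
    """Release one field from a person's store, gating sensitive ones.

    confirm_release(requester_id, field) stands in for sending a
    confirmation prompt to the data's owner.
    """
    if field not in record:
        return None, "unknown field"
    if field in SENSITIVE_FIELDS:
        if requester_id not in trusted_ids:
            # Untrusted requester asking for sensitive data: deny and flag it.
            return None, "denied and reported"
        if not confirm_release(requester_id, field):
            return None, "owner declined"
    return record[field], "released"
```

Non-sensitive fields pass straight through; sensitive ones need both a recognized requester and the owner's explicit go-ahead.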

There are more benefits here: I can see all the data being held on me, as I'm the one holding it. If it's wrong, I can change or delete it (see below). I'm ultimately the one responsible for my data. No more data breaches, no more records left on the train or at the bus stop; it's my data.

Changing and deleting records is where it gets fun. Obviously, I want to be able to change my clinical record if it's wrong (no, I'm allergic to penicillin, not pencils!), but what about negative entries like a criminal record? If we want to keep the idea pure, your criminal record should be part of the whole data set that you own. But should you be allowed to delete your criminal record and act like it never happened? The same goes for pretty much anything required by the government: tax records, National Insurance number, birth certificate, etc.

This is the only place where the idea gets awkward. It's also the place where a lot of the privacy ("I don't want the government poking into my affairs") purists will disagree with me. This data should be shared between yourself and the government. It can't be changed *by either party* without both sides consenting. That's quite an important distinction. It means that I can't change my negative elements (say, wiping the points off my driving license) without the shared source agreeing to it. Likewise, they could not add to it (putting points onto my license) without my agreement, which may have to be court ordered if I really don't agree.
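The dual-consent rule can be sketched as a toy model: a change to the shared record only lands once both parties have signed off on the same proposal. All names here (`SharedRecord`, the party labels) are illustrative assumptions, not a proposed protocol.

```python
PARTIES = {"citizen", "government"}

class SharedRecord:
    """A record neither side can change unilaterally."""

    def __init__(self, data):
        self.data = dict(data)
        self._proposals = {}  # proposal_id -> (field, new_value, approvals)

    def propose(self, proposal_id, field, new_value, proposer):
        assert proposer in PARTIES
        # Proposing counts as the proposer's own consent.
        self._proposals[proposal_id] = (field, new_value, {proposer})

    def approve(self, proposal_id, party):
        """Returns True once the change has actually been applied."""
        field, new_value, approvals = self._proposals[proposal_id]
        approvals.add(party)
        if approvals >= PARTIES:  # both sides have consented
            self.data[field] = new_value
            del self._proposals[proposal_id]
            return True
        return False
```

A court order would then amount to one mechanism for supplying the missing approval when a party refuses.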

There are a few safeguards that would have to be put in place. Guardianship transfer would be important: taking control of an account away from an abusive parent or guardian would be vital to the security of the individual. There would also have to be a security-related aspect to this, as having the intelligence services write their security reports onto the record of an individual suspected of terrorism would be hilarious.

The last part of this is enforcement. It would have to be illegal for people to track you any more, or this becomes a moot point and the big players continue invading everyone's privacy as usual. Big databases full of your information owned by other people would have to be made illegal. There would need to be processes for resolving disputes over shared information. There would still be the argument about what data is owned by whom (are the headers of my email messages owned by me? How about the data ISPs use for traffic shaping?).

Lastly, this is such a sea change, one that would affect so many big established players, take a massive effort on the part of both governments and their people, and need a huge amount of storage and servers to host all the information (and who owns and runs the servers your data is on? Do you trust them?), that it is effectively a pipe dream... Maybe the Register's unspoken assumption is correct and the battle for data privacy is already lost.

More things to consider

  • Responsibility for the data being correct now lies with the individual. If the data you use is wrong, it's the person's fault, not yours. This means a massive reduction in red tape, lawsuits, and pretty much anything connected with the storage of data.
  • We are pretty much killing Google, Facebook, many advertisers, and anyone else who relies on having a big database full of your information that it can use to track you or sell.
  • People would have to be educated as to the importance of securing their own data. If their account gets hacked, they are stuffed... but it would be their fault.
