thoughts and observations of a privacy, security and internet researcher, activist, and policy advisor

Thursday, June 12, 2008

Social Networking with Enemies

Have you ever wondered why you can only make "friends" in social networks? At best, you are able to neutrally "connect". Everything about people you don't like is politely ignored - they normally don't even get a message when you turn down their friendship request.

This thinking that the world only consists of nice people being friendly to each other is of course very childish. From the early philosophers through the founders of modern sociology to Carl von Clausewitz's writings on war, we have learned that society is also structured by conflicts and less friendly attitudes towards each other. So, if we really want to build a social graph that represents all relationships among all people*, we have to model enemies and antagonistic relations as well.

So, I was glad to read that humankind has made big progress. Based on XFN (XHTML Friends Network), we now have a list of specifications for XEN (XHTML Enemies Network):
XEN is an extension of XFN. Negative relationship terms have been omitted from XFN by design. (...) XEN values can be used in conjunction with microformats such as hCard, rel-nofollow and vote-links, specifically rev="vote-against". (...)

The interesting byproduct of asserting these relationships correlates to the ancient proverb, "Any enemy of my enemy is my friend". By merging the XEN lists, it should be possible to generate XFN relationships on the fly based on shared enemies.

This feature actually might be nice for political activism. You are looking for people who might want to protest with you against the much hated surveillance-enhancing interior affairs minister? No problem, just look for his enemies.
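
The "enemy of my enemy" merge described in the quote can be sketched in a few lines of Python. This is only an illustration: XEN defines no API, so the function name `suggest_friends`, the flattened dictionary of enemy lists, and the sample names are all made up for this sketch.

```python
from itertools import combinations


def suggest_friends(enemy_lists):
    """Map each pair of people to the set of enemies they share.

    enemy_lists: dict mapping a person to the set of people they have
    declared as enemies (a hypothetical, flattened form of XEN lists).
    """
    suggestions = {}
    for a, b in combinations(sorted(enemy_lists), 2):
        # Two people who list each other as enemies are not friend candidates.
        if b in enemy_lists[a] or a in enemy_lists[b]:
            continue
        shared = enemy_lists[a] & enemy_lists[b]
        if shared:
            suggestions[(a, b)] = shared
    return suggestions


# Invented sample data: alice and bob both dislike the minister.
xen = {
    "alice": {"minister"},
    "bob": {"minister", "carol"},
    "carol": {"bob"},
}
print(suggest_friends(xen))
```

For the activism case, you would simply filter the result for pairs whose shared set contains the minister in question.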

A few examples:
evil-twin: An evil twin is the concept in fiction of someone equal to a character in all respects, except for a radically inverted morality. Symmetric. If the evil twin is literally a twin brother or sister, it should be combined with the XFN value of sibling.

rival: Someone in the same field of study/activity with whom you are vying for recognition and/or advancement. Often symmetric.

nuisance: Someone who annoys you but not to the point of antagonism.
Of course, you sensed it:
XEN is not a microformat. It is a joke.
But like any good joke, XEN tells us a lot about the difficulties of modeling social relations. It even reflects the fact that there can be several different versions of yourself represented online. Think of the drunk version of yourself in the proverbial Facebook picture:
The evil twin value can be applied to a version of yourself from an alternate universe or timeline.
You can now - thanks to XEN - tell everybody, and especially your boss, that you hated what you did and even regret the fact that you were at that party in the first place. And you can do it with microformats! Now, that is identity management at its current peak.

*The social graph is an idea I don't particularly like, but that is a different story.

Tuesday, June 10, 2008

IdentityCamp: Lessons Learned in Bremen

The IdentityCamp in Bremen on the weekend was a blast: Focused discussions, energized participants, great weather, a relaxed atmosphere, and interesting interdisciplinary exchange. It seems to have been the first time that the Identity 2.0 crowd really discussed in an open and in-depth way with the privacy people, which was exactly what we hoped would happen. It’s impossible to summarize all the sessions, but here are some interesting observations that I took away from it:

"The buzzword of the day seemed to be OpenID." (Sid Arora). But at the same time, the OpenID community left me with the impression that they are a bit desperate. A number of big players have become OpenID providers, but nobody except for a few blogs and some platforms is consuming OpenIDs issued by other parties. So the session on "Killer Applications for OpenID" left me with the feeling that OpenID is still very much a solution looking for a problem. A way out may be using OpenID not only for authentication, but also for attribute exchange. There are some active attempts in this direction. Dennis Blöte is currently developing a system which uses OpenID for the different online services at Bremen University (e-learning, exams, administration, etc.). Here are his slides.

Convergence of Standards: InfoCards and OpenID are moving closer to each other. The best known case for this is using CardSpace InfoCards for authenticating towards the OpenID provider. But there is more going on, e.g. in creating mobility: The Higgins InfoCards selector stores InfoCards online, so you don’t depend on your own machine all the time, which used to be a big plus for OpenID. Johannes Feulner showed the gateway OpenIDbyCard.com he built, which you can use for logging into an OpenID relying party directly with the CardSpace InfoCards interface. One of the problems in building this system was that the attribute semantics were not 100% equivalent to each other. Another approach, which Dick Hardt is working on, is to “tunnel” OpenID tokens with InfoCards. According to Johannes, the latter approach cannot translate claims and does not work with self-issued cards, and the relying party needs an upgrade. In the gateway approach, you have to trust the gateway; in the tunnel approach, you have to trust the OpenID provider. Johannes also has a nice OpenID phishing demo online at IDTheft.fun.de.
Update:
There is also convergence between CardSpace InfoCards and Shibboleth, as Tobias Marquart reports.

We now know what "Identity 3.0" officially means. Caspar Bowden presented on the recently acquired U-Prove technology and how Microsoft plans to integrate it into the Identity Meta-System. Christian Scholz has a good summary. Caspar provided a typology of the generations of identity management:
  1. Identity 1.0: centralized IdM like Passport. The problem was that one IdM is way too powerful.
  2. Identity 2.0: SAML- or OpenID-like. The problems here are that all IdMs are too powerful, and you have the extra problem of phishing.
  3. Identity 3.0: smart client-side crypto. Using minimal disclosure tokens, you achieve multi-party security and privacy. This makes you more independent of the identity provider, which is a good thing from a privacy perspective. The problems here are unresolved patent issues.
Data portability is a complex topic with a number of issues unresolved. Aside from competition issues and the big players not really pushing a standard here for obvious reasons, there is also no common vision on what exactly should be portable, and by whom. In general, the Data Portability Working Group seems not to be too active, especially not on the policy front. I learned at the camp that it depends on your normative perspective on identity. If you want your identity to be coherent and all the different facets open to all of the members of your social environment, you want full portability. This seems to be the case for those folks who are friends with their co-workers anyway. If you want your different roles not connected to each other and prefer a strict division between the private and the public life, you want less portability. At least you want to be able to control who gets to see what, and even when. The general focus is moving from single sign-on to data synchronization. Most people agreed that it would be nice to be able to update your contact data on all platforms you are a member of with one click. The more difficult issue is relationship data, which in the end is not identity management, but societal management. One more reason to get more social scientists in this discussion. But you also need a ton of lawyers, because if company X relies on the IDs provided by company Y, this creates a business relationship between them, too.

"The topic least understood by the participants (at large) seemed to me to be national identity (and their respective cards)." (Sid Arora). This is understandable, as OpenID, Cardspace, and other instances of Identity 2.0 are not really part of most developments around governmentally issued electronic ID cards. This camp was a nice opportunity for people who work on these different corners to meet and exchange views. This is especially important when discussions are starting about the possible use of OpenID in e-government contexts, which happened in Bremen. A lot of scepticism was raised towards this idea, though, mainly because of security issues and the too central role of the identity provider. Caspar Bowden got applause for his question:
"Why use the lowest standard (OpenID) for the most security-relevant use case (government authentication)?"
There was a huge interest in trust online. Which mechanisms generate trust in the offline world, and what is different in online environments? Tina Guenther’s presentation sparked such a lively discussion with her attempt to break down the research questions and get some first insights that she even offered a well-attended second session on Sunday for getting deeper into this.

You can reduce the need to trust with data minimization. A lot of the open questions discussed in the other sessions also boil down to "Who do you trust?" Your government? A corporation like Yahoo? The members of your social network? If the idea of a loosely coupled identity meta-system is that you do not need high trust among all parties, then I see two possible solutions:
  1. Everyone becomes his or her own identity provider and does not have to worry about IdPs collecting their digital traces.
  2. The amount of exchanged data is reduced in general, so you don’t have to trust all kinds of parties. This is where Identity 3.0 with minimal disclosure tokens and zero-knowledge proofs is very promising.
Semantics is the big challenge, not technology. Once Microsoft and IBM sort out the patent issues between U-Prove and Idemix, and the protocols and libraries are available to the public, the technology problems are more or less solved. Most of this (except for the minimal-disclosure crypto) is not rocket science anyway, but normal protocol plumbing. The problem is the translation of the complex social and legal issues around identity into these protocols. How to come up with a reference list of identity tokens for age, location, contacts and all kinds of other issues? How to organize the management of relationship data? Which contractual relationships are implicitly or explicitly involved that need to be sorted out? The idea of having Creative Commons-like licenses for your personal data, which then can be described in a lawyer-readable, a human-readable, and a machine-readable form, met with quite some interest. But this is mainly a usability issue. The different use cases you want for this are much more complex and diverse than the few standard types of re-using text or music.

This leads to the conclusion by many participants: An interdisciplinary perspective is really needed on the issue of identity. We came pretty close to the ideal, but some perspectives were still missing:
"There was a healthy mix of disciplines represented, including computer scientists and programmers, lawyers, sociologists, social media / web developers and even a few curious students from the Bremen University of Arts, where the event was hosted. A couple historians and policy makers mixed in would have been nice, but considering the method in which such an IdentityCamp was organised (or lack thereof), it was brilliant." (Sid Arora)
There is a great interest in follow-up. People are eager to have the next IdentityCamp and go into the issues more in depth and even develop a common vision. Check the IdentityCamp page regularly to see how we will stay in touch.

A big "thank you" goes to our sponsors: University of the Arts Bremen, big Bremen, Kuppinger Cole + Partner, artundweise, hmmh Multimediahaus, Mister Wong, Spreadshirt, and Pure Tea.

Tuesday, June 03, 2008

"Machine-Readable Government" from 1987 to 2008

At a brainstorming session about future research issues at our section today, I mentioned the term "machine-readable government", which met with a lot of interest. I did some quick research on where the term came from. Interesting outcomes:

German hackers in the 1980s

Surprise: The term seems to date back to 1987. The first media mention I could find was in 1988, in an article in the German weekly magazine Der Spiegel about the mailbox and hacker communities in Germany. The term "maschinenlesbare Regierung" was attributed to Chaos Computer Club co-founder Klaus Schleisiek, but it seems to have been a common concept for the first generation of German hackers, as the book about CCC founding father Wau Holland by Daniel Kulla tells us.

It is unclear to me if there was more detailed conceptual thinking about this, or if it was just an ironic catch-phrase.

More recently, the term was again used in the context of the German introduction of Freedom of Information Acts, see e.g. this 2003 CCC congress lecture by CCC co-founder Gerriet Hellwig.


Barack Obama / Lawrence Lessig in the U.S.

More recently, the term has been used for describing some ideas of the Barack Obama campaign in the United States. Obama has quite progressive plans for a more transparent government and the use of open standards for this, see his "technology and innovation" concept paper.

Obama does not say "machine-readable government", but the idea is roughly the same:
"Making government data available online in universally accessible formats to allow citizens to make use of that data to comment, derive value, and take action in their own communities. Greater access to environmental data, for example, will help citizens learn about pollution in their communities, provide information about local conditions back to government and empower people to protect themselves."
Larry Lessig's interpretation and endorsement of this does not use the term "machine-readable government" either, but was interpreted as such by a number of bloggers. Lessig says about Obama's ideas:
"the big part of this is a commitment to making data about the government (as well as government data) publicly available in standard machine readable formats. The promise isn't just the naive promise that government websites will work better and reveal more. It is the really powerful promise to feed the data necessary for the Sunlights and the Maplights of the world to make government work better. Atomize (or RSS-ify) government data (votes, contributions, Members of Congress's calendars) and you enable the rest of us to make clear the economy of influence that is Washington."
This interpretation of course is strongly related to Lessig's current interest and work on a more transparent and less corrupt government. He also announced a first practical project last year in the field of legal texts and decisions:
"Legal Commons (beta): Taking inspiration from the liberator and manumitter of government documents and legal cases, Carl Malamud, Creative Commons will enter into a joint venture with public.resource.org to collect and make available machine readable copies of government documents and law. Carl and I have committed to freeing all federal case law by the end of 2008. Importantly, this effort will not set up competing systems to the emerging ecology of great free law services (Cornell's LII, or Columbia's Altlaw.org). We instead will help gather and make available the resources those services use to provide their amazing service. So look for a tarball of all federal cases by the end of 2008, in parsable and usable plain text."
What's next?

Of course, freeing government information on public spending, on environmental or health data, or on government and parliament decision-making (voting records, contacts with lobbyists etc.) is great, and making this available in machine-readable standardized form is even better. But as we have learned from Creative Commons: "machine-readable" does not automatically translate into "human-readable" or "citizen-readable".

I see two upcoming challenges in this field:

1. Developing tools that make this information digestible by normal citizens. It should be fairly easy for plain environmental data like "compare air pollution over time in all states and tell me if there is a relation to power plants nearby". But social and relational data, such as data on the political process, is much harder to digest in standardized forms. A contact with a lobbyist can mean a whole range of things, for example. It will be tough to come up with the semantics for this in the first place.

2. Even if this should be possible, the interpretation of such complex datasets is not really easy. This is a challenge for activists and political groups that will want to build tools around this data, and others who will do mash-ups from those. I certainly see the danger of mistaking correlation for causality here, as well as other reasons for blaming the wrong person or factor. In general, I am not sure if this in the long term will lead to better quality of political debates and decisions. You can also imagine a future where the political opponents only throw statistics at each other, and where the discourse over values and social visions gets even more marginalized.
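
To make both points a bit more concrete, here is a rough sketch of what the "easy" environmental query could look like once the data is machine-readable. Everything in it is an assumption for illustration: the record layout, the field names `power_plants` and `pm10`, and the numbers themselves are invented, not taken from any real dataset.

```python
from math import sqrt

# Hypothetical per-state records; all values are invented for illustration.
records = [
    {"state": "A", "power_plants": 1, "pm10": 18.0},
    {"state": "B", "power_plants": 3, "pm10": 25.0},
    {"state": "C", "power_plants": 5, "pm10": 31.0},
    {"state": "D", "power_plants": 2, "pm10": 29.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


plants = [rec["power_plants"] for rec in records]
pollution = [rec["pm10"] for rec in records]
r = pearson(plants, pollution)
print(f"correlation between plant count and PM10: {r:.2f}")
```

The number that comes out is only a correlation. It says nothing about traffic, geography or industry in those states, which is exactly the interpretation problem described in the second point.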

That said, of course I totally agree that more transparency of government is better than less. And if machines can help us aggregate and digest the information, we should really give it a try.

PS: If anybody knows more comprehensive literature around these ideas, please let me know!

Update: The broader term for this (which is also much more common in the English-speaking world) is "open government". This also includes citizen wikis on government and parliament people and activities as well as similar approaches, where the data is not necessarily in standardized - i.e. machine-readable and digestible - formats.

Some sources on this:
Thanks to Markus Beckedahl for the helpful hints and links.