Why Use PeerBook?
Most of the below text has been taken (with some editing) from the context survey section of my dissertation.
| 1 | Personal Data Leakage on Social Networks |
| 1.1 | Forms of Personally Identifiable Information Leakage |
| 2 | Online Social Networks with Enhanced Privacy |
| 2.1 | Social Virtual Private Networks (VPNs) |
| 2.1.1 | Secure Connection Establishment |
| 2.1.2 | Social VPN Analysis |
| 2.2 | Virtual Individual Server based Online Social Networks |
| 2.2.1 | Cloud-based Decentralisation |
| 2.2.2 | Desktop-based Decentralisation |
| 2.2.3 | Hybrid Decentralisation |
| 2.2.4 | Virtual Individual Server Analysis |
| 2.3 | Persona: User-Defined Privacy |
1 Personal Data Leakage on Social Networks
In [1], Krishnamurthy and Wills investigate exactly how much “Personally Identifiable Information” (PII) is leaked via Online Social Networks (OSNs) and discuss some methods to prevent this leakage.
Aggregators are third-party websites that host content or advertisements for other websites; in this case, for OSNs. They are able to track and combine information about which websites users visit across a range of ‘first-party’ websites, which include the OSNs themselves. Previously [2], the authors identified nine third-party websites that are “all in the top-10 third-party domains for popular Web sites”, with virtually all of the eleven OSNs using between two and five of these third-party domains for content or advertisement hosting. The authors have since found that the proportion of a large set of websites that include the top-10 third-party domains has risen from 40% to 70%.
The paper is concerned with whether these third-party aggregators can associate website viewing habits not just with a single unidentified user, but with a specific person. The authors found that, through a combination of HTTP headers and tracking cookies sent to third-party aggregators, these aggregators are easily able to identify actual people and track what they do. It has been shown that by linking together just three pieces of PII (a birth date, zip code, and gender), 87% of Americans are uniquely identifiable [3].
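To illustrate why so few attributes can be enough, the following minimal Python sketch counts how many records in a small, entirely made-up dataset are unique under the (birth date, zip code, gender) combination. The records, field names and the function name `unique_fraction` are assumptions for illustration and are not taken from [1] or [3].

```python
from collections import Counter


def unique_fraction(records, keys=("birth_date", "zip_code", "gender")):
    """Return the fraction of records whose combination of `keys` is unique."""
    if not records:
        return 0.0
    tuples = [tuple(record[k] for k in keys) for record in records]
    counts = Counter(tuples)
    unique = sum(1 for t in tuples if counts[t] == 1)
    return unique / len(records)


# Hypothetical records; they do not describe real people.
records = [
    {"birth_date": "1980-01-02", "zip_code": "10001", "gender": "F"},
    {"birth_date": "1980-01-02", "zip_code": "10001", "gender": "M"},
    {"birth_date": "1975-06-30", "zip_code": "94110", "gender": "M"},
    {"birth_date": "1975-06-30", "zip_code": "94110", "gender": "M"},
]

print(unique_fraction(records))                    # all three attributes
print(unique_fraction(records, keys=("gender",)))  # gender alone
```

In this toy data, combining the three attributes singles out more records than gender alone does, which is the effect an aggregator exploits when it joins separately leaked attributes.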
Often, leaked information is not even just a pointer to a specific piece of PII, but instead actually includes PII. The paper investigates potential data leakage in twelve OSNs: Bebo, Digg, Facebook, Friendster, Hi5, Imeem, LinkedIn, LiveJournal, MySpace, Orkut, Twitter and Xanga. The paper first discusses some of the privacy controls made available to users, and shows how the default privacy settings for most OSNs are incredibly permissive in allowing other users (who may not even have a social link with the user in question) access to PII.
1.1 Forms of Personally Identifiable Information Leakage
The paper investigates three different forms of PII leakage: OSN Identifiers that are not PII but could be used to identify the user on the OSN in question, leakage via OSN applications or ‘apps’, and leakage of actual pieces of PII.
Leakage of OSN identifiers was found to occur in eleven of the twelve OSNs that were investigated. Often visiting a page associated with a given user means putting that user’s OSN identifier in the request URI, which is then passed on to the third-party aggregators via the Referer HTTP header. A similar process can occur with cookies that get passed to third-parties via content on the page that appears to be first-party content but is actually ‘hidden’ third-party content. The twelfth OSN, the one that did not leak OSN identifiers, was Orkut.
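The Referer leak described above can be sketched in a few lines of Python. The URL `http://osn.example/profile.php?id=123456`, the `id` parameter name and both function names are hypothetical; real OSNs use differing URL layouts, and this sketch assumes the browser sends the full page URL as the Referer.

```python
from urllib.parse import urlparse, parse_qs


def third_party_request_headers(page_url):
    """Headers attached when the page at `page_url` loads embedded
    third-party content (assuming the full URL is sent as the Referer)."""
    return {"Referer": page_url}


def leaked_identifier(referer, id_param="id"):
    """Return the OSN identifier exposed in a Referer URL, or None."""
    query = parse_qs(urlparse(referer).query)
    values = query.get(id_param)
    return values[0] if values else None


# A profile page whose URL carries the user's OSN identifier (hypothetical).
headers = third_party_request_headers("http://osn.example/profile.php?id=123456")

# What the third-party aggregator can recover from the request it receives.
print(leaked_identifier(headers["Referer"]))
```

The aggregator needs no cooperation from the OSN here: the identifier arrives as a side effect of the browser fetching the embedded content.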
External applications have become very popular as add-ons to OSNs, providing extra functionality such as games and quizzes. Unfortunately these applications themselves are known to sometimes leak information via HTTP headers and cookies, and users may be even less aware of potential OSN identity leakage from external applications than from the OSN itself.
The last and potentially most concerning form of PII leak is when actual pieces of PII are transferred via HTTP headers. LiveJournal and Hi5 were found to leak pieces of PII such as a user’s age, gender, zip code and email address. In the email address case, the OSN is leaking information that is usually never even available to other users of the OSN.
2 Online Social Networks with Enhanced Privacy
Here we present three research papers which are highly related to the PeerBook project, each describing a different form of privacy-enhanced OSN.
2.1 Social Virtual Private Networks (VPNs)
Social VPNs is an architecture proposed in “Social VPNs: Integrating Overlay and Social Networks for Seamless P2P Networking” [4]. The paper suggests that by using existing OSN infrastructures, Social VPNs will be “self-configuring, self-managing, yet maintain security against untrusted parties”.
While OSNs are very good for discovering and managing social links between two users, there is little or no functionality for creating direct network links between those users. The paper proposes an architecture which supports creating secure network connections between users using established social links on OSNs.
2.1.1 Secure Connection Establishment
If a user wishes to communicate over a social VPN, that user’s social VPN client first creates a PKI (Public Key Infrastructure) key pair. The client signs a certificate using its private key and publishes its public key on the OSN. The client periodically queries the OSN for any updated or new public keys for the user’s friends on the OSN, storing any changes locally.
Now if two friends wish to communicate, they can use each other’s locally stored public keys to validate their certificates, and can then use IPsec (Internet Protocol Security) to negotiate a session key exchange. From then on all communication is fully encrypted with the session key.
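The client-side bookkeeping in this process can be sketched as follows. This is a deliberately simplified model, not the implementation from [4]: it only caches the keys published on the OSN and checks that a key presented by a peer matches the cached one. Real certificate validation and the IPsec negotiation are out of scope, and the class and method names are hypothetical.

```python
import hmac


class FriendKeyCache:
    """Local store of friends' public keys, refreshed from the OSN."""

    def __init__(self):
        self._keys = {}  # friend name -> public key bytes

    def refresh(self, published):
        """Merge keys fetched from the OSN; return friends whose key is new or changed."""
        changed = [f for f, key in published.items() if self._keys.get(f) != key]
        self._keys.update(published)
        return sorted(changed)

    def is_trusted(self, friend, presented_key):
        """True only if the presented key equals the cached key for that friend."""
        cached = self._keys.get(friend)
        return cached is not None and hmac.compare_digest(cached, presented_key)


cache = FriendKeyCache()
first = cache.refresh({"alice": b"key-A1", "bob": b"key-B1"})   # initial poll
second = cache.refresh({"alice": b"key-A2", "bob": b"key-B1"})  # alice rotated her key
print(first, second)
print(cache.is_trusted("alice", b"key-A2"), cache.is_trusted("alice", b"key-A1"))
```

Because trust is anchored in whatever the OSN serves during `refresh`, this sketch also makes visible how much the scheme depends on the OSN itself.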
2.1.2 Social VPN Analysis
The authors of the paper conducted an experiment using the Facebook API, using the Facebook DataStore to store the social VPN public keys. Once the social VPN was established across 430 router nodes (provided by PlanetLab) across the world, they successfully tested a range of applications using the established connections: remote desktop sessions, Samba and NFS file sharing, a 3D game, a web server, a chat and music sharing application, and VoIP (Voice over Internet Protocol). They also quantitatively measured file transfers across network overlays of varying size, and found a good throughput of approximately six megabits per second and a latency of just over five milliseconds in each case, with only a slight drop in performance when IPsec was used. Even though these are good results, performance enhancements are still being investigated.
While this technology is potentially useful for creating secure communications links between users of an OSN for many different applications, it does not get around the fundamental problem that the OSN itself is likely insecure and may leak PII. This is because it still requires OSN infrastructure to function as it needs the social links and storage space of an OSN in order to bootstrap this communication.
2.2 Virtual Individual Server based Online Social Networks
Shakimov et al. propose an OSN in which each user stores their personal data on their own machine, or Virtual Individual Server (VIS). A user’s VIS joins one overlay network for each social group of which the user is a member.
The paper discusses three separate schemes, with each differing in the exact location of the user’s VIS.
2.2.1 Cloud-based Decentralisation
Here VISs are run on a cloud service such as Amazon Elastic Compute Cloud (EC2) [5]. The same authors previously proposed Vis-à-Vis [6], in which each VIS runs in the cloud. They believe that users are likely to move to a cloud computing model for many of the same reasons that organisations have, most importantly because the customer is relieved of the responsibility for keeping hardware up to date while retaining full legal ownership of, and control over, their own data. Vis-à-Vis also lets a user join several overlay networks, just as a person can be part of several social groups.
Vis-à-Vis is composed of a two-tier Distributed Hash Table (DHT). The upper tier is used to maintain access details for each overlay network, and the lower tier DHTs correspond to single overlay groups. These lower tier DHTs are run on the VISs that make up the overlay network to which the DHT corresponds, which requires that each VIS keep routing state for the upper tier DHT (the ‘Meta Group’) as well as routing state for any groups of which they are a member.
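The two-tier lookup can be sketched with a toy hash ring. The description above does not say which DHT algorithm Vis-à-Vis uses, so the consistent-hashing rule, the class `TwoTierDirectory` and all VIS and group names below are assumptions made purely for illustration.

```python
import hashlib


def _position(value):
    """Map a string to an integer position on the hash ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


def responsible_node(nodes, key):
    """Return the first node at or after the key's ring position, wrapping around."""
    ring = sorted(nodes, key=_position)
    key_pos = _position(key)
    for node in ring:
        if _position(node) >= key_pos:
            return node
    return ring[0]


class TwoTierDirectory:
    """Upper tier spans every VIS; each group has its own lower-tier ring."""

    def __init__(self, all_vis):
        self.all_vis = list(all_vis)  # participants in the upper tier
        self.groups = {}              # group name -> member VISs

    def create_group(self, name, members):
        self.groups[name] = list(members)

    def meta_owner(self, group):
        """Upper tier: which VIS holds the access details for `group`."""
        return responsible_node(self.all_vis, group)

    def lookup(self, group, key):
        """Lower tier: which member VIS is responsible for `key` in `group`."""
        return responsible_node(self.groups[group], key)


directory = TwoTierDirectory(["vis-a", "vis-b", "vis-c", "vis-d"])
directory.create_group("chess-club", ["vis-a", "vis-c"])
print(directory.meta_owner("chess-club"))
print(directory.lookup("chess-club", "next-meeting"))
```

Note that `vis-a` and `vis-c` take part in both rings, which mirrors the requirement that each VIS keeps routing state for the upper tier as well as for every group it belongs to.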
This structure lets Vis-à-Vis support open or restricted groups, public or secret groups, and very small to very large groups. However, this comes at a high monetary cost because of the use of cloud services; for example, Amazon EC2 can cost upwards of $75 per month to run a single virtual machine.
2.2.2 Desktop-based Decentralisation
The paper suggests that it would usually be far cheaper for the user to host their VIS on a machine that they own. This also has the advantage that the user has physical control over their own machine which may help them protect their data better than with cloud-based decentralisation.
However, this also has the disadvantage that as soon as the user turns their own machine off, their social links lose access to that user’s data. To get around this limitation, the authors propose a ‘socially-informed replication scheme’, where unencrypted user data is stored on trusted VISs of friends on the network. These trusted VISs would be required to follow the data access policies defined by the data’s owner.
This suggestion brings up problems regarding replication on the machines of users who have social links that do not overlap; the example given in the paper is of a user not wishing the details of private communication with her boyfriend to be replicated to her brother’s machine.
2.2.3 Hybrid Decentralisation
With hybrid decentralisation, the key idea is a tradeoff between cost and data availability. In this scheme, users would use their own machine as their VIS until the machine is turned off, at which point a standby VIS is initialised in the cloud. This gets around the problem that socially-informed replicas will often “experience correlated failures”: if a power cut causes one replica to fail, it is likely that other replicas will also fail if those replicas are stored on the machines of the user’s family members (which are likely to be in one physical locality).
The challenge here is in detecting failure of the primary or replica VISs, so that the cloud VIS may take over. The options presented in the paper are that the cloud provider could offer a service that periodically checks the primary VIS and begins a virtualised VIS as necessary, or that third-parties who attempt to contact the primary VIS and fail can report this failure to the cloud provider who could start up the VIS instance.
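The two detection options can be combined into a single decision rule, sketched below. The 30-second timeout, the threshold of three third-party reports and the function name `should_start_standby` are all assumed values chosen for illustration; the paper does not specify them.

```python
def should_start_standby(last_heartbeat, now, timeout=30.0,
                         failure_reports=0, report_threshold=3):
    """Decide whether the cloud standby VIS should take over.

    last_heartbeat / now: timestamps in seconds of the last successful
    periodic check of the primary VIS and of the current moment.
    failure_reports: number of third parties that reported failing to
    reach the primary VIS.
    """
    missed_heartbeat = (now - last_heartbeat) > timeout
    enough_reports = failure_reports >= report_threshold
    return missed_heartbeat or enough_reports


print(should_start_standby(last_heartbeat=100.0, now=120.0))
print(should_start_standby(last_heartbeat=100.0, now=140.0))
print(should_start_standby(last_heartbeat=100.0, now=110.0, failure_reports=3))
```

Requiring several independent reports before acting is one possible way to stop a single third party from triggering a costly cloud instance unnecessarily.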
2.2.4 Virtual Individual Server Analysis
There are clear limitations with the use of interconnected VISs to form social network overlays. If using a cloud service either as the place to run the user’s primary VIS or for standby VISs, high monetary cost is incurred. On the other hand, hosting a VIS only on the user’s own machine means that either they can never turn it off or that their data will be inaccessible to their friends during machine downtime. Finally, if using a socially-informed replication scheme, these social links must be completely trusted by the data owner to follow their data access policies.
2.3 Persona: User-Defined Privacy
Persona [7] is a system where Attribute-Based Encryption (ABE) is used to control access to personal data.
In ABE, each user creates an ABE public key and an ABE master key. The user then generates an ABE secret key for each friend. This key matches the set of groups of which the friend is a member. The key is then given to the friend for whom it was generated.
Users may then encrypt plaintext with regard to a set of attributes. For others to decrypt the resulting text, they must have a corresponding ABE secret key from the originating user that proves they have some or all (depending on the access policy in the original encryption) of the attributes required.
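The access-policy logic alone (without any cryptography) can be sketched as a small evaluator. In real ABE the check is enforced by the decryption mathematics rather than by an explicit test; the policy representation and the attribute names used here are assumptions for illustration and are not Persona’s actual format.

```python
def satisfies(policy, attributes):
    """Evaluate a nested access policy against a set of attributes.

    A policy is either an attribute name (str) or a tuple of the form
    ("and" | "or", subpolicy, subpolicy, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    operator, *subpolicies = policy
    results = (satisfies(sub, attributes) for sub in subpolicies)
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical policy: family members, or friends who are also neighbours.
policy = ("or", "family", ("and", "friend", "neighbour"))

print(satisfies(policy, {"friend", "neighbour"}))
print(satisfies(policy, {"friend"}))
print(satisfies(policy, {"family"}))
```

The "some or all" wording above corresponds to the `or` and `and` operators respectively: a key holder needs every attribute under an `and`, but only one branch of an `or`.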
An access group can be defined in a single encryption operation in this way, by constructing a single ABE secret key that encapsulates the attributes required for the group. Any user A who knows another user B’s public key can construct any group as long as they know the names of B’s defined attributes.
Unfortunately, there is a caveat to this powerful encryption scheme: ABE operations are much slower than traditional symmetric cryptography operations. ABE is “100-1000” times slower than even Public Key Infrastructure cryptography, which is itself traditionally thought of as slow in comparison to symmetric cryptography. This especially affects revoking a user’s membership of a group; a new key must be constructed every time a member’s access is revoked.
[1] B. Krishnamurthy and C. E. Wills, “On the leakage of personally identifiable information via online social networks,” SIGCOMM Comput. Commun. Rev., vol. 40, no. 1, pp. 112–117, 2010.
[2] B. Krishnamurthy and C. Wills, “Characterizing privacy in online social networks,” in WOSP ’08: Proceedings of the first workshop on Online social networks. New York, NY, USA: ACM, 2008, pp. 37–42.
[3] B. Malin, “Betrayed by my shadow: learning data identity via trail matching,” Journal of Privacy Technology, vol. 2005, p. 20050609001, 2005.
[4] R. J. Figueiredo, P. O. Boykin, P. S. Juste, and D. Wolinsky, “Integrating overlay and social networks for seamless p2p networking,” in WETICE ’08: Proceedings of the 2008 IEEE 17th Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises. Washington, DC, USA: IEEE Computer Society, 2008, pp. 93–98.
[5] “Amazon Elastic Compute Cloud (Amazon EC2),” 2010. [Online]. Available: http://aws.amazon.com/ec2/
[6] A. Shakimov, H. Lim, L. P. Cox, and R. Caceres, “Vis-à-vis: online social networking via virtual individual servers,” Duke University, Tech. Rep., May 2008.
[7] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin, “Persona: an online social network with user-defined privacy,” in Proceedings of ACM SIGCOMM 2009. New York, NY, USA: ACM, August 2009, pp. 135–146. [Online]. Available: http://www.cs.umd.edu/~bender/papers/persona.pdf