Using Social Semantic Web Data for Privacy Policies

Kigel, Emily

Summary

Abstract:
In recent years the web has undergone a drastic shift from a static, centralised information system to a dynamic, user-generated, distributed and open platform, and users have changed from passive consumers to active participants who interact, create and share content. This 'new' web is called Web 2.0. In the course of this movement, new Social Web applications emerged that create an environment in which people publish, share and discuss content, create descriptive profiles of themselves for self-expression, and build social networks of relationships with others for the purpose of interaction and communication. With the increasing popularity of such social networking applications the number of users has grown and is still growing. Besides the number of users, web traffic is also an indicator of the growing importance of social networking platforms, which are now among the most visited websites.
With over 100 million unique visitors worldwide, Facebook is one of the most popular networking sites on the web; moreover, according to Alexa, the site ranks third among the most visited sites on the web, surpassed only by Google and Yahoo!. YouTube (with over 80 million unique visitors), MySpace (with about 60 million unique visitors) and Flickr (about 30 million unique visitors) are other examples of prominent social networking platforms.
However, the availability of such a huge amount of information within social networking sites, together with the open nature of the services and their usage, also attracts the attention of parties with marketing purposes or malicious intent. Users are thereby put at risk of online stalking, phishing, identity theft, spamming, the passing on of data to third parties, and privacy issues related to personal data exposure due to insufficient access control.
By maintaining social networks and actively participating in Social Web activities such as interacting with others, users unwittingly expose sensitive, personal or inappropriate, even reputation-damaging data not only to friends but also to an audience that mostly remains invisible and consists of strangers or acquaintances who are potentially not supposed to see such information. The revealed information can thus have major consequences if it is read out of context or by parties, such as authorities or job recruiters, for whom it was not intended. The reputation of social networking sites has been slightly […]

Excerpt

ISBN: 978-3-8366-4441-9

Production: Diplomica® Verlag GmbH, Hamburg, 2010

Also submitted as: Leibniz Universität Hannover, Hannover, Germany, bachelor's thesis, 2009

This work is protected by copyright. All rights arising from it, in particular those of translation, reprinting, public presentation, extraction of figures and tables, broadcasting, microfilming or reproduction by other means, and storage in data processing systems, remain reserved, even where only excerpts are used. Reproduction of this work or of parts of it is permitted, even in individual cases, only within the limits of the statutory provisions of the Copyright Act of the Federal Republic of Germany in its currently valid version. It is in principle subject to remuneration. Violations are subject to the penal provisions of copyright law.

The reproduction of common names, trade names, product designations etc. in this work does not, even without special marking, justify the assumption that such names are to be regarded as free in the sense of trademark and brand protection legislation and may therefore be used by anyone.

The information in this work was compiled with care. Nevertheless, errors cannot be ruled out completely, and the publisher, the authors or translators accept no legal responsibility or any liability for any remaining incorrect information and its consequences.

http://www.diplomica.de, Hamburg 2010

Abstract

Social Web applications are steadily gaining popularity. At the same time, the open nature of such services leads to the exposure of an immense amount of personal data. Due to insufficient access control in today's Social Web applications, privacy problems arise. This thesis focuses on the need for more flexible and fine-grained privacy restrictions. It analyses privacy problems of current Social Web applications and compares the privacy preferences such applications offer. Based on this analysis, this thesis extends the well-known principle of policy-based access control, which is a flexible and dynamic way to define who can get access to what content based on user preferences. The presented extension adapts policies to the requirements of the Social Web. In particular, it describes how to exploit Social Semantic Web data for privacy reasoning. This includes the retrieval of Social and Semantic Web data from various information sources on the Web, its usage for the definition of privacy policies, and its consideration during policy evaluation. Consequently, using Social Semantic Web data for policy reasoning allows users to define exactly which social relationships and properties a requester must have in order to access a particular resource. These conditions can cross the boundaries of a single Social Web application. Hence, a user can, for example, state that a friend on one application can access pictures stored on another application, thus bridging the walled gardens of today's Social Web applications.

Contents

1 Introduction
2 Motivating Scenario and Problem Statement
2.1 Motivating Scenario
2.2 Problem Statement
3 Background
3.1 The Social Web
3.1.1 Social Networking Sites
3.2 The Semantic Web
3.2.1 The Resource Description Framework (RDF)
3.2.2 The SPARQL Protocol And Query Language
3.2.3 SPARQL Endpoints
3.2.4 The Social Semantic Web
3.3 Privacy Policies
3.3.1 Policy Languages
3.3.2 Policy Frameworks
3.3.3 The Protune Framework
4 The Social Web from a Privacy Perspective
4.1 Data Disclosure - Why Better Control is Needed
4.1.1 Privacy Issues
4.1.2 Information Overload
4.2 Privacy Protection on the Social Web - a State of Art
4.2.1 Twitter and its Privacy Options
4.2.2 Facebook and its Privacy Options
4.2.3 Flickr and its Privacy Options
4.3 Comparing Privacy Preferences on Social Platforms
4.3.1 Levels of Trust for Data Disclosure
4.3.2 Network Features and their Protection
4.4 Summary
5 Policy Reasoning Based on Social and Semantic Web Data
5.1 Requirements for Policies on the Social Web
5.2 Social and Semantic Web Data for Policy Specification and Evaluation
5.2.1 Types of Social Data and their Availability
5.2.2 Using Social Data to Define New Concepts
5.2.3 Enforcing Policies upon an Application
5.3 Taking up the Motivating Scenario
6 Implementation
6.1 Retrieving Heterogeneous Information
6.1.1 Retrieving Social Web data
6.1.2 Retrieving Social Semantic Web Data
6.2 Wrappers for External Information Sources
6.2.1 IN-Predicate
6.2.2 SPARQL Endpoint Wrapper
6.2.3 DBpedia Wrapper
6.2.4 DBLP Wrapper
6.2.5 RDF Wrapper
6.2.6 Flickr Wrapper
6.2.7 Twitter Wrapper
6.3 SPoX - A Use Case
7 Related Work
8 Conclusions and Outlook
Bibliography

1 Introduction

In recent years the web has undergone a drastic shift from a static, centralised information system to a dynamic, user-generated, distributed and open platform, and users have changed from passive consumers to active participants who interact, create and share content. [1] This 'new' web is called Web 2.0. In the course of this movement, new Social Web applications emerged that create an environment in which people publish, share and discuss content, create descriptive profiles of themselves for self-expression, and build social networks of relationships with others for the purpose of interaction and communication. With the increasing popularity of such social networking applications the number of users has grown and is still growing. Besides the number of users, web traffic is also an indicator of the growing importance of social networking platforms, which are now among the most visited websites.

With over 100 million unique visitors worldwide, Facebook is one of the most popular networking sites on the web¹; moreover, according to Alexa², the site ranks third among the most visited sites on the web, surpassed only by Google and Yahoo!. YouTube (with over 80 million unique visitors), MySpace (with about 60 million unique visitors) and Flickr (about 30 million unique visitors)³ are other examples of prominent social networking platforms.

¹ http://siteanalytics.compete.com/facebook.com/; retrieved August 14, 2009
² Alexa is a web service which measures the web traffic of websites and creates ranking lists accordingly; Alexa traffic rankings, retrieved August 14, 2009 from http://www.alexa.com/topsites/global
³ http://siteanalytics.compete.com/youtube.com/, http://siteanalytics.compete.com/myspace.com/, http://siteanalytics.compete.com/flickr.com/.

However, the availability of such a huge amount of information within social networking sites, together with the open nature of the services and their usage, also attracts the attention of parties with marketing purposes or malicious intent. Users are thereby put at risk of online stalking, phishing⁴, identity theft, spamming, the passing on of data to third parties, and privacy issues related to personal data exposure due to insufficient access control.

⁴ Phishing is an attempt to obtain users' personal data by means of forged websites.

By maintaining social networks and actively participating in Social Web activities such as interacting with others, users unwittingly expose sensitive, personal or inappropriate, even reputation-damaging data not only to friends but also to an audience that mostly remains invisible and consists of strangers or acquaintances who are potentially not supposed to see such information. The revealed information can thus have major consequences if it is read out of context or by parties, such as authorities or job recruiters, for whom it was not intended. Examples of real-world problems caused by privacy issues can be found in [2, 3, 4, 5]. These are just a few cases that received major public attention. The reputation of social networking sites has been slightly diminished by several such incidents, which often attract the attention of the media. As social networking users become more aware of such privacy risks and do not refrain from expressing their concerns⁵, privacy settings need to be acknowledged as an important part of Social Web applications, and extensive research is needed on improving privacy preferences and giving users control over who can see which of their social data.

⁵ The introduction of the 'News Feed' on Facebook, which informed users about the newest activities of their friends, led to an online petition, signed by over 700,000 users, demanding the abolition of the feature. [6]

Although social networking sites have already realised the need for privacy protection and some Social Web applications like Facebook have even introduced more complex access restrictions, the privacy preferences are still not fine-grained and flexible enough and are far from satisfactory. Furthermore, such privacy settings confine themselves to the properties of the site itself without making use of data created beyond the boundaries of the application.

Policy-based access control is an approach to protecting privacy in open systems such as Social Web applications and can further help to control the information overload which users are facing on the Social Web. Since policies are formal, well-defined statements [7], the process of defining who can get access to what content based on user preferences can be realised in a flexible and dynamic way. Nevertheless, current policy-based control of the behaviour of complex systems does not offer solutions for the movement towards the Social Web, where information about users, their content and their relationships is not confined to one application but is spread across the whole Social Web. The same problem affects the privacy settings of current Social Web applications, which offer preferences based only on the relationships and other attributes established within their own website.

The contribution of this thesis is therefore to enhance privacy policies by integrating data from various information sources, such as Social Web applications, into the policy specification and reasoning process. Such policies can save people the trouble of creating the same social data⁶ on each Social Web application they are members of. These privacy policies can prove beneficial, since numerous social networking sites are emerging that offer various functionalities and site-specific features, and people spend an increasing amount of time maintaining all this data distributed across the many services.

⁶ Social data includes user-generated content (bookmarks, tags, reviews, photos, blog posts etc.), personal data of users and their social networks.

Social Web applications provide their information in proprietary formats via their own site-specific application programming interfaces. The approach presented in this thesis collects this arbitrary, heterogeneous data and provides it in a homogeneous format so that it can be integrated into policies and exploited for policy reasoning. Furthermore, in addition to Social Web data, Semantic Web data can be included in policy decisions to extend the variety of policy specifications. Such semantic information is available in non-application-specific standard formats which can be easily transported and reused. Additionally, information provided by Social Web applications can be retrieved as Social Semantic data, which allows Social data to be converted into a uniform format using Semantic Web technologies. This thesis explains the process of retrieving all these information types, transforming the extracted data into a format appropriate for policy-based access control, and combining them to create fine-grained privacy policies, and demonstrates it using selected information sources from the Web. To implement the presented solution, the policy framework Protune is used to automate the evaluation and decision process based on the conclusions drawn from privacy policies.

The remainder of the thesis is organised as follows. Section 2 presents a scenario showing how privacy policies can be integrated into a web application and enhance the user experience on such sites; it then identifies and describes the problem statement. Section 3 provides the background information necessary for understanding this thesis. Section 4 analyses privacy problems of current Social Web applications and compares the privacy preferences such applications offer. Section 5 presents the extension of policy-based access control that adapts policies to the requirements of the Social Web and revisits the motivating scenario. Subsequently, Section 6 describes how Social and Semantic Web data can be retrieved and included in Protune, detailing the actual implementation of the Protune extension for Social Semantic Web data. It also presents a prototype implementation called SPoX [8], which demonstrates the usage of policy-based behaviour control on the Social application Skype. After a presentation of related work in Section 7, Section 8 concludes this thesis and provides an outlook for future research.

2 Motivating Scenario and Problem Statement

2.1 Motivating Scenario

To explain the benefits of a policy-driven approach to protecting privacy in open systems, this section presents a fictional scenario. It serves as a use case demonstrating how privacy policies based on various Social and Semantic Web data can be used to control one's own data on a Social Web application. Parts of the scenario are used throughout this thesis to explain the syntax of policies in general and Protune policies in particular, and to show how external data can be retrieved and integrated into the Protune framework.

Bob is a scientist who works in a company, takes part in research projects and holds a seminar for students at the local university. Bob has a very active web presence: besides having profiles on numerous social networking sites such as Facebook, Flickr and Twitter, he also manages his own website, where he uploads information about different aspects of his life, such as his business, his interests and his family. Furthermore, being a supporter of the Semantic Web movement, he has uploaded a FOAF file to express his network of acquaintances.

(S1) To be easily reached by his colleagues, his students at the university and his friends, Bob posts his contact information on his website. Because his email address is not too private, he is happy for any of his friends and colleagues to see it.

(S2) Nevertheless, some contact data is very sensitive, so Bob does not want everybody to see all his contact information; for example, Bob only wants to disclose his phone number to his family and to the close friends whom he has added to his 'close-friend' list on Facebook.

(S3) Bob uses his website for blogging as well. He publishes findings from his research and includes links to other interesting websites that are relevant to his profession. He also talks about his personal life, discusses stories, and posts pictures of family and friends and of his interests beyond his profession. As some of the information he posts is private, such as pictures of his last vacation trip or a wish list he compiled for his next birthday, he only wants his Flickr and Facebook friends, in addition to his family, to see it. Moreover, everybody who is in the Flickr group about the landscape of Southern France can see the pictures of his holiday trip that are tagged with the keyword 'Southern France', but not the ones tagged 'private', as these are intended only for family and close friends.

(S4) Any work-specific information Bob posts and any files he uploads should be visible only to his colleagues, that is, those belonging to the work network of his company on Facebook. Additionally, Bob wants the co-authors of his publications to have access to updates about any research projects.

(S5) Bob has many different interests and hobbies, which he likes to write about. One of them is his passion for baseball. Unfortunately, not all his friends share this hobby, which is why Bob would like to disclose his updates about the sport only to people who appreciate reading them. For instance, friends of Bob who are in the baseball group on Facebook are more likely to prefer reading Bob's thoughts on the next baseball championship to reading his presentation slides about the newest Semantic Web technologies, which in turn would go down well with his FOAF friends. Bob therefore wants baseball-related information to be visible to his friends who are either in a baseball group or have a blog about baseball.

(S6) His presentation slides are intended only for his acquaintances interested in the Semantic Web. Furthermore, Bob wants to use these presentation slides in the next lecture he is giving. To get relevant feedback on their quality, he hopes that his colleagues and friends who are skilled in that field will read the slides and offer constructive criticism. He knows that all the friends he added to his FOAF profile, his colleagues in his company network on Facebook and any friends who are also co-authors of his publications are skilled enough to help him. As these slides are in German, he only wants people who have mastered this language to read them.
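To make the scenario more concrete, the following minimal Python sketch expresses (S2), (S3) and (S4) as access checks over data that is assumed to have already been collected from Facebook, Flickr and a publication source. It is an illustration only and not Protune syntax; the class `SocialData`, its field names, the function names and all user names are hypothetical, and the handling of the 'private' tag is just one possible reading of (S3).

```python
from dataclasses import dataclass, field


@dataclass
class SocialData:
    """Hypothetical snapshot of facts gathered from several sources."""
    family: set = field(default_factory=set)
    facebook_friends: set = field(default_factory=set)
    facebook_close_friends: set = field(default_factory=set)
    facebook_work_network: set = field(default_factory=set)
    flickr_friends: set = field(default_factory=set)
    flickr_group_southern_france: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)


def may_see_phone_number(requester, data):
    # (S2): family and the Facebook 'close-friend' list only.
    return requester in data.family | data.facebook_close_friends


def may_see_photo(requester, tags, data):
    # (S3), one possible reading: the 'private' tag overrides all other tags.
    if "private" in tags:
        return requester in data.family | data.facebook_close_friends
    if requester in data.family | data.facebook_friends | data.flickr_friends:
        return True
    # Members of the Flickr group may see pictures tagged 'Southern France'.
    return ("Southern France" in tags
            and requester in data.flickr_group_southern_france)


def may_see_work_update(requester, data):
    # (S4): Facebook work network or co-authors of Bob's publications.
    return requester in data.facebook_work_network | data.coauthors


# Invented example data; every name is a placeholder.
bob_data = SocialData(
    family={"alice"},
    facebook_friends={"carol", "dave"},
    facebook_close_friends={"carol"},
    facebook_work_network={"grace"},
    flickr_friends={"erin"},
    flickr_group_southern_france={"frank"},
    coauthors={"heidi"},
)
```

Each check combines facts from more than one application, which is the cross-application behaviour the scenario asks for; in a real system these sets would have to be retrieved from the respective sources rather than written by hand.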

2.2 Problem Statement

At present, the possibilities to define and enforce privacy preferences are too restrictive and cannot be adapted to the user's needs. The internet lacks walls, as danah boyd pointed out [9], which is why the ability to define expressive privacy preferences and improve users' control is essential, especially when it comes to sensitive and personal data.

When deciding which part of their personal data can be disclosed to whom, users may each have their own ideas about the right privacy preferences, depending on their personal situation and their purposes on the Social Web; for example, a user may be searching for new business contacts, may want to meet new people, or may want to keep in touch with current friends. As these situations are quite complex and also individual to each user, it is difficult to provide static, predefined privacy options that are sufficient to accommodate each individual need of the thousands of users of a Social Web application. Therefore users need privacy settings that go beyond a few predefined checkboxes with a few selected options, which in most cases are binary as well: a profile is either private or public, a user is either a friend or a stranger, and so on.

Another shortcoming of Social Web applications that needs to be overcome is that users can define privacy preferences only on the basis of information within the borders of the application itself. Each Social Web application is like an island, collecting the social data of its users and providing it in a site-specific, proprietary way. Given the growing difficulty of maintaining such distributed data, the increasing time people spend on the web managing their identities and social networks, and the need to re-enter their information every time they start using a new website, it can prove beneficial to incorporate social data from any of the otherwise isolated Social Web applications into the privacy preferences of an application.

Another problem is the perpetual change in the structure of individual social networks as new relationships are built and established relationships change their status; an acquaintance can become a close friend, for example. All these changes, both within and beyond the borders of one application, need to flow into the privacy preferences without the user having to adjust the settings manually. For instance, if a new colleague wants to access a user's data meant for the employees of his workplace, the privacy settings should automatically recognise and include her as a new colleague.

Of course, flexibility and great variety go along with usability challenges, as privacy settings become difficult to understand and hard to adjust for ordinary users. This problem leads to users mostly keeping the default settings, as is already often the case in today's Social Web applications. [10]

Bearing in mind all these problems with privacy preferences, this thesis takes up the need for fine-grained privacy management and exploits policies for defining the behaviour of a system based on certain conditions. These policies need to be extended to enable policy-based access control that can be applied to what happens on the Social Web, so that a scenario like the one presented in the previous section can become reality. To achieve this goal several requirements need to be met:

• A dynamic and well-defined policy language is needed for specifying the policies.

• This language must be able to incorporate data from any possible source on the Web beyond the border of the application itself, which also supports a high level of flexibility.

• This data can be Social Data or Semantic Web data of any kind, such as attributes of a user or a group, information about relationships, user-generated data or activities, and general information like publications and their authors, countries, languages and much more.

• Further qualities policies should have are:

  - Fine-grained policies: Policies need to be detailed enough to be applied to the complex scenarios on the Web, so that arbitrary combinations of a user's needs can be realised.

  - Dynamic and automatically adapting to changes: Policies should adjust to changes occurring on the Social Web. If, for example, a new colleague requires access, the policies adapt dynamically, recognise the colleague as such and include him in the appropriate concepts.

  - Usability: The process of specifying policies should be intuitive, simple and fast. People who are not skilled in the formal syntax of policy languages should have no problem defining policies.

  - Lucidity: Users should clearly understand what a policy they have created does, without room for interpretation.

  - When access to a resource is denied, an explanation is needed that helps the user understand why the request failed.

• To automate the evaluation and decision process of the defined policies, a policy framework is needed. This policy framework automatically queries the respective information sources and selects the information a user wants to use for his policies, independently of the format in which the data is provided.

• The extracted information is unified and combined to create policies, meaning that data from different sources can be incorporated into one policy.

• Policies should be enforced upon an application; that is, if a resource protected by such a policy is requested, the framework evaluates the policy and, according to the result, either allows or denies access.

• To be able to reason correctly over a policy, the framework needs to categorise the requester according to some provided identification properties. The requester is added either to the group of people who are allowed to access the resource or to the group for whom access is denied.

• Analogously, the information overload caused by the various communication techniques on Social Web applications needs to be controlled with the policies as well.
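Several of these requirements (enforcement, automatic adaptation to changes, and an explanation on denial) can be pictured with a small Python sketch. It is a toy model under stated assumptions and not the Protune framework: the names `Policy`, `Decision` and `Source`, the concept labels and the user names are all hypothetical. Each concept is backed by a callable that is queried at request time, standing in for a live lookup against an external information source.

```python
from typing import Callable, Dict, List, NamedTuple, Set

# A source returns the current members of a concept when it is called.
Source = Callable[[], Set[str]]


class Decision(NamedTuple):
    allowed: bool
    explanation: str


class Policy:
    """Grants access if the requester belongs to at least one concept."""

    def __init__(self, resource: str, concepts: List[str]):
        self.resource = resource
        self.concepts = concepts

    def evaluate(self, requester: str, sources: Dict[str, Source]) -> Decision:
        for concept in self.concepts:
            # Sources are queried on every request, so membership changes
            # are picked up without editing the policy itself.
            if requester in sources[concept]():
                return Decision(True, f"{requester} is in '{concept}'")
        needed = " or ".join(self.concepts)
        return Decision(False, f"{requester} is not in any of: {needed}")


# Invented example: the set of colleagues changes after the policy is written.
colleagues = {"alice"}
sources = {
    "colleague": lambda: colleagues,
    "coauthor": lambda: {"heidi"},
}
policy = Policy("work_updates", ["colleague", "coauthor"])

before = policy.evaluate("nina", sources)  # nina is not yet a colleague
colleagues.add("nina")                     # she joins the work network
after = policy.evaluate("nina", sources)   # same policy, new outcome
```

The policy object is never modified between the two evaluations; only the underlying data changes, which is the behaviour the requirement on dynamic adaptation describes. The explanation string is a placeholder for the richer explanations a real framework would have to produce.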

3 Background

This section is intended to provide background information which is relevant for

understanding the subsequent elaborations in this thesis.

3.1 The Social Web

The Web has undergone a change in the way content is created and used, and in the roles of providers (authors) and consumers (readers), which are no longer mutually exclusive. The concept describing the shift from the static, centralised Web to a dynamic, user-generated, open platform is often referred to as Web 2.0, a term popularised by Tim O'Reilly when he published the article 'What is Web 2.0?'. Web 2.0 covers new Web technologies which enable rich user interfaces, provide services open for others to use, combine various sources and enable user participation and interactivity. [11]

Some of the most distinctive principles of Web 2.0 are the linking of data throughout the web and the provision of services that enable users not only to socialise online but also to publish, share, reuse and generally participate in creating content. This concept is also often referred to as the Social Web. In the course of this movement towards interoperability, information sharing and interaction, the so-called Social Web applications have emerged, providing an environment where people can link with each other to create personal social networks of relationships and create content collaboratively. Some of the most typical categories of applications that make up the Social Web are wikis, blogs, podcasts, social bookmarking web services, social networking sites and content sharing sites.

Social networking sites and content sharing sites both aim to build social networks among their members, be it as the primary goal, as is the case with social networking sites, or as a secondary objective, as with content sharing sites. Because the aspect of social interaction is important for this thesis, these sites are the main focus here. Content sharing sites are not presented separately in this section, as their community aspect is similar to that of social networking sites.

3.1.1 Social Networking Sites

This subsection briefly introduces social networking sites and the common features found in most of them. A more detailed description of the specific functionalities of social networking sites with regard to privacy is presented in Section 4.

Social networking sites give people the opportunity to maintain their real-world social connections, such as friends, family and colleagues, online. Users can also build new relationships based on common ground, such as shared interests and activities or other affiliations like a common geographical location or business contacts, in order to communicate and to expand their online social network. As a result, users create links to other users, producing the so-called social graph, which represents a web of connections with direct ties and indirect ties (Friends of Friends). [12] Social networking sites enable their users to create self-portraying profiles which can contain all kinds of personal information about a user. Users also post self-generated content about their lives, activities and interests. The purpose of publishing data is mostly to share and discuss it with other community members. According to [13], the information users can provide about themselves in their profiles can be divided into three categories:

Contact information: name, address, e-mail address, telephone and mobile phone number,

Individual information: relationship status, personality attributes, sexual preferences, physical attributes, birthday, education and occupation, religious and political affiliation,

Interest information: subjects of interest, hobbies, favourite books, movies, etc., and any association affiliations.

Furthermore, data about users emerges from their activity on the social networking sites, for instance from joining groups or befriending someone.

Social networking sites provide various functionalities to support social interaction and the communication of data; chat, messaging, blogging, discussion groups, tagging and linking content, and writing on one another's `Walls' are the most common functionalities.

All this published information can be intended either for a selected group of people, such as friends and family, or for general consumption. For this purpose, one of the main characteristics of social networking sites is the ability to define who belongs to the trusted set of users, referred to as friends on most social networking sites, and who belongs to the non-trusted set of users, labelled as strangers [14]. Most social networking sites further divide the non-trusted group into members and non-members of the site. The term friend, which is commonly applied to the trusted group of people, represents a consensual connection between two users and does not necessarily have the same meaning as it does offline, as people who would be called acquaintances in the real world are often seen as friends in the online world. Although on most sites befriending someone is a bi-directional process that requires confirmation by both parties to establish a link between the profiles, there are also sites where one-directional ties are common. Access to information is mostly based on the established status of the user.
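The distinction between trusted and non-trusted users can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class and function names (`SocialGraph`, `Relation`, `can_view`) and the user names are hypothetical and do not correspond to the implementation of any particular social networking site. The sketch models bi-directional friendship ties and derives a viewer's status (friend, friend of a friend, member, non-member) relative to a profile owner.

```python
from enum import Enum


class Relation(Enum):
    FRIEND = "friend"                      # trusted: direct, confirmed tie
    FRIEND_OF_FRIEND = "friend_of_friend"  # indirect tie via one common friend
    MEMBER = "member"                      # non-trusted, but registered on the site
    NON_MEMBER = "non_member"              # non-trusted and not registered


class SocialGraph:
    """Hypothetical in-memory social graph with bi-directional ties."""

    def __init__(self):
        self.friends = {}  # maps each member to the set of confirmed friends

    def add_member(self, user):
        self.friends.setdefault(user, set())

    def befriend(self, a, b):
        # Assumes both parties have confirmed, so the tie is stored for both.
        self.add_member(a)
        self.add_member(b)
        self.friends[a].add(b)
        self.friends[b].add(a)

    def relation(self, owner, viewer):
        """Return the status of `viewer` relative to the profile of `owner`."""
        if viewer not in self.friends:
            return Relation.NON_MEMBER
        direct = self.friends.get(owner, set())
        if viewer in direct:
            return Relation.FRIEND
        if any(viewer in self.friends[f] for f in direct):
            return Relation.FRIEND_OF_FRIEND
        return Relation.MEMBER


def can_view(graph, owner, viewer, audience):
    """`audience` is the set of Relation values the owner has allowed."""
    if viewer == owner:
        return True  # owners can always see their own data
    return graph.relation(owner, viewer) in audience


if __name__ == "__main__":
    g = SocialGraph()
    g.befriend("alice", "bob")
    g.befriend("bob", "carol")
    print(g.relation("alice", "carol"))
    print(can_view(g, "alice", "carol", {Relation.FRIEND}))
```

In this sketch the owner expresses a policy simply as a set of allowed statuses; real sites may offer coarser or finer-grained settings.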

Tagging is used for organizing and interlinking content. To tag content means to assign a descriptive keyword to it in order to make searching for this data easier.

The term Wall is used in Facebook to describe the reserved space on a user's profile where the user himself or his friends can write public messages.

3.2 The Semantic Web

Information on the internet is mostly provided either for human consumption or for machines to process. Furthermore, data provided by one application may not be understood by another, which makes data exchange and integration very hard. The Semantic Web aims to represent and structure the data of the current Web in a way that keeps it understandable for humans and enables machines to understand the semantics of the content as well. Therefore, data needs to be well-defined, interlinked and annotated with metadata so that it can be read, understood and processed by software agents and exchanged across various applications. [15]

The Semantic Web is based on various technologies that make data available in a semantically structured format, as illustrated by the Semantic Web Stack shown in [16].

3.2.1 The Resource Description Framework (RDF)

One of the most widely adopted open standard formats for providing machine-readable information is the Resource Description Framework (RDF), which was developed by the W3C. With RDF, Web resources can be described in a common way so that they can be read and understood by different applications. Information about web pages and personal information about people, among many other things, can be modelled using RDF. RDF is based on two technologies, XML and URI, whereby URIs are used to identify resources on the Web and XML is used to exchange information between different systems. [17]

Information in RDF is written in statements like `The location of the company Daimler AG is Germany', which consist of triples of {Subject, Predicate, Object}, for instance Daimler_AG hasLocation Germany. An RDF statement describes

- a resource (the Subject) identified by a URI, such as http://dbpedia.org/page/Daimler_AG,

- the property of a resource (the Predicate), such as `location', and

- the value of a property (the Object), which can be another resource such as http://dbpedia.org/page/Germany or a literal like `true' or `25'.

Several such triples form an RDF graph. [18]
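The triple structure described above can be sketched in a few lines of Python. This is a simplified illustration, not an RDF library: the names `Triple`, `is_resource` and `objects` are hypothetical helpers, the `http://example.org/` namespace and its `isCompany` property are made up for the example, and the DBpedia namespaces are the ones used in the example query of this chapter.

```python
from typing import NamedTuple, Union

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


class Triple(NamedTuple):
    subject: str                   # URI of the described resource
    predicate: str                 # URI of the property
    obj: Union[str, int, bool]     # URI of another resource, or a literal


def is_resource(value):
    """Treat string values that look like HTTP URIs as resources (simplification)."""
    return isinstance(value, str) and value.startswith("http://")


# An RDF graph is modelled here as a plain set of triples.
graph = {
    Triple(DBR + "Daimler_AG", DBO + "location", DBR + "Germany"),
    Triple(DBR + "Daimler_AG", EX + "isCompany", True),  # made-up literal example
}


def objects(graph, subject, predicate):
    """Return all values of `predicate` for `subject`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


if __name__ == "__main__":
    print(objects(graph, DBR + "Daimler_AG", DBO + "location"))
```

The first triple links two resources, while the second one carries a literal value, mirroring the two kinds of objects named in the list above.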

W3C: the World Wide Web Consortium

XML: eXtensible Markup Language

URI: Uniform Resource Identifier

3.2.2 The SPARQL Protocol and RDF Query Language

To query the triple data of an RDF graph as described above, the RDF query language SPARQL (SPARQL Protocol and RDF Query Language) can be used. The syntax of SPARQL is similar to SQL, using SELECT, FROM and WHERE to form a query. However, as RDF is written in triples, SPARQL uses the same pattern of {Subject, Predicate, Object} statements, as can be seen in the following example query. The query returns all companies in the RDF graph with the location `Germany':

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?company
WHERE {
    ?company dbo:location :Germany .
}

The SPARQL language also provides other query forms, such as DESCRIBE and ASK, but as they are of no relevance for this thesis, they will not be explained here.
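How the triple pattern in the WHERE clause selects data can be illustrated with a small Python sketch. It is a strong simplification of what a SPARQL engine does: it evaluates a single pattern against an in-memory list, the function name `match` is hypothetical, and the second sample triple is made-up data that exists only to show a non-matching case.

```python
# Sample data written with the prefixes of the example query.
TRIPLES = [
    (":Daimler_AG", "dbo:location", ":Germany"),
    (":Example_Company_A", "dbo:location", ":France"),  # made-up sample triple
]


def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must be equal
    to the corresponding part of the triple.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


if __name__ == "__main__":
    for row in match(("?company", "dbo:location", ":Germany"), TRIPLES):
        print(row["?company"])
```

The variable `?company` plays the same role as in the query: it is bound to the subject of every triple whose predicate and object match the fixed terms.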

3.2.3 SPARQL Endpoints

With the evolution of the Semantic Web, a lot of information has been made available on the web in the RDF format, and third parties can retrieve this data via so-called SPARQL endpoints. These endpoints are web services that allow clients to address and query RDF data sources using SPARQL. The query results are then returned in an XML format.
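The interaction with an endpoint can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the text: the query is passed in a `query` URL parameter of an HTTP GET request, the endpoint address `https://dbpedia.org/sparql` is used only as an example, the response follows the W3C SPARQL Query Results XML Format, and the helper names `build_query_url` and `parse_results` are hypothetical. No network request is made; the parser is applied to a hand-written sample response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

ENDPOINT = "https://dbpedia.org/sparql"  # assumed example endpoint address


def build_query_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def parse_results(xml_text):
    """Turn an XML result document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # child element: <uri>, <literal> or <bnode>
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="company"/></head>
  <results>
    <result>
      <binding name="company">
        <uri>http://dbpedia.org/resource/Daimler_AG</uri>
      </binding>
    </result>
  </results>
</sparql>
"""

if __name__ == "__main__":
    print(build_query_url(ENDPOINT, "SELECT ?company WHERE { ?company ?p ?o }"))
    print(parse_results(SAMPLE_RESPONSE))
```

An application would send the generated URL with an HTTP client of its choice and pass the response body to `parse_results`.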

Popular endpoints exist for services like DBpedia and DBLP, both of which offer large knowledge bases for external applications to use. DBpedia is a semantic database which extracts structured information from Wikipedia and provides it in a Semantic Web compliant way; strictly speaking, the information is presented in the RDF format. The DBpedia dataset comprises information on numerous subjects, ranging from geographical locations through well-known people such as athletes or artists to various organisations, including companies, educational institutions and sports teams. Using the dataset, queries like `People who were born in Berlin before 1900' [19] can be answered. The DBpedia project also follows the Linked Data principles [20]: all concepts are identified using URI references, which allows interlinking with other datasets like DBLP or Geonames.

DBLP is a bibliographic database focused on computer science which collects information about conferences, authors and their publications, papers and journals, among others. The DBLP D2R Server provides all this information in a Semantic Web format and allows the data to be accessed via an endpoint.

3.2.4 The Social Semantic Web

In recent years, Semantic Web technologies have been used more frequently to represent social data provided by Social Web applications. These technologies make it possible to interlink different datasets from various sources with each other, thereby helping the data portability movement to create the Web of Data [15]. The resulting Social Semantic Web data is represented in a reusable, machine-readable and non-application-specific standard format. The data becomes more accessible and can easily be integrated into other applications and exchanged among them.

Semantically interlinked social data needs representation mechanisms to model specific social information: the user and his social network, and the user's generated content. The FOAF vocabulary is used to describe people and their connections

The entire ontology of DBpedia is available at: http://www4.wiwiss.fu-berlin.de/dbpedia/dev/ontology.htm, visited on September 22nd 2009

DBLP: http://dblp.uni-trier.de/

Geonames: www.geonames.org/

See the data portability project for further reading: http://www.dataportability.org/

Details

Edition type: Original edition
Year of publication: 2009
ISBN (eBook): 9783836644419
DOI: 10.3239/9783836644419
File size: 1015 KB
Language: English
Institution / University: Gottfried Wilhelm Leibniz Universität Hannover, Computer Science, degree programme in Computer Science
Publication date: 2010 (March)
Grade: 1.7
Keywords: networks privacy policies social semantic
Product safety: Diplom.de

Author

Emily Kigel (author)
