
Using Social Semantic Web Data for Privacy Policies

©2009 Bachelor's thesis, 72 pages

Summary

Synopsis: Abstract:
In recent years the web has undergone a drastic shift from a static, centralised information system to a dynamic, user-generated, distributed and open platform, and users have changed from passive consumers to active participants who interact, create and share content. This 'new' web is called Web 2.0. In the course of this movement new Social Web applications emerged, creating an environment for people to publish, share and discuss content, and enabling them to create descriptive profiles of themselves for self-expression and to build social networks consisting of relationships with others for the purpose of interaction and communication. With the increasing popularity of such social networking applications the number of users has grown and is still growing. Not only the number of users but also the web traffic is an indicator of the growing importance of social networking platforms, which are now among the most visited websites.
With over 100 million unique visitors worldwide, Facebook is one of the most popular networking sites on the web; moreover, the site ranks third among the most visited sites on the web, surpassed only by Google and Yahoo! according to Alexa. YouTube (with over 80 million unique visitors), MySpace (with about 60 million unique visitors) and Flickr (about 30 million unique visitors) are other examples of prominent social networking platforms.
However, the availability of such a huge amount of information within the social networking sites and the open nature of the services and their usage also attract the attention of parties with marketing purposes or malicious intent. Users are thereby put at risk of online stalking, phishing, identity theft, spamming, the passing on of data to third parties and privacy issues related to personal data exposure due to insufficient access control.
By maintaining social networks and actively participating in Social Web activities such as interacting with others, users unwittingly expose sensitive and personal or inappropriate, even reputation-damaging data not only to friends but to an audience that mostly remains invisible and consists of strangers or acquaintances who potentially are not supposed to see such information. Thus the revealed information can lead to major consequences if read out of context or read by parties, like authorities or job recruiters, for whom this information was not intended. The reputation of social networking sites has been slightly […]

Excerpt

Table of Contents


Emily Kigel
Using Social Semantic Web Data for Privacy Policies
ISBN: 978-3-8366-4441-9
Production: Diplomica® Verlag GmbH, Hamburg, 2010
Also: Leibniz Universität Hannover, Hannover, Germany, bachelor's thesis, 2009
This work is protected by copyright. The rights arising from this, in particular
those of translation, reprinting, public presentation, extraction of figures and
tables, broadcasting, microfilming or reproduction by other means, and storage in
data processing systems, remain reserved, even where only excerpts are used.
Reproduction of this work or of parts of it is permitted, even in individual cases,
only within the limits of the statutory provisions of the Copyright Act of the
Federal Republic of Germany in its currently applicable version. It is in principle
subject to remuneration. Violations are subject to the penal provisions of
copyright law.
The reproduction of common names, trade names, product designations etc. in this
work does not, even without special marking, justify the assumption that such names
are to be regarded as free in the sense of trademark and brand protection
legislation and may therefore be used by anyone.
The information in this work has been compiled with care. Nevertheless, errors
cannot be completely ruled out, and the publisher, the authors or translators
assume no legal responsibility or liability of any kind for any remaining incorrect
information and its consequences.
© Diplomica Verlag GmbH
http://www.diplomica.de, Hamburg 2010

Abstract
Social Web applications are steadily gaining popularity. At the same time, the open
nature of such services leads to the exposure of an immense amount of personal data.
Due to insufficient access control in today's Social Web applications, privacy
problems arise. This thesis focuses on the need for more flexible and fine-grained
privacy restrictions. It analyses privacy problems of current Social Web
applications and compares the privacy preferences such applications offer. Based
on this analysis, this thesis extends the well-known principle of policy-based access
control, which is a flexible and dynamic way to define who can get access to what
content based on user preferences. The presented extension adapts policies
to the requirements of the Social Web. In particular, it describes how to exploit
Social Semantic Web data for privacy reasoning. This includes the retrieval of Social
and Semantic Web data from various information sources on the Web. It further
includes the use of this data for the definition of privacy policies and its
consideration during policy evaluation. Consequently, using Social Semantic Web data
for policy reasoning allows users to define exactly which social relationships and
properties a requester must have in order to access a particular resource. These
conditions can cross the boundaries of a single Social Web application. Hence, a user
can, for example, state that a friend on one application can access pictures stored
on another application, thus bridging the walled gardens of today's Social Web
applications.

Contents
1 Introduction
2 Motivating Scenario and Problem Statement
2.1 Motivating Scenario
2.2 Problem Statement
3 Background
3.1 The Social Web
3.1.1 Social Networking Sites
3.2 The Semantic Web
3.2.1 The Resource Description Framework (RDF)
3.2.2 The SPARQL Protocol And Query Language
3.2.3 SPARQL Endpoints
3.2.4 The Social Semantic Web
3.3 Privacy Policies
3.3.1 Policy Languages
3.3.2 Policy Frameworks
3.3.3 The Protune Framework
4 The Social Web from a Privacy Perspective
4.1 Data Disclosure - Why Better Control is Needed
4.1.1 Privacy Issues
4.1.2 Information Overload
4.2 Privacy Protection on the Social Web - a State of Art
4.2.1 Twitter and its Privacy Options
4.2.2 Facebook and its Privacy Options
4.2.3 Flickr and its Privacy Options
4.3 Comparing Privacy Preferences on Social Platforms
4.3.1 Levels of Trust for Data Disclosure
4.3.2 Network Features and their Protection
4.4 Summary
5 Policy Reasoning Based on Social and Semantic Web Data
5.1 Requirements for Policies on the Social Web
5.2 Social and Semantic Web Data for Policy Specification and Evaluation
5.2.1 Types of Social Data and their Availability
5.2.2 Using Social Data to Define New Concepts
5.2.3 Enforcing Policies upon an Application
5.3 Taking up the Motivating Scenario
6 Implementation
6.1 Retrieving Heterogeneous Information
6.1.1 Retrieving Social Web data
6.1.2 Retrieving Social Semantic Web Data
6.2 Wrappers for External Information Sources
6.2.1 IN-Predicate
6.2.2 SPARQL Endpoint Wrapper
6.2.3 DBpedia Wrapper
6.2.4 DBLP Wrapper
6.2.5 RDF Wrapper
6.2.6 Flickr Wrapper
6.2.7 Twitter Wrapper
6.3 SPoX - A Use Case
7 Related Work
8 Conclusions and Outlook
Bibliography

1 Introduction
In recent years the web has undergone a drastic shift from a static, centralised
information system to a dynamic, user-generated, distributed and open platform, and
users have changed from passive consumers to active participants who interact, create
and share content. [1] This `new' web is called Web 2.0. In the course of this
movement new Social Web applications emerged, creating an environment for people to
publish, share and discuss content, and enabling them to create descriptive profiles
of themselves for self-expression and to build social networks consisting of
relationships with others for the purpose of interaction and communication. With the
increasing popularity of such social networking applications the number of users has
grown and is still growing. Not only the number of users but also the web traffic is
an indicator of the growing importance of social networking platforms, which are now
among the most visited websites.
With over 100 million unique visitors worldwide, Facebook is one of the most
popular networking sites on the web¹; moreover, the site ranks third among the most
visited sites on the web, surpassed only by Google and Yahoo! according to Alexa².
YouTube (with over 80 million unique visitors), MySpace (with about 60 million
unique visitors) and Flickr (about 30 million unique visitors)³ are other examples
of prominent social networking platforms.
However, the availability of such a huge amount of information within the social
networking sites and the open nature of the services and their usage also attract the
attention of parties with marketing purposes or malicious intent. Users are thereby
put at risk of online stalking, phishing⁴, identity theft, spamming, the passing on
of data to third parties and privacy issues related to personal data exposure due
to insufficient access control.
By maintaining social networks and actively participating in Social Web activities
such as interacting with others, users unwittingly expose sensitive and personal or
inappropriate, even reputation-damaging data not only to friends but to an audience
that mostly remains invisible and consists of strangers or acquaintances who
potentially are not supposed to see such information. Thus the revealed information
can lead to major consequences if read out of context or read by parties, like
authorities or job recruiters, for whom this information was not intended. Some
real-world problems caused by privacy issues are described in [2, 3, 4, 5]. These are
just a few examples that received major public attention. The reputation of social
networking sites has been slightly diminished by several such incidents, which often
reach the attention of the media. Seeing that social networking users are becoming
more aware of such privacy risks and do not hesitate to express their concerns⁵,
privacy settings need to be acknowledged as an important part of Social Web
applications, and extensive research needs to be done on improving the privacy
preferences and giving users control over who can see which of their social data.
¹ http://siteanalytics.compete.com/facebook.com/; retrieved August 14, 2009
² Alexa is a web service which measures the web traffic of websites and creates
ranking lists accordingly; Alexa traffic rankings, retrieved August 14, 2009 from
http://www.alexa.com/topsites/global
³ http://siteanalytics.compete.com/youtube.com/,
http://siteanalytics.compete.com/myspace.com/,
http://siteanalytics.compete.com/flickr.com/.
⁴ Phishing is an attempt to get personal data of users by using forged websites.
⁵ The introduction of the `News Feed' on Facebook, which informed users about the
newest activities of their friends, led to an online petition with over 700,000
users demanding an abolition of the feature. [6]
Although social networking sites have already realised the need for privacy
protection and some Social Web applications like Facebook have even introduced more
complex access restrictions, the privacy preferences are still not fine-grained and
flexible enough and are far from satisfactory. Furthermore, such privacy settings
confine themselves to the properties of the site itself without making use of data
created beyond the boundaries of that application.
Policy-based access control is an approach to protect privacy in open systems like
Social Web applications and can further help to control the information overload
which users are facing on the Social Web. With policies being formal, well-defined
statements [7], the process of defining who can get access to what content based
on user preferences can be realised in a flexible and dynamic way. Nevertheless,
current policy-based control of the behaviour of complex systems does not offer
solutions for the movement towards the Social Web, where information about users,
their content and their relationships is not confined to one application only but is
spread out across the whole Social Web. This is also the problem with the privacy
settings of current Social Web applications, which offer preferences based only on
the relationships and other attributes established within their own website.
The contribution of this thesis is therefore to enhance privacy policies by
integrating data from various information sources, such as Social Web applications,
into the policy specification and reasoning process. Such policies can save people
the trouble of creating the same social data⁶ on each Social Web application they
are members of. These privacy policies can prove beneficial, seeing as numerous
social networking sites are emerging and offering people various functionalities and
site-specific features, and people spend an increasing amount of time maintaining
all this data distributed across the many services.
⁶ Social data includes user-generated content (bookmarks, tags, reviews, photos,
blog posts etc.), personal data of users and their social networks.
Social Web applications provide their information in proprietary formats via their
own site-specific application programming interfaces. The approach presented in this
thesis collects this arbitrary, heterogeneous data and provides it in a homogeneous
format so that it can be integrated into policies and exploited for policy reasoning.
In addition to Social Web data, Semantic Web data can be included in policy
decisions as well to extend the variety of policy specification. Such semantic
information is available in non-application-specific standard formats which can be
easily transported and reused. Additionally, information provided by Social Web
applications can be retrieved as Social Semantic data, which allows Social data to be
converted into a uniform format using Semantic Web technologies. In this thesis the
process of retrieving all these information types, transforming the extracted data
into a format appropriate for policy-based access control and combining them to
create fine-grained privacy policies will be explained and demonstrated using
selected information sources from the Web. To implement the presented solution, the
policy framework Protune is used to automate the evaluation and decision process
based on the conclusions drawn from privacy policies.
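
The idea of collecting heterogeneous, site-specific data and providing it in one
homogeneous format can be illustrated with a minimal Python sketch. It is not the
Protune extension described in this thesis. The two payload shapes, the function
names and the `Triple` type are assumptions made up for illustration only.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One uniform statement: subject, predicate, object (hypothetical format)."""
    subject: str
    predicate: str
    obj: str


def from_friend_list(owner: str, payload: dict) -> list[Triple]:
    # Hypothetical site A returns {"friends": [{"name": "alice"}, ...]}
    return [Triple(owner, "knows", f["name"]) for f in payload.get("friends", [])]


def from_contact_rows(owner: str, rows: Iterable[tuple[str, str]]) -> list[Triple]:
    # Hypothetical site B returns rows of (contact, relation)
    return [Triple(owner, relation, contact) for contact, relation in rows]


def merge(*sources: list[Triple]) -> set[Triple]:
    # A set drops statements that more than one source reports.
    merged: set[Triple] = set()
    for triples in sources:
        merged.update(triples)
    return merged


# Made-up sample data standing in for two site-specific API responses
site_a = {"friends": [{"name": "alice"}, {"name": "dave"}]}
site_b = [("carol", "family"), ("alice", "knows")]
facts = merge(from_friend_list("bob", site_a), from_contact_rows("bob", site_b))
```

Once every source is reduced to the same statement shape, a policy condition can
be checked against `facts` without knowing which application a statement came from.
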
The remainder of the thesis is organised as follows. Section 2 presents a scenario
showing how privacy policies can be integrated into a web application and enhance
the user experience on such sites. It then identifies and describes the problem
statement. Section 3 provides the background information necessary for
understanding this thesis. Section 4 analyses privacy problems of current Social
Web applications and compares the privacy preferences such applications offer.
Section 5 presents the extension of policy-based access control that adapts policies
to the requirements of the Social Web and revisits the motivating scenario.
Subsequently, Section 6 describes how Social and Semantic Web data can be
retrieved and included in Protune, covering the actual implementation of the
Protune extension for Social Semantic Web data. A prototype
implementation called SPoX [8] is also presented, which demonstrates the usage of
policy-based behaviour control on the Social application Skype. After a presentation
of related work in Section 7, Section 8 concludes this thesis and provides an outlook
on future research.

2 Motivating Scenario and Problem Statement
2.1 Motivating Scenario
In order to explain the benefits of using a policy-driven approach to protect privacy
in open systems, a fictional scenario is presented that serves as a use case to
demonstrate how privacy policies based on various Social and Semantic Web data can
be used to control one's own data on a Social Web application. Parts of the scenario
will be used throughout this thesis to explain the syntax of policies in general and
Protune policies in particular and how external data can be retrieved and integrated
into the Protune framework.
Bob is a scientist who works in a company, takes part in research projects and
holds a seminar for students at the local university. Bob has a very active web
presence; besides having profiles on numerous social networking sites like Facebook,
Flickr and Twitter, he also manages his own website, where he uploads information
about different aspects of his life, such as his business, interests and
family-related information. Furthermore, being a supporter of the Semantic Web
movement, he has uploaded a FOAF file to express his network of acquaintances.
(S1) To be easily reached by his colleagues, his students at the university and
his friends, Bob posts his contact information on his website. Because his email
address is not particularly private, he is happy for any of his friends and
colleagues to see it.
(S2) Nevertheless, some contact data is very sensitive, so Bob does not want
everybody to see all his contact information; for example, Bob only wants to disclose
his phone number to his family and to the close friends he added to his
`close-friend' list on Facebook.
(S3) Bob uses his website for blogging as well. He publishes findings from his
research and includes links to other interesting websites which are relevant for his
profession. He also talks about his personal life, discusses stories and posts
pictures of family and friends as well as about his interests beyond his
profession. As some of the information he posts is private, such as pictures of his
last vacation trip or a wish list he compiled for his next birthday, he only wants his
Flickr and Facebook friends, in addition to his family, to see it. Moreover, everybody
who is in the Flickr group about the landscape of Southern France can see the
pictures of his holiday trip which are tagged with the keyword `Southern France' but
not the ones tagged `private', as these are only intended for family and close friends.
(S4) Any work-specific information Bob posts or files he uploads should only be
visible to his colleagues who belong to the work network of his company on Facebook.
Additionally, Bob wants the co-authors of his publications to have access to
updates about any research projects.

(S5) Bob has many different interests and hobbies, which he likes to write about.
One of them is his passion for baseball. Unfortunately, not all his friends share
this hobby, which is why Bob would like to disclose his updates concerning that sport
only to people who appreciate reading them. Say, if some of Bob's friends are in the
group about baseball on Facebook, they are more likely to prefer reading Bob's
thoughts on the next baseball championship rather than reading his presentation
slides about the newest Semantic Web technologies, which in turn would go down well
with his FOAF friends. That is why Bob wants his friends who are either in a baseball
group or write a blog about baseball to be able to see baseball-related
information.
(S6) His presentation slides are only intended for his acquaintances interested in
the Semantic Web. Furthermore, Bob wants to use these presentation slides in the next
lecture he is giving. To get relevant feedback on the quality of these slides, he hopes
his colleagues and friends who are skilled in that field are going to read the slides
and give constructive criticism. He knows that all friends he added to his FOAF
profile, his colleagues in his company network on Facebook and any friends who
are also co-authors of his publications are skilled enough to help him. As these
slides are in German, he only wants people who have mastered this language to read
them.
2.2 Problem Statement
At present, the possibilities to define and enforce privacy preferences are too
restrictive and cannot be adapted to the user's needs. The internet lacks walls, as
danah boyd pointed out [9], which is why the ability to define expressive privacy
preferences and improve users' control is essential, especially when it comes to
sensitive and personal data.
When deciding which part of his personal data can be disclosed to whom,
each user may have his own ideas about the right privacy preferences according to his
personal situation and purposes on the Social Web; for example, a user may be
searching for new business contacts, may want to meet new people or may want to keep
in touch with his current friends, among many others. As these situations are quite
complex but also individual for each user, it is difficult to provide static and
predefined privacy options which are sufficient to accommodate each individual need
of the thousands of users of a Social Web application. Therefore users need privacy
settings that go beyond some predefined checkboxes with a few selected options which,
in most cases, are also binary: a profile is either private or public, a
user is either a friend or a stranger, and so on.
Another shortcoming of Social Web applications which needs to be overcome is that
users can only define privacy preferences based on information within the borders
of the application itself. Each of the Social Web applications is like an island,
collecting social data of its users and providing it in a site-specific, proprietary
way. Given the growing difficulty of maintaining such a distributed amount of data,
the increasing time people spend on the web to manage their identities and social
networks, and the need to re-enter their information every time a new website is
used, it can prove beneficial to incorporate social data from any of the otherwise
isolated Social Web applications into the privacy preferences of an application.
Another problem is the perpetual change in the structure of individual social
networks as new relationships are built and established relationships change
their status; for example, an acquaintance can become a close friend. All
these changes, within but also beyond the borders of one application, need to flow
into the privacy preferences without the user having to adjust the settings manually.
Say, if a new colleague wants to access a user's data meant for the employees of his
workplace, the privacy settings should automatically recognise and include her as a
new colleague.
Of course, flexibility and great variety go along with usability challenges, with
privacy settings being difficult to understand and hard to adjust for ordinary users.
This problem leads to users mostly keeping the default settings, as is already often
the case in today's Social Web applications. [10]
Bearing in mind all the problems that arise with regard to privacy preferences, this
thesis takes up the need for fine-grained privacy management and exploits policies
for defining the behaviour of a system based on certain conditions. These policies
need to be extended in a way that enables policy-based access control that can be
applied to what happens on the Social Web, so that a scenario as presented in the
previous section can become reality. To achieve this goal several requirements need
to be met:
· A dynamic and well-defined policy language is needed for specifying the
  policies.
· This language must be able to incorporate data from any possible source on
  the Web beyond the border of the application itself, which also supports the
  high level of flexibility.
· This data can be Social data or Semantic Web data of any kind, such as
  attributes of a user or a group, information about relationships,
  user-generated data or activities, general information like publications and
  their authors, information about countries, languages and many others.
· Further qualities policies should have are:
  - Fine-grained policies: Policies need to be detailed enough to be applied
    to the complex scenarios on the Web, so that any arbitrary combination
    of a user's needs can be realised.
  - Dynamic and automatically adapting to changes: Policies should adjust
    to changes occurring on the Social Web; for example, if a new colleague
    requests access, the policies adapt dynamically, recognise the colleague
    as such and include him in the appropriate concepts.
  - Usability: The specification process of policies should be intuitive, simple
    and fast. People who are not skilled in the formal syntax of policy
    languages should have no problem defining policies.
  - Lucidity: Users should clearly understand what a policy they have created
    does, without room for interpretation.
  - When access to a resource is denied, an explanation is needed to help the
    user understand why his request failed.
· To automate the evaluation and decision process of the defined policies a policy
  framework is needed. This policy framework automatically queries the
  respective information sources and selects the information a user wants to use
  for his policies independently of the format in which the data is provided.
· The extracted information is unified and combined to create policies, meaning
  data from different sources can be incorporated into one policy.
· Policies should be enforced upon an application, that is, if a resource which
  is protected by such a policy is requested, the framework evaluates the policy
  and according to the result either allows or denies access.
· To be able to reason correctly over a policy the framework needs to categorise
  the requester according to some provided identification properties. The
  requester is added to either the group of people who are allowed to access the
  resource or the one for whom access is denied.
· Analogously, the information overload caused by the various communication
  techniques on Social Web applications needs to be controlled with the policies
  as well.
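
Several of these requirements (combining conditions from different sources,
allowing or denying access, and explaining a denial) can be sketched in a few lines
of Python. This is only an illustration under stated assumptions. It does not use
the Protune policy language or its API, and the names `Policy`, `Decision` and
`evaluate` as well as the sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# A condition answers whether a requester satisfies one rule (hypothetical type).
Condition = Callable[[str], bool]


@dataclass
class Policy:
    resource: str
    conditions: dict[str, Condition]  # label -> check; one match grants access


@dataclass
class Decision:
    allowed: bool
    explanation: str


def evaluate(policy: Policy, requester: str) -> Decision:
    # Grant access on the first matching condition and say which one matched.
    for label, check in policy.conditions.items():
        if check(requester):
            return Decision(True, f"{requester} matches '{label}'")
    # Deny otherwise and explain which conditions were tried.
    tried = ", ".join(policy.conditions)
    return Decision(False, f"{requester} matches none of: {tried}")


# Made-up data standing in for live queries to two different sources
close_friends = {"alice"}
family = {"carol"}

phone_policy = Policy(
    resource="phone_number",
    conditions={
        "close friend on one site": lambda r: r in close_friends,
        "family member in a FOAF file": lambda r: r in family,
    },
)
```

Because the conditions look up the current data at evaluation time, a person added
to `close_friends` later is recognised without editing the policy, which mirrors
the requirement that policies adapt to changes automatically.
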

3 Background
This section is intended to provide background information which is relevant for
understanding the subsequent elaborations in this thesis.
3.1 The Social Web
The Web has undergone a change in the way content is created and used as well as
in the roles of providers (authors) and consumers (readers), which are no longer
mutually exclusive. The concept describing the shift from the static,
centralised Web to a dynamic, user-generated, open platform is often referred to as
Web 2.0, a term popularised by Tim O'Reilly in his article What is Web 2.0?.
Web 2.0 defines new Web technologies which enable rich user interfaces, provide
services open for others to use, combine various sources and enable user participation
and interactivity. [11]
Some of the most distinctive principles of Web 2.0 are the linking of data throughout
the web and the provision of services that enable users not only to socialise online
but also to publish, share, reuse and generally participate in creating content. This
concept is also often referred to as the Social Web. In the course of this movement
towards interoperability, information sharing and interaction, the so-called Social
Web applications have emerged, providing an environment where people can link with
each other to create personal social networks of relationships and collaboratively
create content. Some of the most typical application categories that make up the
Social Web are wikis, blogs, podcasts, social bookmarking web services, social
networking sites and content sharing sites.
Social networking sites and content sharing sites both aim to build social networks
among their members, be it as the primary goal, as is the case with social
networking sites, or as a secondary objective, as with content sharing sites. These
sites are the main focus here because the aspect of social interaction is
important for this thesis. Content sharing sites will not be presented separately
in this section, as their community aspect is similar to that of social
networking sites.
3.1.1 Social Networking Sites
This subsection briefly introduces social networking sites and the common features
that are found in most of them. A more detailed description of the social networking
sites' specific functionalities with regard to privacy will be presented in Section 4.
Social networking sites give people the opportunity to maintain their real-world
social connections such as friends, family and colleagues online. Users can also
build new relationships based on common ground, such as shared interests and
activities or any other affiliations like a common geographical location or business
contacts, in order to communicate and to expand their online social network. As a
result users create links to other users, producing the so-called social graph, which
represents a web of connections with direct ties and indirect ties (Friends of
Friends). [12] Social networking sites enable their users to create self-portraying
profiles which can contain all kinds of personal information about a user. Users also
post self-generated content about their lives, activities and interests. The purpose
of publishing data is mostly to share and discuss it with other community members.
The information a user can provide about himself in his profile can be divided into
three categories according to [13]:
Contact information: such as name, address, e-mail address, telephone and
mobile phone number,
Individual information: including relationship status, personality attributes,
sexual preferences, physical attributes, birthday, education and occupation,
religious and political affiliation,
Interest information: subjects of interest, hobbies, favourite books, movies, etc.,
and any association affiliations.
Furthermore, data about users emerges from their activity on social networking
sites, for instance joining groups or befriending someone.
Social networking sites provide various functionalities to support social
interaction and the communication of data; chat, messaging, blogging, discussion
groups, tagging⁷ and linking content and writing on one another's `Walls'⁸ are the
most common functionalities.
All this published information can be intended either for a selected group of
people, like friends or family, or for general consumption. For this purpose, one of
the main characteristics of social networking sites is to define which people
belong to the trusted set of users, referred to as friends on most social
networking sites, and which belong to the non-trusted set of users, labelled as
strangers. [14] Most social networking sites further divide the non-trusted
group into members and non-members of the site. The term friend,
which is commonly applied to the trusted group of people, represents a consensual
connection between two users and does not necessarily have the same meaning as it
does offline, as people who would be called acquaintances in the real world are often
seen as friends in the online world. Although on most sites befriending someone is a
bi-directional process that requires confirmation by both parties to establish a link
between the profiles, there are also sites where one-directional ties are common.
Access to information is mostly based on the established status of the user.
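The concepts of this subsection (confirmed friendships, Friends of Friends, and access based on the viewer's status) can be illustrated with a minimal Python sketch. The class and method names, the `Audience` labels and the sample users are assumptions made for illustration only; they do not describe how any particular social networking site implements access control.

```python
from enum import Enum


class Audience(Enum):
    """Who may see a profile item (illustrative labels, not from any real site)."""
    FRIENDS = 1    # the trusted set
    MEMBERS = 2    # non-trusted users who are registered on the site
    EVERYONE = 3   # also includes non-members


class SocialGraph:
    def __init__(self):
        self.members = set()
        self.requests = set()  # directed (sender, receiver) pairs

    def join(self, user):
        self.members.add(user)

    def request_friendship(self, sender, receiver):
        self.requests.add((sender, receiver))

    def are_friends(self, a, b):
        # Bi-directional tie: both parties must have confirmed the link.
        return (a, b) in self.requests and (b, a) in self.requests

    def friends(self, user):
        return {m for m in self.members if m != user and self.are_friends(user, m)}

    def friends_of_friends(self, user):
        # Indirect ties: friends of direct friends, excluding direct friends and the user.
        direct = self.friends(user)
        indirect = set()
        for friend in direct:
            indirect |= self.friends(friend)
        return indirect - direct - {user}

    def can_view(self, viewer, owner, audience):
        if viewer == owner or audience is Audience.EVERYONE:
            return True
        if audience is Audience.MEMBERS:
            return viewer in self.members
        return self.are_friends(viewer, owner)


graph = SocialGraph()
for name in ("alice", "bob", "carol"):
    graph.join(name)
graph.request_friendship("alice", "bob")
graph.request_friendship("bob", "alice")    # confirmed: alice and bob are friends
graph.request_friendship("bob", "carol")
graph.request_friendship("carol", "bob")    # confirmed: bob and carol are friends
graph.request_friendship("carol", "alice")  # unconfirmed, one-directional only
```

In this sketch `carol` is a Friend of a Friend of `alice`, so she can see items shared with all members but not items restricted to friends, while the non-member `dave` only sees items marked for everyone.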
⁷ Tagging is used for organizing and interlinking content. To tag content means to assign a
descriptive keyword to it to make searching for this data easier.
⁸ The term Wall is used in Facebook to describe the reserved space on a user's profile where the
user himself or his friends can write public messages.
3.2 The Semantic Web
Information on the internet is mostly provided either for human consumption
or for machines to process. Furthermore, data provided by one application may not
be understood by another, which makes data exchange and integration very
hard. The Semantic Web aims to represent and structure the data of the current Web in
a way that keeps it understandable for humans and enables machines to understand the
semantics of the content as well. To this end, data needs to be well-defined, interlinked
and annotated with metadata so that it can be read, understood and processed by
software agents and exchanged across various applications. [15]
The Semantic Web is based on various technologies that make data available in a
semantically structured format, as illustrated by the Semantic Web Stack shown
in [16].
3.2.1 The Resource Description Framework (RDF)
One of the most widely adopted open standard formats for providing
machine-readable information is the Resource Description Framework (RDF), which was
developed by the W3C⁹. With RDF, Web resources can be described in a common way
so that they can be read and understood by different applications. Information about
web pages and personal information about people, among many other things, can be
modelled using RDF. RDF is based on two technologies, XML¹⁰ and URI¹¹: URIs are used
to identify resources on the Web, and XML is used to exchange information between
different systems. [17]
Information in RDF is written in statements like `The location of the company
Daimler AG is Germany', which consist of triples of {Subject, Predicate, Object}, for
instance Daimler_AG hasLocation Germany. An RDF statement describes
- a resource (the Subject) identified by a URI, such as
http://dbpedia.org/page/Daimler_AG,
- the property of a resource (the Predicate), such as `location', and
- the value of a property (the Object), which can be another resource such as
http://dbpedia.org/page/Germany or a literal like `true' or `25'.
Several such triples form an RDF graph. [18]
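The triple structure described above can be sketched in plain Python without any RDF library. The class names are illustrative assumptions, the predicate URI is taken from the `dbo:` prefix used in the example query of Section 3.2.2, and the `http://example.org/numberOfPlants` property with the literal `25` is purely hypothetical. The serialisation follows the N-Triples line layout but skips escaping, so it is a simplification rather than a complete serialiser.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A subject, predicate or object identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as 'true' or '25'."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]


def to_ntriples(triple):
    """Serialise one triple in N-Triples style (no escaping; sketch only)."""
    def term(t):
        return f"<{t.uri}>" if isinstance(t, Resource) else f'"{t.value}"'
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


daimler = Resource("http://dbpedia.org/page/Daimler_AG")

# Object is another resource.
location_triple = Triple(
    daimler,
    Resource("http://dbpedia.org/ontology/location"),
    Resource("http://dbpedia.org/page/Germany"),
)

# Object is a literal; the property URI is hypothetical.
literal_triple = Triple(
    daimler,
    Resource("http://example.org/numberOfPlants"),
    Literal("25"),
)

# Several triples together form an RDF graph, modelled here as a set.
graph = {location_triple, literal_triple}

for t in sorted(graph, key=to_ntriples):
    print(to_ntriples(t))
```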
3.2.2 The SPARQL Protocol And Query Language
To query the triple data provided by an RDF graph, the RDF
query language SPARQL (SPARQL Protocol And Query Language) can be used.
The syntax of SPARQL is similar to SQL, using SELECT¹², FROM and WHERE to
form a query. But as RDF is written in triples, the SPARQL syntax uses the same
pattern of {Subject, Predicate, Object} statements, as can be seen in the following
example query. The query returns all companies in the RDF graph with the
location `Germany':
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
# Bind ?company to every subject whose dbo:location is :Germany
SELECT ?company
WHERE {
  ?company dbo:location :Germany .
}
⁹ the World Wide Web Consortium
¹⁰ eXtensible Markup Language
¹¹ Uniform Resource Identifier
¹² The SPARQL language also provides other query forms, such as DESCRIBE and ASK, but as they are
of no relevance for this thesis, they are not explained here.
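How the triple pattern of the example query in Section 3.2.2 selects data can be mimicked with a small Python sketch. It matches a single {Subject, Predicate, Object} pattern against an in-memory set of triples, where terms starting with `?` act as variables. The function name `match` and the sample triples for Siemens and Renault are assumptions added for illustration; a real SPARQL engine supports far more than this.

```python
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

# Illustrative sample graph; only the Daimler_AG triple mirrors the text.
graph = {
    (DBR + "Daimler_AG", DBO + "location", DBR + "Germany"),
    (DBR + "Siemens", DBO + "location", DBR + "Germany"),
    (DBR + "Renault", DBO + "location", DBR + "France"),
}


def match(graph, pattern):
    """Yield one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must be equal
    to the corresponding part of the triple.
    """
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Counterpart of: SELECT ?company WHERE { ?company dbo:location :Germany . }
pattern = ("?company", DBO + "location", DBR + "Germany")
companies = sorted(b["?company"] for b in match(graph, pattern))
print(companies)
```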
3.2.3 SPARQL Endpoints
As the Semantic Web evolves, a lot of information is made available on the web
in the RDF format, and third parties can retrieve this data via so-called SPARQL
endpoints. These endpoints are web services that allow RDF data sources to be
addressed and queried using SPARQL. The query results are then returned in an XML format.
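A rough sketch of the client side of such an endpoint, using only the Python standard library, is given here. It builds a GET request URL carrying the query in the `query` parameter and extracts variable bindings from a response in the SPARQL Query Results XML Format. The endpoint address `http://dbpedia.org/sparql`, the helper names and the hand-written sample response are assumptions for illustration; no network request is made.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def build_request_url(endpoint, query):
    """Return a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_results(xml_text, variable):
    """Collect the values bound to one variable in an XML result document."""
    root = ET.fromstring(xml_text)
    values = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        for binding in result.findall("sr:binding", SPARQL_NS):
            if binding.get("name") == variable:
                # The child element is <uri>, <literal> or <bnode>.
                values.extend(child.text for child in binding)
    return values


QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?company WHERE { ?company dbo:location :Germany . }"""

# Hand-written sample of what an endpoint might return (not real output).
SAMPLE_RESPONSE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="company"/></head>
  <results>
    <result>
      <binding name="company"><uri>http://dbpedia.org/resource/Daimler_AG</uri></binding>
    </result>
  </results>
</sparql>"""

url = build_request_url("http://dbpedia.org/sparql", QUERY)  # assumed endpoint address
companies = parse_results(SAMPLE_RESPONSE, "company")
print(url)
print(companies)
```

To query a live endpoint, the URL could be fetched with `urllib.request.urlopen` and the response body passed to `parse_results`; whether a given endpoint returns XML by default depends on the service.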
Popular endpoints exist for services like DBpedia and DBLP, both of which offer large
knowledge bases for external applications to use. DBpedia is a semantic database
which extracts structured information from Wikipedia and provides it in a Semantic
Web conformant way; strictly speaking, the information is presented in the RDF format.
The DBpedia dataset comprises information on numerous subjects, ranging from
geographical locations and well-known people like athletes or artists to various
organisations including companies, educational institutions and sports teams¹³. Using
the dataset, queries like `People who were born in Berlin before 1900' [19] can be
answered. The DBpedia project also follows the Linked Data principles [20]: all
concepts are identified using URI references, allowing interlinkage with other datasets
like DBLP¹⁴ or Geonames¹⁵.
DBLP is a bibliographic database focused on computer science which collects
information about conferences, authors and their publications, papers and
journals, among others. The DBLP D2R Server provides all this information in a
Semantic Web format and allows the data to be accessed via an endpoint.
3.2.4 The Social Semantic Web
In recent years, Semantic Web technologies have been used more frequently to represent
social data provided by Social Web applications. These technologies make it possible to
interlink datasets from various sources with each other, thereby helping the data
portability movement to create the Web of Data [15]. The resulting Social Semantic
Web data is represented in a reusable, machine-readable and non-application-specific
standard format. The data becomes more accessible and can easily be integrated
into other applications and exchanged among them.¹⁶
Semantically interlinked social data needs representation mechanisms to model
specific social information: the user, his social network and the user's generated
content. The FOAF vocabulary is used to describe people and their connections
¹³ The entire ontology of DBpedia is available at: http://www4.wiwiss.fu-berlin.de/dbpedia/dev/ontology.htm,
visited on September 22nd 2009
¹⁴ http://dblp.uni-trier.de/
¹⁵ www.geonames.org/
¹⁶ See the data portability project for further reading: http://www.dataportability.org/
Details

Form of publication: Original edition
Year: 2009
ISBN (eBook): 9783836644419
DOI: 10.3239/9783836644419
File size: 1015 KB
Language: English
Institution / University: Gottfried Wilhelm Leibniz Universität Hannover – Informatik, Studiengang Informatik
Publication date: 2010 (March)
Grade: 1,7
Keywords: networks, privacy, policies, social, semantic