
Semi-automatic ontology engineering and ontology supported document indexing in a multilingual environment

©2003 Diploma thesis, 130 pages

Abstract

Synopsis: Introduction:
The management of large amounts of information and knowledge is of ever increasing importance in today’s large organisations. With the ongoing ease of supplying information online, especially in corporate intranets and knowledge bases, finding the right information becomes an increasingly difficult task. Today’s search tools perform rather poorly in the sense that information access is mostly based on keyword searching or even mere browsing of topic areas. This unfocused approach often leads to undesired results. The following example illustrates the problem more clearly: An agricultural scientist would like to find out which organisation established the Agreement on Agriculture. A simple search for “establish Agreement on Agriculture” might result in a huge list of documents containing these words, but actually none of them containing the desired result: WTO or World Trade Organisation. The problem becomes even worse if the result searched for only appears in a foreign language document.
Semantically annotated documents, i.e. documents that are indexed with ontological terms and concepts instead of simple keywords, provide several advantages. First, the ontological abstraction provides robustness against changes in the document. In the above example, the document representation might change using the term ‘Agricultural Agreement’ instead of ‘Agreement on Agriculture’. However, since the document has been annotated with the ontological semantics, this will not affect the search results. Second, since the ontology used for annotating the document in this example is domain-specific, the semantic meanings and interpretations of keywords are bound to that domain and therefore the retrieval is likely to be more efficient. A term can have several meanings in different domains. By first mapping the keyword to its semantic representation in a specific ontology and using the ontology’s linked knowledge structure, a much more focused search approach can be taken. Third, document specific representations no longer affect the search. This is extremely important in the case of multilingual representations. Keywords of several languages are mapped to the same concept in an ontology and are therefore given the same meaning. Multilingual search portals can be established to produce the same results, no matter which language is used for retrieval.
An important task in knowledge management that facilitates the search scenario described above is […]

Excerpt


ID 6905
Lauser, Boris: Semi-automatic ontology engineering and ontology supported document
indexing in a multilingual environment
Hamburg: Diplomica GmbH, 2003
Also: Fachhochschule Südwestfalen, Technische Universität, diploma thesis, 2003
This work is protected by copyright. The rights arising from this, in particular those of
translation, reprinting, public recitation, extraction of figures and tables, broadcasting,
microfilming or reproduction by other means, and storage in data processing systems,
remain reserved, even where only excerpts are used. Reproduction of this work or of
parts of this work is permitted, even in individual cases, only within the limits of the
statutory provisions of the Copyright Act of the Federal Republic of Germany in its
currently valid version. It is in principle subject to remuneration. Violations are subject
to the penal provisions of copyright law.
The reproduction of common names, trade names, product designations etc. in this work
does not, even without special marking, justify the assumption that such names are to be
regarded as free in the sense of trademark and brand protection legislation and may
therefore be used by anyone.
The information in this work has been compiled with care. Nevertheless, errors cannot be
ruled out completely, and the Diplomarbeiten Agentur, the authors or translators accept
no legal responsibility or any liability for any remaining incorrect information and its
consequences.
Diplomica GmbH
http://www.diplom.de, Hamburg 2003
Printed in Germany

TABLE OF CONTENTS
1 INTRODUCTION
1.1 Motivation
1.2 Approach
1.3 Outline
2 THE PROJECT ENVIRONMENT
2.1 FAO and the AOS
2.2 Information management at the FAO
2.2.1 Resources and metadata
2.2.2 The information management system
2.2.3 AGROVOC Thesaurus and Document Indexing
2.3 Problems with the current system and proposal
3 SEMANTIC WEB
3.1 The idea
3.2 Ontologies
3.2.1 Introduction
3.2.2 Types of ontologies
3.2.3 Ontology representation languages
3.2.4 KAON
3.2.5 Ontology Engineering
4 INTRODUCTION OF ONTOLOGY BASED INFORMATION MANAGEMENT SYSTEM AT THE FAO
4.1 The prototype project
4.2 Requirements regarding the AOS
4.3 Ontology Engineering Framework
4.3.1 Overview
4.3.2 Initialisation of the cycle
4.3.3 The 5 phases of the framework
4.4 The Ontology Browser
4.5 Representation of AGROVOC in KAON
4.6 Related Work and positioning
4.7 Current status and Further Work
5 THE ONTOLOGY PRUNER
5.1 Introduction to the pruning approach
5.2 Adaptation of the ontology pruner
5.3 Evaluation
5.3.1 Resources: Document corpus and source ontology
5.3.2 Hypotheses for evaluation
5.3.3 Evaluation plan
5.4 Results and Discussion
5.4.1 Pruner Trie vs. Pruner
5.4.2 Dependency of the statistics on different parameter settings
5.4.3 Generic Document Set 1 (Gen) vs. Generic Document Set 2 (AG)
5.4.4 Empirical evaluation
5.5 Summary
6 AUTOMATIC CLASSIFICATION
6.1 Introduction
6.1.1 What is text categorisation?
6.1.2 Motivation within the project context
6.2 Basic definitions
6.2.1 Using Support Vector Machines for Multi-label Document Indexing
6.2.2 Evaluation measures
6.3 Adaptation of the classifier
6.3.1 Multi-label vs. single-label Indexing
6.3.2 Multiple Languages
6.3.3 Integration of background knowledge
6.3.4 Multi-class problem and class hierarchy
6.4 Set of training and test documents
6.5 Evaluation
6.5.1 Single-label vs. multi-label classification
6.5.2 Multilingual classification
6.5.3 Integration of domain specific background knowledge
6.6 Related Work
6.7 Summary and Outlook
7 CONCLUSION
7.1 Summary
7.2 Outlook
REFERENCES
A KAON RDFS REPRESENTATION OF THE ONTOLOGY ON FOOD SAFETY, ANIMAL AND PLANT HEALTH (EXTRACT)
B COMPLETE LIST OF WEB SITES OUTPUT BY THE FOCUSED CRAWLER
C AGROVOC CATEGORIES
D RESULTS OF ONTOLOGY INTEGRATION INTO AUTOMATIC TEXT CLASSIFICATION

TABLE OF FIGURES
Figure 1: Ontology example, excerpt
Figure 2: Information management system at the FAO
Figure 3: AGROVOC thesaurus: A sample extract showing a descriptor and a non-descriptor
Figure 4: XML serialisation of RDF, example
Figure 5: Ontology types
Figure 6: Ontology representation languages and their expressiveness taken from [CG00]
Figure 7: RDF Schema example model
Figure 8: Lexical OIModel
Figure 9: Spanning Object Example
Figure 10: The ontology engineering framework
Figure 11: The Focused Web Crawler
Figure 12: Evaluation of the ontology
Figure 13: Communication between the CDS system and the ontology browsing interface
Figure 14: Screenshot of the adapted KAON portal
Figure 15: Mapping of AGROVOC thesaurus to ontology structure
Figure 16: Modelling of AGROVOC categories
Figure 17: The ontology pruning problem
Figure 18: Pruning process - old vs. newly adapted version
Figure 19: Frequency Propagation - frequent concept with infrequent super concept
Figure 20: Pruner vs. Pruner Trie, evaluation results
Figure 21: Dependency of all statistical ontology parameters on variation of the ratio parameter (exemplary for the setting TFIDF all Gen with Ontology Pruner Trie)
Figure 22: Differences in size between largest pruned ontology and all others (Pruner Trie)
Figure 23: Number of domain specific concepts, which have not been identified by the automatic ontology pruner
Figure 24: Example micro-averaging vs. macro-averaging
Figure 25: Development of precision, recall and breakeven for test set X_multi_en_Desc
Figure 26: Precision vs. Recall for test set X_multi_en_Desc
Figure 27: Single-label vs. multi-label classification: Comparison of overall performance
Figure 28: Ontology integration vs. no integration of background knowledge, X_single_en_Desc
Figure 29: Influence of the different modes of ontology integration on the overall performance (each series corresponds to a specific number of training examples per class, starting at 5)

LIST OF TABLES
Table 1: AGROVOC ontology statistics
Table 2: Ontology Pruner output vs. subject assessment of this output
Table 3: Specification correctness and specification recall for automatically pruned ontologies
Table 4: Contingency table for document x_i
Table 5: Contingency table for class c_i
Table 6: Global contingency table
Table 7: Raw test document set for automatic text classification, X_raw
Table 8: Compiled test document set X_multi (multi-label)
Table 9: Compiled test document set X_single (single-label)
Table 10: Overview about the classes of the test document sets
Table 11: Single-label classification on English documents sets; word pruning threshold vs. variation of training examples per class; average precision over 15 test runs for each configuration
Table 12: Performance of multi-label classification with English document set X_multi_en_Desc, average performance measures over 30 test runs
Table 13: Performance of multi-label classification with English document set X_multi_en_Cat, average performance measures over 15 test runs
Table 14: Performance of multi-label classification with Spanish document set X_multi_fr_Desc, average performance measures over 30 test runs
Table 15: Performance of multi-label classification with Spanish document set X_multi_es_Desc, average performance measures over 30 test runs
Table 17: Average precision results of simple language classifier
Table 18: Average precision of single label test runs in all 3 languages
Table 19: Performance of X_single_en_Desc with ontology background knowledge, averaged precision over 30 runs
Table 20: Performance of X_single_en_Cat with ontology background knowledge, averaged precision over 30 runs
Table 21: Performance of X_single_fr_Cat with ontology background knowledge, averaged precision over 30 runs
Table 22: Performance of X_single_es_Cat with ontology background knowledge, averaged precision over 15 runs

1 Introduction
1.1 Motivation
The management of large amounts of information and knowledge is of ever increasing
importance in today's large organisations. With the ongoing ease of supplying information
online, especially in corporate intranets and knowledge bases, finding the right information
becomes an increasingly difficult task. Today's search tools perform rather poorly in the sense
that information access is mostly based on keyword searching or even mere browsing of topic
areas. This unfocused approach often leads to undesired results. The following example
illustrates the problem more clearly:
An agricultural scientist would like to find out which organisation established
the Agreement on Agriculture. A simple search for "establish Agreement on
Agriculture" might result in a huge list of documents containing these words, but
actually none of them containing the desired result: WTO or World Trade
Organisation. The problem becomes even worse if the result searched for only
appears in a foreign language document.
Figure 1 shows an extract of an ontology, which could solve this problem by
following links in a graph. The grey ellipses represent generic concepts, whereas
the white ones represent specific instances of these concepts. The two concepts
shown here are linked by a relationship. An ontology-enabled search application
would first identify "Agreement on Agriculture" as a "standard" and would then
detect the relationship "establish" to "international organisation" and its instances,
and hence solve the problem by extending the search query. This example shows
how ontologies can help to improve the management of information. Furthermore,
it could provide added value by detecting other relationships that provide the user
with more possibilities: for example, standards of other organisations could be
presented.
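The query extension described in this example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dictionary layout, the constants `INSTANCE_OF` and `RELATIONS` and the function `expand_query` are hypothetical names introduced here, and the toy data contains nothing beyond the concepts, the instance and the relationship named in the example.

```python
# Toy ontology modelled on the example above. The data layout and the
# function name are illustrative assumptions, not part of the thesis.
INSTANCE_OF = {
    "Agreement on Agriculture": "standard",
    "WTO": "international organisation",
}
# Each relation is a triple: (subject concept, relation, object concept).
RELATIONS = [("international organisation", "establish", "standard")]


def expand_query(relation, term):
    """Return instances reachable from `term` via `relation` at the concept level."""
    concept = INSTANCE_OF.get(term)
    if concept is None:
        return []  # the term is unknown to the ontology: nothing to expand
    linked = set()
    for subject, rel, obj in RELATIONS:
        if rel != relation:
            continue
        # Follow the link in whichever direction leads away from `concept`.
        if obj == concept:
            linked.add(subject)
        elif subject == concept:
            linked.add(obj)
    return sorted(i for i, c in INSTANCE_OF.items() if c in linked)


print(expand_query("establish", "Agreement on Agriculture"))  # ['WTO']
```

The sketch follows the relationship between concepts and then returns every instance of the linked concept, which is how the example extends the query. A real application would additionally need instance-level relations to distinguish the organisation that established one particular standard.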
Semantically annotated documents, i.e. documents that are indexed with ontological terms
and concepts instead of simple keywords, provide several advantages. First, the ontological
abstraction provides robustness against changes in the document. In the above example, the
wording of the document might change to use the term 'Agricultural Agreement' instead of
'Agreement on Agriculture'. However, since the document has been annotated with the
ontological semantics, this will not affect the search results. Second, since the ontology used
for annotating the document in this example is domain-specific, the semantic meanings and
interpretations of keywords are bound to that domain and therefore the retrieval is likely to be
more efficient. A term can have several meanings in different domains. By first mapping the
keyword to its semantic representation in a specific ontology and using the ontology's linked
knowledge structure, a much more focused search approach can be taken. Third,
document-specific representations no longer affect the search. This is extremely important in the case of
multilingual representations. Keywords of several languages are mapped to the same concept
in an ontology and are therefore given the same meaning. Multilingual search portals can be
established to produce the same results, no matter which language is used for retrieval.
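The multilingual mapping described above can be illustrated with a small Python sketch. It assumes a hypothetical concept identifier `c_agriculture`, a made-up document index and the helper names `build_lookup` and `search`; only the idea of mapping keywords of several languages to one concept is taken from the text.

```python
# Hypothetical concept identifier and labels; only the idea of mapping
# keywords from several languages to one concept comes from the text.
LABELS = {
    "c_agriculture": {"en": "agriculture", "fr": "agriculture", "es": "agricultura"},
}
# Concept -> documents annotated with it (made-up document identifiers).
INDEX = {"c_agriculture": ["doc-1", "doc-2"]}


def build_lookup(labels):
    """Map every label, in any language, to its concept identifier."""
    lookup = {}
    for concept, by_language in labels.items():
        for label in by_language.values():
            lookup[label.lower()] = concept
    return lookup


def search(keyword, lookup, index):
    """Resolve the keyword to a concept first, then return the annotated documents."""
    concept = lookup.get(keyword.lower())
    return index.get(concept, [])


lookup = build_lookup(LABELS)
print(search("agricultura", lookup, INDEX))  # ['doc-1', 'doc-2']
print(search("Agriculture", lookup, INDEX))  # ['doc-1', 'doc-2']
```

Because retrieval runs over concepts rather than over the words of the documents, the Spanish and the English keyword return the same result. The sketch ignores labels that are shared by several concepts, which a real ontology would have to disambiguate.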
Figure 1: Ontology example, excerpt
An important task in knowledge management that facilitates the search scenario described above
is the classification and indexing of documents. At present, subject specialists are responsible
for this time-consuming process. However, with today's vast amount of available information
on the WWW, automatic support is needed to efficiently manage this task. Ontologies play a
critical role in supporting the machine readable semantics needed to facilitate automation.
They can be used for providing the categories and keywords needed to describe the content of
documents. Automatic text classification tools still lack the necessary precision to replace
human indexers and need to be extensively evaluated in different domains.
Before such powerful Semantic Web¹ applications can be built and used within certain
domains of knowledge, the basic requirement, a machine-readable vocabulary represented by
a domain ontology, has to be established. The creation of ontologies is a time-consuming task
and is often carried out in an ad hoc manner. Only a few methodologies exist, and the existing ones
are often extremely complex and require extensive training and expertise. Even less automated
tool support is available. As they constitute the knowledge base for future Semantic Web
applications, domain ontologies have to be created continuously in all possible areas and
communities. The need for a reusable methodology is evident.
¹ Refer to [Pal01] for a short introduction to the Semantic Web.

1.2 Approach
The thesis introduces a comprehensive framework for building a domain-specific ontology.
The approach combines classical methodologies for human-based ontology engineering with
semiautomatic support of a heuristic toolkit. Two methods for ontology acquisition are
applied in order to create the domain ontology. The first is to create a small, domain-specific
core ontology from scratch. This step is supported by automatically extracting interesting
concepts from a corpus of domain texts, which can be used to extend this base ontology. The
second acquisition approach takes a well-established thesaurus as a basic vocabulary
reference set, and converts it into an ontology representation. Then, a domain-specific and a
general corpus of texts are used to remove ontology concepts that are not descriptive for the
domain from this converted representation. The rationale used here is that domain-specific
concepts are more frequent in the domain-specific text corpus. The results of these steps are
assessed to assemble a first version of the domain-specific ontology. This ontology is then
accessible through a multilingual web portal to be incorporated into other applications, such
as document indexing or keyword searching of indexed documents. It could eventually be
used to automatically index documents available through this kind of search application.
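The rationale behind the second acquisition approach, namely that domain-specific concepts are more frequent in the domain-specific corpus than in the general one, can be sketched as follows. This is a deliberately simplified illustration and not the pruning algorithm evaluated later in the thesis: the function names, the `ratio` threshold of 2.0, the single-word concept labels and the tiny corpora are all assumptions, and the concept hierarchy is ignored.

```python
from collections import Counter


def relative_frequencies(tokens, concepts):
    """Share of the tokens in a corpus that match each concept label."""
    counts = Counter(t for t in tokens if t in concepts)
    total = max(len(tokens), 1)  # avoid division by zero for an empty corpus
    return {c: counts[c] / total for c in concepts}


def prune(concepts, domain_tokens, general_tokens, ratio=2.0):
    """Keep concepts that are at least `ratio` times as frequent in the domain corpus."""
    dom = relative_frequencies(domain_tokens, concepts)
    gen = relative_frequencies(general_tokens, concepts)
    kept = set()
    for c in concepts:
        if dom[c] == 0:
            continue  # never seen in the domain texts: prune
        if gen[c] == 0 or dom[c] / gen[c] >= ratio:
            kept.add(c)
    return kept


# Made-up corpora for illustration only.
concepts = {"pesticide", "residue", "music"}
domain = "pesticide residue limits for pesticide use".split()
general = "music and pesticide news music charts and more music".split()
print(sorted(prune(concepts, domain, general)))  # ['pesticide', 'residue']
```

In this toy run, "music" is removed because it never occurs in the domain texts, whereas "pesticide" is kept because its relative frequency in the domain corpus is three times its relative frequency in the general corpus.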
The thesis was carried out in collaboration with the Food and Agriculture Organisation (FAO)² of the
United Nations (UN). Its main focus is the adaptation of the proposed
framework to the specific environment and needs of this large organisation. The framework
has been applied to create a prototype biosecurity ontology for the domain of Food Safety,
Animal and Plant Health, to be incorporated into an Internet portal for this domain. Within this
context, the conversion of a thesaurus into an ontology and, in particular, the evaluation of two
automatic tools constitute the central parts of the academic research work. The first evaluation
concerns a tree-pruning algorithm used in the ontology creation process to retrieve domain-specific
concepts from the converted thesaurus. The second evaluation concerns a text classification
application based on support vector machines, enhanced by a domain-specific ontology
serving as background knowledge for the classification algorithm.
² [http://www.fao.org].

1.3 Outline
The next section gives an introduction to and an overview of the Food and Agriculture
Organisation and the Agricultural Ontology Service (AOS) Project, which provides the
bigger context in which the research work of this thesis is embedded. The current information
management structure will be introduced briefly, outlining the overall current status and
problems within the organisation.
In section 3, I will give an introduction to the idea of the Semantic Web as well as to
ontologies and their various representations and engineering approaches. The comprehensive
framework for the creation of a multilingual domain ontology is covered in section 4. The
application of the framework will be described in the context of the above-mentioned project
to establish an International Portal on Food Safety, Animal and Plant Health. The conversion
of an existing thesaurus into an ontology representation as well as the adaptation of a
multilingual ontology web browser to be embedded into the system is discussed here in detail.
Sections 5 and 6 describe in detail the adaptation and evaluation of two automatic tools
constituting parts of the framework. Section 5 describes the thesaurus pruning algorithm used
within the ontology creation framework and discusses the results of an empirical evaluation
carried out within the context of the project. Section 6 introduces the reader to the area of
automatic text classification and describes the adaptation of an existing automatic text
classifier based on support vector machines to incorporate domain-specific ontologies. Several
evaluation results are discussed with regard to the applicability of the classifier in the
context of the FAO and compared with the results of earlier evaluations. Finally, section 7 summarises
the findings and results and provides an overview of future work.

2 The project environment
2.1 FAO and the AOS
The Food and Agriculture Organisation (FAO) of the United Nations (UN) was founded in
1945 with a mandate to raise levels of nutrition and standards of living, to improve
agricultural productivity, and to better the condition of rural populations. Today, FAO is one
of the largest specialised agencies in the United Nations system and the lead agency for
agriculture, forestry, fisheries and rural development. As an intergovernmental organisation,
FAO has 183 member countries plus one member organisation, the European Community.
Considering the scope of the organisation, knowledge management is vital for effective
decision-making. One of the FAO's visions within the context of its strategic framework is to
be a centre of excellence and an authoritative purveyor of knowledge and advice in the sphere
of its mandate. FAO has a mandate to collect, analyse, interpret and disseminate information
relating to nutrition, food, agriculture, forestry and fisheries. The Organisation serves as a
clearing-house, providing farmers, scientists, government planners, traders and non-
governmental organisations with the information they need to make rational decisions on
planning, investment, marketing, research and training.
The World Agricultural Information Centre (WAICENT)³ is FAO's strategic
inter-departmental programme on information management and dissemination. WAICENT
provides a corporate information platform for the acquisition, updating and dissemination of
FAO information.
There is no doubt that the Web provides a potential platform for global access to this
information, but it was not initially envisioned as a tool for global access to information, and
the underlying standards for information management are not entirely adequate. By the very
nature of the Internet's architecture, information on similar subjects is scattered across many
different servers around the world, yet there are few tools to integrate related information
from different sources. As a result, it is often very difficult to find things on the Web. This is
equally evident in FAO's information system and will be described further in the next section,
when the structure of this system will be introduced.
Such problems can only be solved if action is taken to establish appropriate norms,
vocabularies, guidelines and standards to facilitate the integration of data from different
sources, and to engage in effective data exchange. Through the adoption of international
classification schemes, controlled vocabularies, open standards, and common data models we
will eventually overcome many of the information management problems of the Internet;
through the development of tools that exploit such standards it will ultimately be possible to
provide an effective framework for "one-stop shopping", where people can search for
agricultural information resources in one place, without having to explore many different
individual web sites.
³ [http://www.fao.org/waicent/index_en.asp].
In the agricultural sector there already exist many well-established and authoritative
controlled vocabularies, such as FAO's AGROVOC Multilingual Thesaurus, the CABI
Thesaurus⁴, and AgNIC⁵, the thesaurus of the National Agricultural Library in the United
States. Ontologies are a newer concept that extends the traditional thesaurus approach by structuring
the concepts more formally and providing richer relationships among them. By more formally
structuring the context and meaning of terms, ontologies become an integral part of the
Semantic Web, described by Tim Berners-Lee in [BHL01] as "an extension of the current
Web in which information is given well-defined meaning, better enabling computers and
people to work in co-operation".
In response to such a new approach to managing vocabularies, WAICENT has recently
issued a Concept Note ([AOS01]) for the development of an Agricultural Ontology Service
(AOS). The AOS project will function as a tool to help structure and standardise agricultural
terminology in multiple languages for use by any number of different systems around the
world. The main objectives of the AOS are to provide a framework for:
- Better indexing of resources;
- Better retrieval of resources; and
- Increased interaction within the agricultural community.
With respect to the Semantic Web initiative, the AOS would strive to:
- Increase the efficiency and consistency with which multilingual agricultural
  resources are described and associated together;
- Increase functionality and relevance in accessing these resources; and
- Provide a framework for sharing common descriptions, definitions and
  relations within the agricultural community.
⁴ The CABI Thesaurus is a thesaurus of the applied life sciences and the world's largest for agricultural
sciences and related subjects, currently available at [http://194.203.77.66/, Dec 2002].
⁵ Available at [http://www.agnic.org/, Dec 2002].
Once constructed, the Agricultural Ontology Service will offer a contextually rich and modern
framework for modelling, serving, and managing agricultural terminology. When integrated
with Web-based search tools, it will facilitate resource retrieval, not only providing access to
the specific documents that a particular individual is looking for, but also offering suggestions
for other related resources that are potentially relevant to the topic of interest. As an integral
part of WAICENT, the AOS will play a strategic role in FAO's effort to fight hunger with
information.
The research work of this thesis is carried out in the context of the AOS project and takes
a first step towards its main objectives. By providing a comprehensive framework for
semiautomatic creation of multilingual domain ontologies and showing first results of
embedding them into an automatic text classifier, the thesis acts as a feasibility study to
achieve the overall objective of integration of information across all agriculture domains.
2.2 Information management at the FAO
The FAO stores and manages a vast amount of data across all agriculture domains.
Information at FAO is stored and made available at two different levels: FAO-wide as well as
in the respective departments. Currently, there is no single access point through which all
information resources are accessible and the various information resources are scattered
across the different systems and departments. Hence, different storage bases have to be
accessed in order to find the necessary information. The following gives a rough overview
of the current system:
2.2.1 Resources and metadata
FAO manages different types of resources. A resource in this context is a piece of
information or an information item in digital, print or any other media format. The
following resources are the main ones made available through the various FAO information systems, though
the resources themselves are not necessarily available electronically:
- Monographs (books, newspapers, journals, ...)
- Analyticals (single articles)
- Web pages
- Photos and multimedia items
- Press releases
- Publications (printed and not changeable resources)
FAO provides electronic access by describing all resources with metadata, that is,
data about other data. The Agricultural Metadata Element Set Project (AgMES)⁶
extends the proposed elements of the Dublin Core Metadata Initiative (DCMI)⁷ to provide a
resource description element set for FAO's agricultural resources. Elements such as title,
author or subject can describe a resource. The full set of elements can be seen at the
respective web sites. The subject element of a resource description set captures the content of
a resource with some representative keywords. It is considered the most delicate element and the
most difficult to create, since it largely determines whether the resource can be discovered in the
system. The work presented throughout the remainder of this thesis deals almost entirely with
this metadata element. Metadata in the FAO is stored in various databases and made
accessible to the users through different access points. The following section gives an
overview of that system.
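To make the role of the subject element more concrete, the following Python sketch models a resource description with the three elements named above and retrieves resources by subject keyword. It is an illustration under assumptions: the class `ResourceRecord`, its field names, the function `find_by_subject` and the two sample records are invented here and do not reproduce the actual AgMES element set.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """Simplified metadata record; field names are assumptions, not AgMES elements."""
    title: str
    author: str
    subject: list = field(default_factory=list)  # controlled keywords


# Made-up records for illustration only.
RECORDS = [
    ResourceRecord("Hypothetical report on pest control", "A. Author", ["INSECTICIDES"]),
    ResourceRecord("Hypothetical forestry bulletin", "B. Author", ["FORESTRY"]),
]


def find_by_subject(records, keyword):
    """Return the titles of records whose subject element contains the keyword."""
    wanted = keyword.upper()
    return [r.title for r in records if wanted in {s.upper() for s in r.subject}]


print(find_by_subject(RECORDS, "insecticides"))  # ['Hypothetical report on pest control']
```

The sketch shows why the subject element matters for discovery: a resource whose subject keywords are missing or inconsistent cannot be found by this kind of lookup, regardless of its title or author.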
2.2.2 The information management system
Currently, the FAO stores documents and metadata using two different systems:
- EIMS (Electronic Information Management System)
- FAO Document Online Catalogue (FAODOC)⁸
The EIMS collects and manages
metadata and keywords linked to any electronic information object, such as publications, web
pages, images or videos, produced by every Department. It stores this metadata in different
databases:
- FAO Corporate Document Repository (FAO DocRep)
- Website database
- Multimedia database
The FAO Corporate Document Repository (DocRep) houses FAO documents and
publications, as well as selected non-FAO publications, in electronic format. The other
databases house their respective information. This electronically available information can be
accessed through:
- FAO Information Finder⁹
- FAO Document Repository web interface
While the latter only queries the Document Repository, the Information Finder queries the
whole EIMS system.
⁶ [http://www.fao.org/agris/agMES/default.htm].
⁷ [http://dublincore.org/].
FAODOC contains metadata about analytical (articles) and monographic records (books,
serial titles). Large parts are therefore not electronically available. The FAODOC Database
stores metadata about all these items. Currently the FAO Online catalogue can only be
queried through the Online Catalogue Interface. If a document is available in electronic
format (in the FAO Document Repository), a link to that document in the Document
Repository is provided. No further integration with the EIMS system exists so far.
Whereas the metadata stored in FAODOC has been created and maintained by
a rather small, well-trained group of people over a long time period, metadata in the EIMS is
edited by a larger, less well-trained group of people, which might lead to less consistent records.
Figure 2 shows an overview of the current system information flow. It shows the different
interfaces through which publishers can populate the system with metadata, the information
flow between the systems and the interfaces through which users can access the information.
In addition to these FAO-wide cross-domain systems, each department maintains its own
department web site hosting information not necessarily retrievable through the FAO-wide
information management system. The hosting of information on these sites is therefore rather
uncontrolled, and the amount of available data, as well as the speed with which it is produced in
the various areas, makes it impossible to keep track of all of it in the centralised system.
Moreover, as opposed to the FAO-wide systems, information in the various departments is
not necessarily described using metadata. Besides a lack of time and human resources, this
also arises from the lack of domain-specific vocabularies needed to describe the subject of the
resources. The next section gives a more detailed introduction to controlled vocabularies
and their use in subject indexing of resources.
⁸ [http://www4.fao.org/faobib/index.html].
⁹ [http://www.fao.org/waicent/search/default.asp].
Figure 2: Information management system at the FAO
2.2.3 AGROVOC Thesaurus and Document Indexing
Subject indexing is the act of describing a document in terms of its subject content. Its
purpose is to make it easy to retrieve references on a particular subject. It is the process of
extracting the main concepts of a document, representing those concepts by keywords in the
chosen language and associating these keywords with the document. To make this process
unambiguous and more standardised, keywords should be chosen from a controlled vocabulary.
The subject element of the metadata element set, as described before, contains such keywords
to describe a resource.
AGROVOC¹⁰ is a multilingual agricultural thesaurus designed to improve information
indexing and retrieval through the use of a controlled vocabulary in the agriculture domain. It
was developed by FAO and the European Community (EC). The Third Edition was published
in 1996, and a supplement followed. It exists in the 5 official FAO languages English, French,
Spanish, Arabic and Chinese and has been translated into further languages, such as
Portuguese, Thai and others. Other versions of AGROVOC have been prepared and are being
maintained by national centres or by groups of countries sharing those languages. It is a
controlled vocabulary designed to describe information resources in the fields of agriculture,
forestry, fisheries, food and related domains (such as environmental terms).
10 The full AGROVOC is available online at [http://www.fao.org/agrovoc/].
The main role of a thesaurus is to standardise the indexing process through a controlled
vocabulary. For example, it informs users and indexers that systems using AGROVOC use
the term INSECTICIDES to subject index records that pertain to this concept instead of
LARVICIDES or APHICIDES.
The vocabulary of the AGROVOC consists of a collection of keywords, which are
descriptors or non-descriptors. A descriptor is a preferred term/keyword to index a document,
whereas a non-descriptor is a non-preferred term, to be replaced by its associated descriptor(s)
for indexing purposes. In the above example, INSECTICIDES is a descriptor and the other
two keywords represent non-descriptors. Only descriptors should be used for indexing
purposes. In the current version of the AGROVOC, there are 16607 descriptors and 10760
non-descriptors. Descriptors are arranged in a broader term – narrower term taxonomic
hierarchy, i.e. for each descriptor there might be several more specific as well as
more general terms. Other relationships linking the keywords are:
Related term, expressing some kind of relationship between this keyword and another.
Use, declaring the keyword to be a non-descriptor and naming the keyword to be used instead for indexing purposes.
Used for, showing that this keyword is used as a descriptor for another keyword.
Used for+, expressing that this descriptor has to be used in conjunction with another descriptor to replace the linked non-descriptor.
Moreover, each keyword is translated into the different languages. Figure 3 shows a
descriptor and a non-descriptor with its hierarchy structure and relationships in the current
version of the AGROVOC.
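The descriptor and non-descriptor mechanics described above can be illustrated with a small Python sketch. It is not the AGROVOC data model: the class and function names are hypothetical, and the broader term PESTICIDES is an assumption added only to show the hierarchy. The three insecticide keywords are the ones used in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    label: str
    is_descriptor: bool = True
    use: list = field(default_factory=list)      # "Use": descriptor(s) replacing a non-descriptor
    broader: list = field(default_factory=list)  # broader terms of a descriptor


def build_sample():
    """Tiny illustrative thesaurus; PESTICIDES is an assumed broader term."""
    entries = [
        Keyword("PESTICIDES"),
        Keyword("INSECTICIDES", broader=["PESTICIDES"]),
        Keyword("LARVICIDES", is_descriptor=False, use=["INSECTICIDES"]),
        Keyword("APHICIDES", is_descriptor=False, use=["INSECTICIDES"]),
    ]
    return {k.label: k for k in entries}


def indexing_terms(thesaurus, label):
    """Return the descriptor(s) an indexer should use for the given keyword."""
    keyword = thesaurus[label]
    return [label] if keyword.is_descriptor else list(keyword.use)


def used_for(thesaurus, descriptor):
    """Inverse of "Use": the non-descriptors replaced by this descriptor."""
    return sorted(k.label for k in thesaurus.values() if descriptor in k.use)


def broader_terms(thesaurus, label):
    """Collect all broader terms by following the hierarchy upwards."""
    seen, stack = [], list(thesaurus[label].broader)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.append(term)
            stack.extend(thesaurus[term].broader)
    return seen


thesaurus = build_sample()
print(indexing_terms(thesaurus, "LARVICIDES"))  # ['INSECTICIDES']
print(used_for(thesaurus, "INSECTICIDES"))      # ['APHICIDES', 'LARVICIDES']
print(broader_terms(thesaurus, "INSECTICIDES")) # ['PESTICIDES']
```

A "Used for+" link would simply appear as a `use` list with more than one descriptor in this representation.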

The keywords of the AGROVOC are, furthermore, mapped to a collection of 116 subject
categories. These categories are used alongside the keywords for indexing purposes. A full
listing of all AGROVOC categories is attached in Appendix C.
Figure 3: AGROVOC thesaurus: A sample extract showing a descriptor and a non-descriptor
The kinds of relationships used in this thesaurus certainly limit its expressiveness. The
relationship `related term' for example does not say anything about the kind of relationship.
The terms could be related in any possible way. This lack of expressiveness will be examined
further in section 3 when ontologies are introduced. Moreover, there are some multilingual
issues and modelling restrictions that cannot be addressed using the limited thesaurus
structure. AGROVOC has been translated into different languages. This translation has been
done in a simplified way, starting from the English collection of keywords and then trying to
find a direct translation for each term. Whenever there is no translation for a keyword
because it does not exist in the target language, the English word remains. In reality, however,
translating concepts into another language is more complex. A keyword in one
language can sometimes only be described by more than one keyword in another language.
Conversely, one language may have several distinct concepts that all share the basic
meaning of a single concept in another language. Consider an example taken from the Chinese
translation of the AGROVOC. The English word `abortion' expresses the same concept
regardless of the context in which it is used. In Chinese, there is no perfect equivalent. Instead,
there are three different concepts that express abortion in the human, the plant and the
animal domain respectively. The simplified structure of the AGROVOC thesaurus cannot
capture this information. In the case of AGROVOC, only one Chinese term (the human sense)
has been chosen to represent the concept of `abortion' in Chinese. If this term is used to index
documents, a subset of Chinese documents cannot be indexed or is indexed
wrongly. Hence, a Chinese-speaking user searching for information on plant abortion would
also retrieve information on human and animal abortion. Currently, the AGROVOC
thesaurus does not provide a solution for this problem. Another good example of this
multilingual translation and concept-mapping problem is explained in [Rol01], which discusses
the incompatible translation of the English term river into either rivière or fleuve in French.
Ontologies, as introduced in the next chapter, provide the modelling capabilities to
address such issues.
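To make the mapping problem concrete, the following Python sketch separates concepts from their language-specific labels, which is roughly what a concept-based model allows and a term-to-term translation does not. The concept identifiers are hypothetical, and the Chinese labels are deliberately left as placeholders rather than real terms.

```python
# Concept identifier -> label per language. All identifiers are assumptions,
# and the "zh" values are placeholders, not actual Chinese terms.
LABELS = {
    "abortion_human": {"en": "abortion", "zh": "<zh term, human sense>"},
    "abortion_animal": {"en": "abortion", "zh": "<zh term, animal sense>"},
    "abortion_plant": {"en": "abortion", "zh": "<zh term, plant sense>"},
}


def concepts_for(label, lang):
    """All concepts that carry the given label in the given language."""
    return sorted(c for c, names in LABELS.items() if names.get(lang) == label)


def translate(label, source, target):
    """Map a label to the target language via the concepts it may denote."""
    return {c: LABELS[c][target] for c in concepts_for(label, source)}


# One English keyword is ambiguous between three concepts ...
print(concepts_for("abortion", "en"))
# ... while each Chinese label identifies exactly one of them.
print(concepts_for("<zh term, plant sense>", "zh"))
print(translate("abortion", "en", "zh"))
```

Indexing documents with concept identifiers instead of keywords would let a query for the plant sense exclude documents about the human and animal senses, independent of the query language.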
2.3 Problems with the current system and proposal
Currently, there is no single access point for users to effectively search for information on
FAO's web sites. They are forced to browse many pages and perform many searches through
trial and error. Two different groups feed two different systems with metadata in an
inconsistent way. The same document might be indexed twice, once in each system, in an
inconsistent manner. In the Document Repository, the indexing is sometimes not done
according to the rules and non-descriptors might be used for indexing. FAODOC metadata is
generally more reliable and consistent, because fewer and better-trained indexers work on it.
Only 5-10% of FAO's web sites are stored in the EIMS system. Therefore, information
retrieval about specific department web sites is rather poor. Lack of human resources and the
fast growth of information resources create a huge backlog in metadata creation. This fact,
along with regular processing inefficiencies within large organisations, makes it impossible to
gather all the information in one centralised system. The decentralised structure will therefore
remain and domain specific information will be available through the various domain specific
systems.
The integration of these domains is the vision of the AOS. Crucial to integration of this
information is, however, the creation of metadata of all these resources within the departments
in a consistent and controlled way. Subject indexing especially will be responsible for
retrieval of the respective resources. Automatic support for this time consuming task would be
invaluable. Using controlled vocabularies to subject index domain resources sets the basis for
harvesting information across several domains. The AGROVOC is not specific enough for all
areas in order to be used for subject indexing in specific domains. Some agriculture domains
are either not or not sufficiently captured in the AGROVOC (for example fishery, forestry or
food safety). Domain specific controlled vocabularies therefore need to be established.

This is where the framework for the creation of domain specific ontologies and automatic
text classification fits into the context of the Food and Agriculture Organisation. The main
work of this thesis will therefore focus on these two fields. In the next chapter, I will give an
introduction to the Semantic Web and define ontologies as they are used in this context. That
chapter also introduces the underlying terminology needed for understanding the following
chapters and the broader technological context in which this project is embedded.

3 Semantic Web
3.1 The idea
The idea of the Semantic Web, first introduced by Tim Berners-Lee in 1996 [Ber96], has
been described by Berners-Lee himself as follows:
"The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation" ([LHL01]).
The Semantic Web is basically the idea of linking information objects on the web in such a
way that they can easily be processed by machines. The problem with the majority of data on
the Web at the moment is that it is difficult to use on a large scale, because there is no global
system for publishing data in such a way that it can be easily processed by anyone. XML, as
specified in [BPSM00], is widely used to represent data in an interchangeable and
reusable format. Today, however, large amounts of data about the same or similar issues are
made available all over the web and described in XML or pure HTML. These languages lack
the semantics needed to resolve similarity issues. The same data is modelled an indefinite
number of times in different locations using different representations. The Semantic Web is an
effort to unambiguously define and identify resources on the World Wide Web and to
interconnect them with semantic relationships in order to provide the described resources in a
machine-readable, understandable and reusable form to anyone who wants to make use of them.
The Semantic Web moves away from relating pieces of meaningless text, as is basically done
now with HTML and hyperlinks, towards affiliating objects with semantic relationships.
In Semantic Web terminology, every object in the world is a resource¹¹ and can be linked
to any other resource. A resource can be uniquely identified by its Uniform Resource
Identifier (URI)¹² as specified in [BFIM98]. A URI is defined as a compact string of
characters for identifying an abstract or physical resource. The ability to uniquely reference
and identify a resource sets the basis for the Semantic Web.
11 Similar to resource as defined in the context of the FAO in the previous chapter, where every information object is called a resource.
12 See also [http://www.w3.org/Addressing/].
The Resource Description Framework (RDF) is a foundation for exposing and
processing metadata as recommended in [LS99]. It has been designed to provide
interoperability between applications that exchange machine-understandable information on
the WWW by offering the possibility to express statements about resources in a machine
processable format. An RDF statement is a triple, always consisting of a subject, predicate
and object, making it similar in format to a natural language expression. The difference here is
that each part is a URI. Let us consider the following RDF statement:
<http://www.borislauser.de> <http://www.relationships.com/schema/isStudentAt> <http://www.uni-karlsruhe.de>
The subject, Boris Lauser, is a person (i.e. the resource to be described). The predicate
describes that the subject is a student at some other resource (i.e. a property of the resource
person). The object is the University of Karlsruhe (i.e. the value of the property; in that case,
another resource). This statement can be read and processed by machines. By using URIs,
everyone can use RDF to make statements about anything. RDF makes it possible to create
interchangeable metadata and publish it on the web to be reused by others. If other parties
make further statements about the subject Boris Lauser, an application collecting all these
statements could relate and combine the information given in them and infer other
statements.
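The following Python sketch illustrates this idea with plain tuples instead of an RDF library. The first statement is the one given above. The second source, the predicates `hasStudent` and `isLocatedIn`, the `example.org` URI and the inverse rule are hypothetical additions that only serve to show how statements from different parties can be merged and how a new statement can be derived.

```python
# A statement is a (subject, predicate, object) triple of URIs.
BORIS = "http://www.borislauser.de"
IS_STUDENT_AT = "http://www.relationships.com/schema/isStudentAt"
UNI = "http://www.uni-karlsruhe.de"

# Hypothetical URIs, introduced for illustration only.
HAS_STUDENT = "http://www.relationships.com/schema/hasStudent"
IS_LOCATED_IN = "http://www.relationships.com/schema/isLocatedIn"
KARLSRUHE = "http://example.org/places/Karlsruhe"

source_a = {(BORIS, IS_STUDENT_AT, UNI)}      # the statement from the text
source_b = {(UNI, IS_LOCATED_IN, KARLSRUHE)}  # a statement by another party


def merge(*graphs):
    """Statements from different sources combine by simple set union."""
    return set().union(*graphs)


def statements_about(graph, subject):
    """All (predicate, object) pairs known for a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


def infer_inverse(graph, predicate, inverse):
    """Add (o, inverse, s) for every (s, predicate, o) in the graph."""
    return graph | {(o, inverse, s) for s, p, o in graph if p == predicate}


graph = infer_inverse(merge(source_a, source_b), IS_STUDENT_AT, HAS_STUDENT)
for predicate, obj in statements_about(graph, UNI):
    print(predicate, obj)
```

Because every part of a statement is a URI, merging needs no schema alignment: two sources that use the same URI are, by construction, talking about the same resource.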
XML has evolved as the standard format for information interchange. XML is therefore
now widely used to encode RDF statements and is suggested as the standard syntax by the
W3C. RDF and XML are complementary in that RDF describes a model, which can
be represented using different syntaxes. XML is one syntax for doing so. Another example is
Notation 3, or N3¹³. Figure 4 shows a possible XML encoding of the above statement.
Figure 4: XML serialisation of RDF, example
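As an illustration of what such an encoding can look like, the Python sketch below builds one possible RDF/XML serialisation of the statement with the standard library. It is not a reproduction of Figure 4: the prefix `rel` and the split of the predicate URI into a namespace and the local name `isStudentAt` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "http://www.relationships.com/schema/"  # namespace part of the predicate URI

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rel", REL_NS)  # the prefix name is an arbitrary choice


def serialise(subject, predicate_local_name, obj):
    """Encode one triple as rdf:RDF / rdf:Description / property element."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
    )
    ET.SubElement(
        description,
        f"{{{REL_NS}}}{predicate_local_name}",
        {f"{{{RDF_NS}}}resource": obj},
    )
    return ET.tostring(root, encoding="unicode")


xml_text = serialise(
    "http://www.borislauser.de", "isStudentAt", "http://www.uni-karlsruhe.de"
)
print(xml_text)
```

The subject becomes the `rdf:about` attribute of a description element, the predicate becomes a child element in its own namespace, and the object is referenced through `rdf:resource`.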
13 Refer to http://www.w3.org/2000/10/swap/Primer.html for a good overview of N3.
The Semantic Web is thus a means of creating metadata about arbitrary resources on
the web, just like metadata about information resources in the FAO as described in the
previous chapter. These information resources can be located by a URI, and hence the
Semantic Web idea is highly applicable in this context.
The problem so far, however, is that all these objects represented by URIs can now be
talked about and processed by machines to infer statements and expressions about them, but
they are not yet defined anywhere. As human beings, we know that `Boris Lauser' is a person,
but a machine does not. Different machines and statements referring to this very object
might therefore interpret it differently and in an unintended way. An ontology can solve this
problem by providing the means to define and specify the meaning of terms and the
relationships between them.
3.2 Ontologies
3.2.1 Introduction
The term ontology originally evolved from a branch of philosophy that deals with the
nature and the organisation of reality [Gua98]. In terms of information management and in the
context of the Semantic Web, many definitions of the term have been proposed. In [Gru95], an
ontology is defined as "an explicit specification of a conceptualisation". Conceptualisation in
this context means identifying the concepts and other entities that describe a
domain of interest and the relationships that hold amongst them. It refers to an abstract model
of how people think about physical or abstract objects in the world, usually restricted to a
particular subject area. An explicit specification means that the concepts and relationships of
the abstract model are given explicit terms and definitions. In the context of the AOS, ontologies
are referred to as a collection of terms, the definition of these terms, and the specification of
relationships amongst them, as stated in [AOS01]. The definition can get as loose as "a
vocabulary of terms and some specification of their meaning" in [UG96].
It is not my intention to give a universal definition for ontologies here, but rather to focus on
how they are defined and used within the context of this research and project environment.
The definition that probably best suits the approach taken here is given in [SBF98]: An
ontology is an explicit, formal specification of a shared conceptualisation of a domain of
interest. It is shared because, in a certain domain (more about domains in the next section),
everybody agrees on and has the same view of this explicit specification. To provide the
necessary formalisation, a more mathematical definition of a conceptual modelling approach
of ontologies is specified in [MMV02]. The following definitions are taken from this
specification and introduce the base terminology of the ontology definition used and built
upon throughout the further work of this thesis:
Definition 1 (OI-model Structure). An OI-model (ontology-instance-model) structure is a
tuple $OIM := (E, INC)$ where:
$E$ is the set of entities of the OI-model,
$INC$ is the set of included OI-models.
An OI-model represents a self-contained unit of structured information that may be reused.
Elements in an OI-model are entities. An OI-model may include a set of other OI-models
(represented through the set $INC$). Definition 5 lists the conditions that must be fulfilled when
an OI-model includes another model.
Definition 2 (Ontology Structure). An ontology structure associated with an OI-model is a
10-tuple $O(OIM) := (C, P, S, T, INV, H_C, H_P, \mathit{domain}, \mathit{range}, \mathit{mincard}, \mathit{maxcard})$ where:
$C \subseteq E$ is a set of concepts,
$P \subseteq E$ is a set of properties,
$S \subseteq P$ is a subset of symmetric properties,
$T \subseteq P$ is a subset of transitive properties,
$INV \subseteq P \times P$ is a symmetric relation that relates inverse properties; if $(p_1, p_2) \in INV$, then $p_1$ is an inverse property of $p_2$,
$H_C \subseteq C \times C$ is an acyclic relation called concept hierarchy; if $(c_1, c_2) \in H_C$, then $c_1$ is a sub-concept of $c_2$ and $c_2$ is a super-concept of $c_1$,
$H_P \subseteq P \times P$ is an acyclic relation called property hierarchy; if $(p_1, p_2) \in H_P$, then $p_1$ is a sub-property of $p_2$ and $p_2$ is a super-property of $p_1$,
Function $\mathit{domain}: P \to (2^C \setminus \{\emptyset\}) \cup \{L\}$ gives the set of domain concepts for some property $p \in P$,
Function $\mathit{range}: P \to (2^C \setminus \{\emptyset\}) \cup \{L\}$ gives the set of range concepts for some property $p \in P$,
Function $\mathit{mincard}: C \times P \to \mathbb{N}_0$ gives the minimum cardinality for each concept-property pair,
Function $\mathit{maxcard}: C \times P \to (\mathbb{N}_0 \cup \{\infty\})$ gives the maximum cardinality for each concept-property pair.
Each OI-model has an ontology structure associated with it, consisting of a set of definitions
regulating how instances should be constructed. An ontology consists of concepts (sets of
elements) and properties (specifications of how objects may be connected). Each property must
have at least one domain concept, while its range may either be a literal, or a set of at least
one concept. Domain and range concept restrictions are treated conjunctively: all of them
must be fulfilled for each property instantiation. Some properties may be marked as transitive,
and it is possible to say that two properties are inverse. For each concept-property pair, it is
possible to specify the minimum and maximum cardinalities, defining how many times a
property may be specified for instances of that concept. Concepts and properties can be arranged
in a hierarchy, as specified by the $H_C$ ($H_P$) relation. This relation relates directly connected
concepts (properties), whereas its transitive closure follows from the semantics, as defined in
the next subsection.
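As a rough illustration of Definition 2, the ontology structure can be held in a small Python container whose fields mirror the components of the tuple. This is a sketch under simplifying assumptions, not the data model of [MMV02]: the field and method names are hypothetical, ranges are restricted to concepts, and the toy ontology at the end is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Container for the components of Definition 2 (names are illustrative)."""
    concepts: set = field(default_factory=set)            # C
    properties: set = field(default_factory=set)          # P
    symmetric: set = field(default_factory=set)           # S, subset of P
    transitive: set = field(default_factory=set)          # T, subset of P
    inverse: set = field(default_factory=set)             # INV, pairs (p1, p2)
    concept_hierarchy: set = field(default_factory=set)   # H_C, pairs (sub, super)
    property_hierarchy: set = field(default_factory=set)  # H_P, pairs (sub, super)
    domain: dict = field(default_factory=dict)            # property -> set of concepts
    range_: dict = field(default_factory=dict)            # property -> set of concepts
    mincard: dict = field(default_factory=dict)           # (concept, property) -> int
    maxcard: dict = field(default_factory=dict)           # (concept, property) -> int

    def superconcepts(self, concept):
        """Follow H_C transitively and return every super-concept of `concept`."""
        found, frontier = set(), {concept}
        while frontier:
            frontier = {
                sup for sub, sup in self.concept_hierarchy if sub in frontier
            } - found
            found |= frontier
        return found

    def hierarchy_is_acyclic(self):
        """H_C is acyclic if no concept is its own (indirect) super-concept."""
        return all(c not in self.superconcepts(c) for c in self.concepts)


# Hypothetical toy ontology, not taken from the thesis.
onto = Ontology(
    concepts={"ROOT", "Person", "Student", "University"},
    properties={"isStudentAt"},
    concept_hierarchy={("Person", "ROOT"), ("Student", "Person"), ("University", "ROOT")},
    domain={"isStudentAt": {"Student"}},
    range_={"isStudentAt": {"University"}},
    maxcard={("Student", "isStudentAt"): 1},
)

print(sorted(onto.superconcepts("Student")))  # ['Person', 'ROOT']
print(onto.hierarchy_is_acyclic())            # True
```

Only the direct links are stored in `concept_hierarchy`; the transitive closure is computed on demand, which matches the distinction drawn above between the stored relation and its semantics.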
Definition 3 (Instance Pool Structure). An instance pool associated with an OI-model is a
4-tuple $IP(OIM) := (I, L, \mathit{instconc}, \mathit{instprop})$ where:
$I \subseteq E$ is a set of instances,
$L$ is a set of literal values, $L \cap E = \emptyset$,
Function $\mathit{instconc}: C \to 2^I$ relates a concept with a set of its instances,
Partial function $\mathit{instprop}: P \times I \to 2^{I \cup L}$ assigns to each property-instance pair a set of instances related through the given property.
Each OI-model has an instance pool associated with it. An instance pool is constructed by
specifying instances of different concepts and by establishing property instantiations between
instances. Property instantiations must follow the domain and range constraints, and must
obey the cardinality constraints.
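A minimal Python sketch of such a check is given below. It represents the functions of Definitions 2 and 3 as plain dictionaries and reports property instantiations that break the domain, range or cardinality constraints. The function names and the sample data are hypothetical, and the sketch makes two simplifications: it ignores the concept hierarchy and it only handles concept-valued ranges, not literals.

```python
def concepts_of(instconc, instance):
    """Return every concept that lists `instance` among its instances."""
    return {c for c, members in instconc.items() if instance in members}


def violations(domain, range_, mincard, maxcard, instconc, instprop):
    """Check an instance pool; domain and range sets are applied conjunctively."""
    problems = []
    for (prop, subject), values in instprop.items():
        if not domain[prop] <= concepts_of(instconc, subject):
            problems.append(f"{subject}: not in the domain of {prop}")
        for value in sorted(values):
            if not range_[prop] <= concepts_of(instconc, value):
                problems.append(f"{value}: not in the range of {prop}")
    for (concept, prop), lower in mincard.items():
        for inst in sorted(instconc.get(concept, set())):
            if len(instprop.get((prop, inst), set())) < lower:
                problems.append(f"{inst}: fewer than {lower} values for {prop}")
    for (concept, prop), upper in maxcard.items():
        for inst in sorted(instconc.get(concept, set())):
            if len(instprop.get((prop, inst), set())) > upper:
                problems.append(f"{inst}: more than {upper} values for {prop}")
    return problems


# Hypothetical sample data.
domain = {"isStudentAt": {"Student"}}
range_ = {"isStudentAt": {"University"}}
mincard = {("Student", "isStudentAt"): 1}
maxcard = {("Student", "isStudentAt"): 1}
instconc = {"Student": {"boris"}, "University": {"uni_karlsruhe", "uni_x"}}

valid_pool = {("isStudentAt", "boris"): {"uni_karlsruhe"}}
broken_pool = {("isStudentAt", "boris"): {"uni_karlsruhe", "uni_x"}}

print(violations(domain, range_, mincard, maxcard, instconc, valid_pool))   # []
print(violations(domain, range_, mincard, maxcard, instconc, broken_pool))
# ['boris: more than 1 values for isStudentAt']
```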
Definition 4 (Root OI-model Structure). The root OI-model is defined as a particular,
well-known OI-model with structure $ROIM := (\{ROOT\}, \emptyset)$. ROOT is the root concept; each
other concept must subclass ROOT (it may do so indirectly). Each other OI-model must
include ROIM and thus gain visibility to the root concept. This is similar to the approach of
object-oriented languages: in Java, for example, every class extends the java.lang.Object class.
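Definition 5 (Modularization Constraints). If an OI-model $OIM$ imports some other OI-model
$OIM_1$ (whose elements are marked with subscript 1), that is, if $OIM_1 \in INC(OIM)$, then
$OIM$ must satisfy the following modularization constraints:
$R_1 \subseteq R$, $C_1 \subseteq C$, $P_1 \subseteq P$, $T_1 \subseteq T$, $INV_1 \subseteq INV$, $H_{C,1} \subseteq H_C$, $H_{P,1} \subseteq H_P$,
$\forall p \in P_1: \mathit{domain}_1(p) \subseteq \mathit{domain}(p)$,
$\forall p \in P_1: \mathit{range}_1(p) \subseteq \mathit{range}(p)$,
$\forall p \in P_1, \forall c \in C_1: \mathit{mincard}_1(c, p) = \mathit{mincard}(c, p)$,
$\forall p \in P_1, \forall c \in C_1: \mathit{maxcard}_1(c, p) = \mathit{maxcard}(c, p)$,
$I_1 \subseteq I$, $L_1 \subseteq L$,
$\forall c \in C_1: \mathit{instconc}_1(c) \subseteq \mathit{instconc}(c)$,
$\forall p \in P_1, \forall i \in I_1: \mathit{instprop}_1(p, i) \subseteq \mathit{instprop}(p, i)$.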
If an OI-model imports some other OI-model, it contains all information - no information may
be lost. Modularization constraints just specify structural consequences of importing an OI-
model. This is independent from the implementation - imported OI-models may be physically
duplicated, whereas in other cases they may be linked.
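The structural consequences of importing can be sketched as a simple check in Python. The representation is an assumption made for this example: each OI-model is a dictionary whose keys name the components of Definitions 2 and 3, missing components are treated as empty, and the function name is hypothetical.

```python
SET_COMPONENTS = ("C", "P", "T", "INV", "H_C", "H_P", "I", "L")
SET_VALUED_FUNCTIONS = ("domain", "range", "instconc", "instprop")
EXACT_FUNCTIONS = ("mincard", "maxcard")


def satisfies_modularization(included, including):
    """True if `including` contains all information of `included`."""
    # Every set component of the included model must be a subset.
    for name in SET_COMPONENTS:
        if not included.get(name, set()) <= including.get(name, set()):
            return False
    # Set-valued functions may only grow in the including model.
    for name in SET_VALUED_FUNCTIONS:
        for key, values in included.get(name, {}).items():
            if not values <= including.get(name, {}).get(key, set()):
                return False
    # Cardinalities must be carried over unchanged.
    for name in EXACT_FUNCTIONS:
        for key, value in included.get(name, {}).items():
            if including.get(name, {}).get(key) != value:
                return False
    return True


# Hypothetical models: the root model and a model that includes it.
root_model = {"C": {"ROOT"}}
person_model = {
    "C": {"ROOT", "Person"},
    "H_C": {("Person", "ROOT")},
    "I": {"boris"},
    "instconc": {"Person": {"boris"}},
}

print(satisfies_modularization(root_model, person_model))  # True
print(satisfies_modularization(person_model, root_model))  # False
```

The check is independent of whether the included model is physically copied or merely linked, since it only compares the resulting contents.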
3.2.2 Types of ontologies
The notion of OI-model introduced above already incorporates the well-known software
engineering paradigms of modularity and reusability, provided by the INC set. These
paradigms are extremely important in the discipline of ontology engineering. This becomes
evident when thinking about the different levels at which things can be described in an
ontology. On the one hand, it might be sufficient for an ontology to describe the structure of an
organisation at the top level, representing only the main organisational units. On the other
hand, it might be necessary for an application (like a corporate knowledge base) to capture the
whole organisation with all its employees in an ontology. The first, high-level ontology can
certainly be used as part of the second, so that common concepts do not have to be remodelled
there. However, the two ontologies are on different levels. Figure 5 shows an overview of the
different types of ontologies as identified in [Gua98]. Guarino differentiates between four
types:
Top-level ontologies describe very general concepts like space, time, matter, object, event,
action, etc., which are independent of a particular problem or domain: it seems therefore
reasonable, at least in theory, to have unified top-level ontologies for large communities of
users.
Domain ontologies and task ontologies describe, respectively, the vocabulary related to a
generic domain (like medicine, or automobiles) or a generic task or activity (like diagnosing
or selling), by specialising the terms introduced in the top-level ontology.
Application ontologies describe concepts depending both on a particular domain and task,
which are often specializations of both the related ontologies. These concepts often
correspond to roles played by domain entities while performing a certain activity, like
replaceable unit or spare component.
Figure 5: Ontology types
In [MIK96], this typology is refined further by taking into consideration the usage of the
ontology, introducing qualifiers like task- or application-dependent/-independent. For the
purpose of this thesis, and taking into consideration the stage of ontology development and
usage and the uncertainty inherent in it, the differentiation given above is sufficient. Given the
project scope of this thesis, which aims at the integration of different domains using ontologies
as discussed in the previous chapter, the further work will focus on domain ontologies.
Whenever I use the term ontology in the following, I refer to this particular type.
Given the definition and scope of ontologies, I will now introduce several ways of
ontology representation and the approach taken here in this context.

3.2.3 Ontology representation languages
The term `formal' in the above ontology definitions refers to the fact that the established
conceptualisation has to be formalised in a way that is unambiguous and, in the context of the
Semantic Web, machine-readable. Several formal representation languages exist and are used
today. Among the most common ones used in the past and today are:
RDF/RDFS [BG02],
DAML + OIL [CHH+01],
Topic Maps [PM01],
Ontolingua [FFR97],
FLogic [KLW90] and
LOOM [Bri93].
Traditional ontology representation languages like Ontolingua, FLogic or LOOM evolved
from different underlying paradigms: frame-based, description logic, first and second order
predicate calculus and object-oriented. Recently, new languages for the web have been
created, such as XML, RDF and RDF Schema (RDFS). Ontology representation languages like
DAML+OIL or the latest development effort, OWL, are created as extensions of these. Figure
6 shows an overview of the degree of expressiveness of different representation languages
taken from [CG00]. The languages grow in expressiveness, but also in complexity,
from bottom to top. Given the web-based context of the thesis, I will briefly
introduce the web-based representation languages, especially RDFS, as well as the proprietary
extension of it which is used within this project for modelling and representing ontologies. A
comprehensive, in-depth evaluation of different languages based on their expressiveness and
reasoning capabilities is given in [RC00].
Figure 6: Ontology representation languages and their expressiveness taken from [CG00]

Details

Pages: 130
Type of edition: Original edition
Year: 2003
ISBN (eBook): 9783832469054
ISBN (Paperback): 9783838669052
File size: 1.9 MB
Language: English
Institution / University: Karlsruher Institut für Technologie (KIT) – Wirtschaftsingenieurwesen, Angewandte Informatik
Publication date: 2014 (April)
Grade: 1.3
Keywords: classification, pruning, multi-label classification, multilingual, thesaurus