ONTOSEARCH: AN ONTOLOGY SEARCH ENGINE
Abstract:
Reuse of knowledge bases and the
Semantic Web are two promising areas in knowledge technologies. Given some user
requirements, finding suitable ontologies is an important task in both
areas. This paper discusses our work on OntoSearch, a kind of “ontology
Google”, which helps users find ontologies on the Internet. OntoSearch
combines the Google Web APIs with a hierarchy visualization technique. It allows
the user to perform keyword searches on certain types of “ontology” files, and
to visually inspect the files to check their relevance. The OntoSearch system is
based on Java, JSP, Jena and JBoss technologies.
1.
Introduction
Reuse of knowledge bases is an important
area in knowledge technologies, and determining the principal topic of an existing
knowledge base (KB) is an important step towards that reuse. Identify-Knowledge-Base
(IKB) is a tool that identifies the principal topic(s) of a particular knowledge base
by matching concepts (extracted from the KB) against a reference taxonomy (extracted
from a reference ontology). Finding a relevant reference ontology for a particular
KB, normally on the Internet, is the key point in the use of the IKB
system. The Semantic Web provides a common framework that allows data to be
shared and reused across application, enterprise, and community boundaries. It
envisions a globally interconnected network of machine-processable
information, made possible by the sharing of semantic data models, or
ontologies. Locating suitable existing ontologies on the Internet to capture the
user-required information is a big challenge in current Semantic Web research.
Finding a suitable ontology on the
Internet is a hard task, and there is still no good tool to handle this problem.
Google offers a powerful web search engine. However, for
ontology searching it has its own problems, such as a lack of visualization
facilities. The Google Web APIs give us a chance
to develop our own tool (OntoSearch) to search for relevant ontology files that
meet the user requirements. In this article, we discuss the issue of searching
for relevant ontologies on the Internet and introduce our tool, OntoSearch. In
section 2, we give some background to our research and list some current
problems. In section 3, OntoSearch is introduced in detail. In section 4, some
discussion and future work are given, followed by a brief summary.
2.
Background
2.1
IKB: Identify Knowledge Base
Reuse of knowledge bases is a promising
area in knowledge technologies, and many researchers are focusing on how to
reuse existing knowledge bases for different applications. Such requests for
reuse are often specified as a knowledge base (KB) characterization problem: require a
knowledge base on topic T, conforming to the set of constraints C. There
are two key points here: 1) decide what the principal topic (T) of a given
knowledge base is; 2) decide whether a KB conforms to certain constraints C. As
we noted, determining the principal topic of an existing knowledge base is
an important step in the reuse of knowledge bases. Identify-Knowledge-Base (IKB)
is a tool that suggests the principal topic(s) addressed by a knowledge base.
It matches concepts extracted from a particular knowledge base against a reference
taxonomy, where the taxonomy can be pre-stored or extracted from ontologies
that are either stored on the local machine or accessible through the WWW.
The “most specific” super-concept subsuming these extracted concepts is said to
be the principal topic of the knowledge base.
Figure 1: Taxonomy showing different kinds of food
Here we give a simple example based on a taxonomy
of Food. If the concepts {Apples, Pears} are extracted and passed to the IKB
system, the system would suggest that {Fruit} might be the focus of the
knowledge base. Similarly, if the concepts {Apples, Potatoes, Carrots} are
extracted, {Fruit vegetables} would be the output. If the set of concepts
{Potatoes, Chicken, Game} is provided, the topic {Food} would be returned as the
result. The IKB system is implemented in Java; Jena, a Java API, is used to
manipulate RDF models. The ExtrAKT system, developed at Edinburgh University, is used
to extract concepts from a Prolog knowledge base and pass them to the IKB
system.
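The notion of the “most specific” subsuming super-concept can be illustrated with a small sketch. It is written in Python for brevity and is not the Java/Jena implementation of IKB; the `PARENT` table is a hypothetical taxonomy, assumed only to be consistent with the three examples above, and the function names are illustrative.

```python
# Hypothetical taxonomy (an assumption, chosen to agree with the Food examples
# in the text). Each concept maps to its parent; None marks the root.
PARENT = {
    "Food": None,
    "Fruit vegetables": "Food",
    "Meat": "Food",
    "Fruit": "Fruit vegetables",
    "Vegetables": "Fruit vegetables",
    "Apples": "Fruit",
    "Pears": "Fruit",
    "Potatoes": "Vegetables",
    "Carrots": "Vegetables",
    "Chicken": "Meat",
    "Game": "Meat",
}


def ancestors(concept, parent=PARENT):
    """Return the path from a concept up to the root, the concept included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path


def principal_topic(concepts, parent=PARENT):
    """Return the most specific concept that subsumes every given concept."""
    concepts = list(concepts)
    if not concepts:
        raise ValueError("at least one concept is required")
    # Concepts that lie on the root path of every input concept.
    common = set(ancestors(concepts[0], parent))
    for concept in concepts[1:]:
        common &= set(ancestors(concept, parent))
    # Walking upwards from the first concept, the first common ancestor
    # encountered is the deepest, i.e. the most specific, one.
    for candidate in ancestors(concepts[0], parent):
        if candidate in common:
            return candidate
    return None


print(principal_topic(["Apples", "Pears"]))
print(principal_topic(["Apples", "Potatoes", "Carrots"]))
print(principal_topic(["Potatoes", "Chicken", "Game"]))
```

Under the assumed taxonomy, the three calls reproduce the topics given in the examples. The sketch assumes a tree in which every concept has a single parent; a taxonomy with multiple inheritance would need a different treatment.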
There are two main inputs to the IKB
system: concepts extracted from the KB and a reference taxonomy. The concepts can be
extracted by the ExtrAKT system. Choosing a suitable reference ontology, however, is very
hard. In using the IKB system, we found that a huge number of
ontologies are available online, but that finding a relevant reference ontology for a
particular KB is not an easy job at all. Yet finding a relevant reference
taxonomy is essential for using the IKB system.
2.1.1 Knowledge Flow
The first step in
designing a knowledge management system is to identify the knowledge flows of the
system. Knowledge flows comprise a set of processes, events and activities
through which data, information, knowledge and metaknowledge are
captured, transferred, and transformed from one state to another. Our framework
has four types of knowledge artifacts, reflecting different levels of conceptualization.
In order to capture and represent knowledge, we develop domain ontologies after
consulting the domain experts. Annotators embed the semantic concepts
modeled by the ontology into their web pages. In essence, the web resources
(typically web documents) are marked up in a web ontology language. All
assertions and instances specified in the ontologies, and the instances marked
in web documents, go into the knowledge base (KB). The knowledge base is an
integral part of the reasoning (inference) system, which applies
rules to derive new knowledge. To obtain the knowledge artifacts from
unstructured knowledge, we use three knowledge processes: annotation with
ontology development, crawling, and inference. The annotation and development process
turns unstructured knowledge into structured knowledge with semantic
meaning. The crawling process gathers the knowledge dispersed across semantic web
pages. The inference process makes the knowledge machine-processable and
intelligent. These three processes are all based on ontologies
and are the fundamental building blocks of our web knowledge management
framework.
Figure: Knowledge flow
2.2
Semantic Web
"The Semantic Web is an extension of the current web
in which information is given a well-defined meaning, better enabling computers
and people to work in cooperation."
The Semantic Web provides a common
framework that allows data to be shared and reused across application,
enterprise, and community boundaries. It envisions a globally interconnected
network of machine-processable information, made possible by the sharing of
semantic data models, which are also known as ontologies. The Semantic Web is a
collaborative effort led by the World Wide Web Consortium with participation
from a large number of researchers and industrial partners. It is based on the
Resource Description Framework (RDF), which integrates a variety of
applications using XML for syntax and URIs for naming. Many people are
working in this area to improve, extend and standardize the Semantic Web, and many
documents and tools have already been developed. However, Semantic Web
technologies are still in their infancy and there are many challenges in this
area. One of the most important issues is to locate suitable existing ontologies
to capture the user-required information from the Semantic Web. For example, if
you want to publish your top ten favorite music tracks on the Semantic Web, you
would like to find some ontologies that represent real-world things like “artist”,
“track title”, and “album”. Otherwise, you will have to
build these ontologies yourself. However, locating a suitable ontology on the
Semantic Web is currently far from easy and, as far as we know, there is still no
handy tool to help users. So, we need to build a kind of “ontology
Google” tool to kick-start this process.
Semantic Web Layered Architecture
The common use of the term
Semantic Web is to identify a set of technologies, tools and standards which
form the basic building blocks of a system that could support the vision of a
Web imbued with meaning. The Semantic Web has been developing a layered architecture,
which is often represented using a diagram first proposed by Tim Berners-Lee,
with many variations since. Figure gives a typical representation of this
diagram.
Figure: Semantic Web layered architecture
2.3
Google Application for ontologies
Nowadays, Google is widely used to
search for information on the Internet. With the powerful facilities offered by
Google, we can rapidly search many resources on the web. The next question is:
can one use Google to locate an existing ontology that conforms to the user’s
requirements? The answer is “Yes”. We can simply use the Google operator
“filetype:” to limit the type of file searched for. For example, if we search in Google
for “filetype:rdfs Food”, then Google will return all the RDFS files containing the
keyword “Food”. So the user can use Google to search for existing ontologies
in different formalisms, such as DAML(+OIL), RDFS, OWL, etc., and use (or
reuse) them for their own needs. Google thus seems a good way to help the user
find suitable online ontology resources. However, after some experiments
(basically focused on finding RDFS files), we found that it does not perform as
expected; it is very hard to use Google to search for suitable ontology files.
There are several problems. Firstly, ontologies are not always available for a
particular topic or domain. Some domains have many resources while others have
very few.
Secondly, Google returns links to
relevant files, and the user has to check whether they are really relevant.
This can be very time consuming because Google does not offer a quick way to
browse ontology files. Last but not least, Google searches files based on
keywords supplied by the user. It does not check the real content and
structure of the files. Some (usually many) irrelevant files are returned
to the user just because they have the keywords somewhere in them. We
quite often found RDFS files that contain the required keywords but, on
further examination of the ontology, do not match
our needs at all; that is, they do have the required keywords, but the keywords are not
situated as required. For example, when we searched for a food ontology using
the keyword “Food”, ontologies about the Animal domain were also
returned, because one such file contains a statement such as “animal food vegetarian”.
Obviously, this is not what we want. This kind of “mistake” costs the
user more time in finding acceptable ontologies. Thus, Google’s keyword searching
alone is not good enough as an ontology search tool. The Google Web APIs are a free beta
service to help programmers develop their own Google-based applications. With
the Google Web APIs service, software developers can query more than 4 billion
web pages directly from their own computer programs. Google uses the SOAP and
WSDL standards, so a developer can program in his or her favorite environment,
such as Java, Perl, or Visual Studio .NET. So, with the support of the Google
Web APIs, we can develop a more specific tool to search for user-required
ontologies from the Semantic Web.
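The false-positive problem described above can be made concrete with a small sketch. It is an illustration only, written in Python with the standard library rather than the Java/Jena stack used by the system; the RDFS fragment `ANIMAL_ONTOLOGY` is invented for the example, and both function names are hypothetical. It contrasts matching a keyword anywhere in the file with matching it only against class names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Invented RDFS fragment about animals (an assumption for illustration).
# The word "food" appears only inside a comment.
ANIMAL_ONTOLOGY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Animal"/>
  <rdfs:Class rdf:ID="Herbivore">
    <rdfs:subClassOf rdf:resource="#Animal"/>
    <rdfs:comment>animal food vegetarian</rdfs:comment>
  </rdfs:Class>
</rdf:RDF>"""


def matches_anywhere(rdf_text, keyword):
    """Plain keyword search: true if the keyword occurs anywhere in the file."""
    return keyword.lower() in rdf_text.lower()


def matches_class_name(rdf_text, keyword):
    """Structure-aware search: true only if a class identifier contains the keyword."""
    root = ET.fromstring(rdf_text)
    # Simplification: only classes declared with rdf:ID are considered.
    names = [c.get("{%s}ID" % RDF_NS, "") for c in root.iter("{%s}Class" % RDFS_NS)]
    return any(keyword.lower() in name.lower() for name in names)


print(matches_anywhere(ANIMAL_ONTOLOGY, "Food"))    # True: the comment contains "food"
print(matches_class_name(ANIMAL_ONTOLOGY, "Food"))  # False: no class is named after food
```

A plain keyword search accepts the animal ontology as a “Food” result, whereas a check restricted to class names rejects it. This is the distinction a keyword-only search engine cannot draw.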
3.
Empirical Studies
3.1
Design of Onto Search
As mentioned in the last section,
finding ontologies to satisfy user requirements is a very important issue, in
both KB reuse and Semantic Web areas. There is no existing tool to solve this
problem. Google does have the power, but does not seem to be specific enough to
give good results.
After some experiments, we noticed that
the problem arises because Google does not offer a good visualization function
for ontology files (in different formalisms, such as RDFS, etc.). As
users cannot view the ontology in an intuitive graphic format, they have to look
through the ontologies as structured text files. This process takes a lot of
time and cannot guarantee a good result, as the plain text of an ontology
does not show its internal structure clearly.
After reviewing some ontology tools, we
found that showing the hierarchy (structure) of an ontology is very important in
helping the user to understand the nature of the ontology. Most of the tools offer
a hierarchy viewing facility to support the user in building and editing
ontologies. A hierarchical view of an ontology seems to be a good way to give the
user a quick overview of the selected ontology. In this piece of work, we
investigate the applicability of visualization techniques to ontology
searching on the Internet. To do so, we developed a
visualization tool, OntoSearch, which combines the Google search engine
with an RDFS ontology (hierarchy) visualization technique. It helps
the user search for relevant (based on keywords) ontology files on the Internet
and displays the files in a visually appealing way: as a hierarchy tree. The
hierarchical view allows users to quickly review the structures of different
ontology files and select the relevant ones. We show a diagrammatic
overview of OntoSearch in Figure 2.
The user gives OntoSearch the
keywords that describe the nature of the required ontology. OntoSearch then
uses the Google engine to search for RDFS files related to the keywords and
returns a list of relevant links (URLs) to the user. The user then chooses some
of the returned RDFS files, displays their structure, and decides which of
the files are relevant. Finally, the user selects the relevant RDFS files and
saves them in a taxonomy library for future use. Now that we have the ontology searching
tool OntoSearch, we can link it to our other tool, IKB.
3.2
Development of OntoSearch
The OntoSearch system is implemented in
Java and JSP. It is a web-based system, offered as an online service running
on JBoss. Jena, a Java API for manipulating RDF models, is used to read the
ontology (RDFS file) into Java. The Google Web APIs provide the Internet
search facility. A JSP tag is used to visualize the hierarchical structure of
the ontology.
The user can access the OntoSearch
interface using any web browser. The user types in keywords that describe the
nature of the required ontology. OntoSearch then uses
the Google Web APIs to search the Internet for relevant files (the file type is
currently restricted to RDFS but can be changed) and lists all the returned URLs on the
screen. The user can select files to inspect their structures in a
hierarchy tree view, and so get a general idea of the content and
structure of the returned ontologies. Finally, the user can save the relevant
ontology to local disk.
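The query that such a search step hands to the search engine can be sketched as follows. This is an illustrative Python fragment, not OntoSearch's code; the function name and its `file_type` parameter are assumptions, and the only thing it relies on is Google's `filetype:` search operator. It builds the query string and does not call any web service.

```python
def build_ontology_query(keywords, file_type="rdfs"):
    """Build a search query restricted to one ontology file type.

    keywords:  iterable of user-supplied keywords, e.g. ["Food"]
    file_type: file extension to restrict the search to (RDFS by default,
               but configurable, e.g. "owl" or "daml")
    """
    terms = [k.strip() for k in keywords if k.strip()]
    if not terms:
        raise ValueError("at least one keyword is required")
    return "filetype:%s %s" % (file_type, " ".join(terms))


print(build_ontology_query(["Food"]))
print(build_ontology_query(["music", "artist"], file_type="owl"))
```

Keeping the file type as a parameter mirrors the remark above that the restriction to RDFS can be changed.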
3.3
Demonstration of OntoSearch
Next, we give an example of using OntoSearch.
Suppose the user is looking for an ontology in the Food domain. The
required ontology should contain some real-world knowledge about food and
related issues. The user inputs the keyword “Food” into OntoSearch. After
searching, some RDFS files are returned as results, which are shown in Figure 4.
Figure 4: Search ontologies by keywords
As many RDFS files are often returned,
the user then has to inspect them to check whether they are really about the
Food domain. As there is one file named “Food.RDFs”, the user selects that one
first. The content of that RDFS file is shown as triples in Figure 5. There is only
one kind of triple in this ontology: all the triples
are of the “subClassOf” type. All the concepts in this ontology are subclasses
(within several levels) of the food concept. Thus, we can regard this ontology
as a hierarchy of different kinds of food. In fact, this ontology file does match
the user’s needs. Figure 6 gives the hierarchy of that ontology. Obviously,
this format is much easier for the user to understand than the triple format
shown in Figure 5.
Figure 6: Hierarchy visualization of selected ontology
After viewing the hierarchy of the
selected ontology, the user decides whether the ontology is relevant
to the requirement, and then proceeds to check further returned ontologies.
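The step from “subClassOf” triples to a hierarchy view can be sketched in a few lines. The fragment below is an illustration in Python, not the JSP tag or Jena code used by OntoSearch; the `TRIPLES` list is invented sample data in the spirit of the Food example, and `render_hierarchy` is a hypothetical helper.

```python
from collections import defaultdict

# Invented sample data (an assumption): (subclass, predicate, superclass) triples.
TRIPLES = [
    ("Fruit", "subClassOf", "Food"),
    ("Meat", "subClassOf", "Food"),
    ("Apples", "subClassOf", "Fruit"),
    ("Pears", "subClassOf", "Fruit"),
    ("Chicken", "subClassOf", "Meat"),
]


def render_hierarchy(triples):
    """Turn subClassOf triples into an indented, text-only hierarchy tree."""
    children = defaultdict(list)
    classes, subclasses = set(), set()
    for sub, predicate, sup in triples:
        if predicate != "subClassOf":
            continue  # only the class hierarchy is visualized
        children[sup].append(sub)
        classes.update((sub, sup))
        subclasses.add(sub)

    lines = []

    def walk(node, depth, seen):
        lines.append("  " * depth + node)
        if node in seen:
            return  # guard against cyclic subClassOf statements
        for child in sorted(children[node]):
            walk(child, depth + 1, seen | {node})

    # Roots are classes that never appear as a subclass.
    for root in sorted(classes - subclasses):
        walk(root, 0, frozenset())
    return "\n".join(lines)


print(render_hierarchy(TRIPLES))
```

For the sample triples this prints `Food` at the top level, with `Fruit` and `Meat` indented beneath it and their own subclasses one level deeper, which is the kind of overview the hierarchy view is meant to give at a glance.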
Prototype
This
model of annotation, crawling and inference guides us in developing various KM
applications. In this ongoing research, we are developing a prototype for
demonstration purposes. We have defined the application as a simple skill management
system for the human resources department of an IT
company. Application and ontology developers may implement other kinds
of applications and other domain ontologies according to specific needs and
business domains. There is much research on ontology development. We
have developed skill ontologies, which conceptualize and describe the human
resource skill structure of an IT company. A snapshot of the ontologies is shown
in Figure. The following code shows the skill ontology represented in
DAML+OIL, where the namespace is “xmlns:daml_oil”.
Figure: Skill Ontology
4.
Summary, Discussion and Future Work
As mentioned earlier, the OntoSearch
system is a useful tool that can search for ontology files on the Internet
and visualize them as hierarchies. The current OntoSearch
system is quite simple. It can only search for one type (RDFS) of ontology
file, and it compares the user keywords with the contents of the ontology
files wherever they occur, so it matches the keywords indiscriminately against both
concepts and comment fields. The next stage of our work will be to develop an
advanced mode for OntoSearch: a future version will allow the user to choose
among the different representational formalisms used to express ontologies, and it will
allow the user to specify the type of entity (concept, attribute, comment,
etc.) to be matched. Other future work includes:
Creating a “library” of taxonomies
More experiments will be carried out,
especially on particular domains, to test our OntoSearch system. The
user-acceptable ontologies will be stored in a repository for future use (e.g.
for use with IKB).
WordNet application
The synonym problem is not well
addressed in the current version of OntoSearch. We are planning to incorporate
WordNet in future versions so that our tool will be more effective, i.e. it
will retrieve a larger number of relevant ontologies.
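How synonym expansion might widen a search can be sketched as follows. This is an assumption-laden illustration in Python: the `SYNONYMS` table is a small hand-written stand-in for a real WordNet lookup, and `expand_keywords` is a hypothetical helper, not part of OntoSearch.

```python
# Hand-written stand-in for a WordNet lookup (an assumption, not WordNet data).
SYNONYMS = {
    "food": ["nutrient", "foodstuff"],
    "car": ["automobile"],
}


def expand_keywords(keywords, synonyms=SYNONYMS):
    """Return the keywords followed by their synonyms, without duplicates."""
    expanded = []
    for keyword in keywords:
        for term in [keyword] + synonyms.get(keyword.lower(), []):
            # Case-insensitive duplicate check; original order is preserved.
            if term.lower() not in (e.lower() for e in expanded):
                expanded.append(term)
    return expanded


print(expand_keywords(["Food"]))
```

Each expanded term could then be submitted as its own search, so that an ontology which names its root concept “Foodstuff” rather than “Food” is no longer missed.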
References
1. Advanced Knowledge Technology (AKT project) http://www.aktors.org/akt/
2. Sleeman D, Potter S, Robertson D, and Schorlemmer W.M. Enabling Services for
Distributed Environments: Ontology Extraction and Knowledge Base
Characterisation, ECAI-2002 workshop, 2002.
3. Sleeman D, Zhang Yi, Vasconcelos W. Characterisation of Knowledge Bases, Proceedings
of AI-2003 (the twenty-third Annual International Conference of the British
Computer Society's Specialist Group on Artificial Intelligence (SGAI)), 2003.
4. Schorlemmer M, Potter S, and Robertson D. Automated Support for Composition of
Transformational Components in Knowledge Engineering. Informatics Research
Report EDI-INF-RR-0137, June, 2002.