ONTOSEARCH: AN ONTOLOGY SEARCH ENGINE

Abstract:

Reuse of knowledge bases and the Semantic Web are two promising areas in knowledge technologies. In both areas, finding ontologies that suit a given set of user requirements is an important task. This paper discusses our work on OntoSearch, a kind of “ontology Google”, which helps users find ontologies on the Internet. OntoSearch combines the Google Web APIs with a hierarchy visualization technique. It allows the user to perform keyword searches on certain types of “ontology” files and to inspect the files visually to check their relevance. The OntoSearch system is based on Java, JSP, Jena and JBoss technologies.

1. Introduction

Reuse of knowledge bases is an important area in knowledge technologies, and determining the principal topic of an existing knowledge base (KB) is an important step towards such reuse. Identify-Knowledge-Base (IKB) is a tool that identifies the principal topic(s) of a particular knowledge base by matching concepts (extracted from the KB) against a reference taxonomy (extracted from a reference ontology). Finding a relevant reference ontology for a particular KB, normally on the Internet, is the key step in using the IKB system.

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It envisions a globally interconnected network of machine-processable information, made possible by sharing semantic data models, or ontologies. Locating suitable existing ontologies on the Internet to capture the user-required information is a major challenge in current Semantic Web research.

Finding a suitable ontology on the Internet is a hard task, and there is still no good tool for it. Google offers a powerful web search engine, but for ontology searching it has its own problems, such as a lack of visualization facilities. The Google Web APIs give us a chance to develop our own tool, OntoSearch, to search for ontology files that meet the user’s requirements. In this article, we discuss the issue of searching for relevant ontologies on the Internet and introduce OntoSearch. Section 2 gives some background to our research and lists some current problems. Section 3 introduces OntoSearch in detail. Section 4 gives some discussion and future work, together with a brief summary.

2. Background

2.1 IKB: Identify Knowledge Base

Reuse of knowledge bases is a promising area in knowledge technologies, and many researchers are focusing on how to reuse existing knowledge bases for different applications. Requests for reuse are often specified as a knowledge base (KB) characterization problem: require a knowledge base on topic T, conforming to the set of constraints C. There are two key points here: 1) decide what the principal topic (T) of a given knowledge base is; 2) decide whether a KB conforms to certain constraints C. As noted above, determining the principal topic of an existing KB is an important step in its reuse. Identify-Knowledge-Base (IKB) is a tool that suggests the principal topic(s) addressed by a knowledge base. It matches concepts extracted from a particular knowledge base against a reference taxonomy, which can be pre-stored or extracted from ontologies that are either stored on the local machine or accessible through the WWW. The 'most specific' super-concept subsuming these extracted concepts is said to be the principal topic of the knowledge base.

Figure 1: Taxonomy showing different kinds of food

Here we give a simple example based on a taxonomy of food. If the concepts {Apples, Pears} are extracted and passed to the IKB system, the system suggests that {Fruit} might be the focus of the knowledge base. Similarly, if the concepts {Apples, Potatoes, Carrots} are extracted, {Fruit vegetables} is the output. If the set of concepts {Potatoes, Chicken, Game} is provided, the topic {Food} is returned as the result. The IKB system is implemented in Java; Jena, a Java API, is used to manipulate RDF models. The ExtrAKT system, developed at Edinburgh University, extracts concepts from a Prolog knowledge base and passes them to the IKB system.
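The selection of the 'most specific' super-concept can be illustrated with a minimal Java sketch. It is not the IKB implementation: the class and method names are hypothetical, the taxonomy is hard-coded rather than read from an ontology, and the intermediate concepts "Vegetables" and "Meat" are assumptions, since Figure 1 is not reproduced here. The sketch intersects the paths from each concept to the root and returns the first concept they share.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and taxonomy are illustrative, not taken from IKB.
class PrincipalTopic {
    // child -> parent links of an assumed Food taxonomy.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("Apples", "Fruit");
        PARENT.put("Pears", "Fruit");
        PARENT.put("Potatoes", "Vegetables");      // "Vegetables" is an assumed node
        PARENT.put("Carrots", "Vegetables");
        PARENT.put("Fruit", "Fruit vegetables");
        PARENT.put("Vegetables", "Fruit vegetables");
        PARENT.put("Chicken", "Meat");             // "Meat" is an assumed node
        PARENT.put("Game", "Meat");
        PARENT.put("Fruit vegetables", "Food");
        PARENT.put("Meat", "Food");
    }

    // Path from a concept up to the root, starting with the concept itself.
    static List<String> pathToRoot(String concept) {
        List<String> path = new ArrayList<>();
        for (String c = concept; c != null; c = PARENT.get(c)) {
            path.add(c);
        }
        return path;
    }

    // Most specific concept that subsumes every input concept, or null if there is none.
    static String principalTopic(List<String> concepts) {
        if (concepts.isEmpty()) {
            return null;
        }
        List<String> common = pathToRoot(concepts.get(0));
        for (String c : concepts.subList(1, concepts.size())) {
            common.retainAll(pathToRoot(c)); // keeps order, so index 0 stays the most specific
        }
        return common.isEmpty() ? null : common.get(0);
    }

    public static void main(String[] args) {
        System.out.println(principalTopic(Arrays.asList("Apples", "Pears")));
        System.out.println(principalTopic(Arrays.asList("Apples", "Potatoes", "Carrots")));
        System.out.println(principalTopic(Arrays.asList("Potatoes", "Chicken", "Game")));
    }
}
```

Under the assumed taxonomy, the three calls in `main` reproduce the three cases described above. The sketch assumes each concept has a single parent; a taxonomy with multiple inheritance would need a set of ancestors per concept instead of a single path.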

The IKB system has two main inputs: the concepts extracted from the KB and a reference taxonomy. The concepts can be extracted by the ExtrAKT system. Choosing a suitable reference ontology, however, is very hard. In using the IKB system, we found that although a huge number of ontologies are available online, finding a relevant reference ontology for a particular KB is not easy, even though it is essential for using the IKB system.

2.1.1 Knowledge Flow

The first step in designing a knowledge management system is to identify the knowledge flows of the system. Knowledge flows comprise a set of processes, events and activities through which data, information, knowledge and metaknowledge are captured, transferred, and transformed from one state to another. Our framework has four types of knowledge artifacts, reflecting different levels of conceptualization. To capture and represent knowledge, we develop domain ontologies after consulting domain experts. Annotators embed the semantic concepts modeled by the ontology into their web pages; in essence, the web resources (typically web documents) are marked up in a web ontology language. All assertions and instances specified in the ontologies, together with the instances marked up in web documents, go into the knowledge base (KB). The knowledge base is an integral part of the reasoning (inference) system, which applies rules to derive new knowledge.

To derive the knowledge artifacts from unstructured knowledge, we use three knowledge processes: annotation with ontology development, crawling, and inference. The annotation and development process turns unstructured knowledge into structured knowledge with semantic meaning. The crawling process gathers the knowledge dispersed across Semantic Web pages. The inference process makes the knowledge machine-processable and intelligent. All three processes are based on ontologies and are the fundamental building blocks of our web knowledge management framework.

Figure: Knowledge flow

2.2 Semantic Web

"The Semantic Web is an extension of the current web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation."

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It envisions a globally interconnected network of machine-processable information, made possible by the sharing of semantic data models, also known as ontologies. The Semantic Web is a collaborative effort led by the World Wide Web Consortium with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming. Many people are working to improve, extend and standardize the Semantic Web, and many documents and tools have already been developed. However, Semantic Web technologies are still in their infancy and many challenges remain. One of the most important is locating suitable existing ontologies to capture the user-required information. For example, if you want to publish your top ten favorite music tracks on the Semantic Web, you would like to find ontologies that represent real-world things like "artist", "track title", and "album"; otherwise, you will have to build these ontologies yourself. However, locating a suitable ontology on the Semantic Web is currently far from easy, and as far as we know there is still no handy tool to help users. So, we need to build a kind of "ontology Google" tool to kick-start this process.

Semantic Web Layered Architecture

The term Semantic Web is commonly used to identify a set of technologies, tools and standards that form the basic building blocks of a system that could support the vision of a Web imbued with meaning. The Semantic Web has been developing a layered architecture, which is often represented using a diagram first proposed by Tim Berners-Lee, with many variations since. The figure below gives a typical representation of this diagram.

Figure: Semantic Web layered architecture

2.3 Google Application for ontologies

Nowadays, Google is widely used to search for information on the Internet. With the powerful facilities offered by Google, we can rapidly search many resources on the web. The next question is: can one use Google to locate an existing ontology that conforms to the user’s requirements? The answer is “Yes”. We can simply use the Google “filetype:” facility to limit the type of file searched. For example, if we search in Google for “filetype:rdfs Food”, Google returns RDFS files that contain the keyword “Food”. So the user can use Google to search for existing ontologies in different formalisms, such as DAML(+OIL), RDFS, OWL, etc., and use (or reuse) them for their own needs.

Google therefore seems a good way to help the user find suitable online ontology resources. However, after some experiments (focused mainly on finding RDFS files), we found that it does not perform as expected; it is very hard to use Google to search for suitable ontology files. There are several problems.

Firstly, ontologies are not always available for a particular topic or domain. Some domains have many resources while others have very few.

Secondly, Google returns links to files that may be relevant, and the user has to check whether they really are. This can be very time consuming because Google does not offer a quick way to browse ontology files.

Last but not least, Google searches files based on the keywords supplied by the user. It does not check the real content and structure of the files, so some (usually many) irrelevant files are returned simply because the keywords appear somewhere in them. We quite often found RDFS files that contain the required keywords but, on further examination, do not match our needs at all: the keywords are present, but not where they are required. For example, when we searched for a food ontology using the keyword “Food”, ontologies about the Animal domain were also returned, because the file contains a statement such as “animal food vegetarian”. This is obviously not what we want, and such “mistakes” cost the user more time to find acceptable ontologies. Thus, Google’s keyword searching alone is not good enough as an ontology search tool.

Google Web APIs are a free beta service that helps programmers develop their own Google-based applications. With the Google Web APIs service, software developers can query more than 4 billion web pages directly from their own computer programs. Google uses the SOAP and WSDL standards, so a developer can program in his or her favorite environment, such as Java, Perl, or Visual Studio .NET. With the support of the Google Web APIs, we can therefore develop a more specific tool to search for user-required ontologies on the Semantic Web.

3. Empirical Studies

3.1 Design of OntoSearch

As mentioned in the last section, finding ontologies to satisfy user requirements is a very important issue, in both KB reuse and Semantic Web areas. There is no existing tool to solve this problem. Google does have the power, but does not seem to be specific enough to give good results.

After some experiments, we noticed that the problem arises because Google does not offer a good visualization function for the ontology files (in different formalisms, such as RDFs, etc.). As the user cannot view the ontology in an intuitive graphic format, they have to look through the ontologies as structured text files. This process takes a lot of time and cannot guarantee a good result, as the plain text of the ontology cannot show the internal structure of the ontology clearly.

After reviewing some ontology tools, we found that showing the hierarchy (structure) of an ontology is very important in helping the user understand its nature. Most of the tools offer a hierarchy view to support the user in building and editing ontologies. A hierarchical view seems to be a good way to give the user a quick overview of a selected ontology. In this piece of work, we investigate the applicability of visualization techniques to ontology searching on the Internet. To this end, we developed a visualization tool, OntoSearch, which combines the Google search engine with RDFS ontology (hierarchy) visualization technology. It helps the user search for relevant ontology files on the Internet (based on keywords) and displays the files in a visually appealing way, as a hierarchy tree. The hierarchical view allows users to quickly review the structures of different ontology files and select the relevant ones. A diagrammatic overview of OntoSearch is shown in Figure 2.

The user gives OntoSearch keywords that describe the nature of the required ontology. OntoSearch then applies the Google engine to search for RDFS files related to the keywords and returns a list of relevant links (URLs) to the user. The user chooses some of the returned RDFS files, displays their structure, and decides which of the files are relevant. Finally, the user selects the relevant RDFS files and saves them in a taxonomy library for future use. Now that we have the ontology searching tool OntoSearch, we can link it to our other tool, IKB.

3.2 Development of OntoSearch

The OntoSearch system is implemented in Java and JSP. It is a web-based system that offers an online service running on JBoss. Jena, a Java API for manipulating RDF models, is used to read the ontology (RDFS file) into Java. The Google Web APIs provide the Internet search engine. A JSP tag is used to visualize the hierarchy structure of the ontology.

The user can access the OntoSearch interface with any web browser. The user types in keywords that describe the nature of the required ontology. OntoSearch then applies the Google Web APIs to search the Internet for relevant files (the file type is currently restricted to RDFS but can be changed) and displays all the returned URLs. The user can select files and inspect their structures in a hierarchy tree view, and thus get a general idea of the content and structure of the returned ontologies. Finally, the user can save the relevant ontologies to local disk.
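The file-type restriction in this workflow can be sketched in a few lines of Java. This is an illustration under stated assumptions, not OntoSearch code: the class and method names are hypothetical, no call to the Google Web APIs is made, and the list of URLs in the example merely stands in for whatever a search service would return. The first method builds a query using the “filetype:” operator; the second filters a result list by file extension as an extra check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; it only prepares a query string and filters URLs.
class OntologyQuery {
    // Builds a keyword query restricted to one file type, e.g. "Food filetype:rdfs".
    static String build(String fileType, String... keywords) {
        return String.join(" ", keywords) + " filetype:" + fileType.toLowerCase(Locale.ROOT);
    }

    // Keeps only the URLs that end with the requested extension (case-insensitive).
    static List<String> keepByExtension(List<String> urls, String fileType) {
        String suffix = "." + fileType.toLowerCase(Locale.ROOT);
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url.toLowerCase(Locale.ROOT).endsWith(suffix)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(build("RDFS", "Food"));
        // Placeholder URLs standing in for search results.
        List<String> results = Arrays.asList(
                "http://example.org/Food.rdfs",
                "http://example.org/index.html");
        System.out.println(keepByExtension(results, "rdfs"));
    }
}
```

Keeping the file type as a parameter mirrors the remark above that the restriction to RDFS can be changed.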

3.3 Demonstration of OntoSearch

Next, we give an example of using OntoSearch. Suppose the user is looking for an ontology in the Food domain. The required ontology should contain some real-world knowledge about food and related issues. The user inputs the keyword “Food” into OntoSearch. After searching, some RDFS files are returned as results, as shown in Figure 4.

Figure 4: Search ontologies by keywords

As many RDFS files are often returned, the user then has to inspect them to check whether they are really about the Food domain. As there is one file named “Food.RDFs”, the user selects that one first. The content of that RDFS file is shown as triples in Figure 5. There is only one kind of triple in this ontology: all the triples are of the “subClassOf” type. All the concepts in this ontology are subclasses (within several levels) of the food concept. Thus, we can regard this ontology as a hierarchy of different kinds of food. In fact, this ontology file does match the user’s needs. Figure 6 gives the hierarchy of that ontology. This format is clearly much easier for the user to understand than the triple format shown in Figure 5.
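The step from “subClassOf” triples to a hierarchy view can be illustrated with a short Java sketch. It is not the OntoSearch JSP tag: the class name, the method names and the sample concepts are hypothetical, and the triples are given as plain (subclass, superclass) string pairs rather than read with Jena, so that the example stays self-contained. It also assumes the triples contain no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns (subclass, superclass) pairs into an indented outline.
class HierarchyView {
    // Groups the pairs by superclass, preserving the order in which they were given.
    static Map<String, List<String>> childrenOf(String[][] subClassOf) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String[] triple : subClassOf) {
            children.computeIfAbsent(triple[1], k -> new ArrayList<>()).add(triple[0]);
        }
        return children;
    }

    // Renders the subtree under root, one concept per line, two spaces per level.
    static String render(String root, Map<String, List<String>> children) {
        StringBuilder out = new StringBuilder();
        render(root, children, 0, out);
        return out.toString();
    }

    // Depth-first traversal; assumes the subclass relation is acyclic.
    private static void render(String node, Map<String, List<String>> children,
                               int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node).append('\n');
        for (String child : children.getOrDefault(node, new ArrayList<>())) {
            render(child, children, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Illustrative pairs, not the contents of the file shown in Figure 5.
        String[][] triples = {
            {"Fruit", "Food"}, {"Apples", "Fruit"}, {"Meat", "Food"}
        };
        System.out.print(render("Food", childrenOf(triples)));
    }
}
```

Even for three triples, the indented outline shows at a glance that Apples is a kind of Fruit and that Fruit and Meat are kinds of Food, which is the information a reader would otherwise have to reconstruct from the triple list.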

Figure 6: Hierarchy visualization of selected ontology

After viewing the hierarchy of the selected ontology, the user decides whether the ontology is relevant to the requirement, and then proceeds to check further returned ontologies.

Prototype

This annotation-crawling/inference model guides us in developing various KM applications. In this ongoing research, we are developing a prototype for demonstration purposes. We have defined the application as a simple skill management system for the human resources department of an IT company. Application and ontology developers may implement other kinds of applications and other domain ontologies according to specific needs and business domains. There is a large body of research on ontology development. We have developed skill ontologies, which conceptualize and describe the human resource skill structure of an IT company. A snapshot of the ontologies is shown in the figure below. The Skill ontology is represented in DAML+OIL, where the namespace is “xmlns:daml_oil”.

Figure: Skill Ontology

4. Summary, Discussion and Future Work

As mentioned earlier, the OntoSearch system is a useful tool that can search for ontology files on the Internet and visualize them as hierarchies. The next stage of our work will be to develop an advanced mode for the OntoSearch system. The current system is quite simple: it can only search for one type (RDFS) of ontology file, and it compares the user’s keywords with the contents of the ontology files wherever they occur, so it matches keywords indiscriminately against both concepts and comment fields. A future version of OntoSearch will allow the user to choose among the different representational formalisms used to express ontologies, and to specify the type of entity (concept, attribute, comment, etc.) to be matched. Other future work includes:

Creating a “library” of the Taxonomies

More experiments will be carried out, especially on particular domains to test our Onto Search system. The user-acceptable ontologies will be stored in a repository for future use (e.g. for use with IKB).

WordNet application

The synonym problem is not well addressed in the current version of OntoSearch. We are planning to incorporate WordNet in future versions so that our tool will be more effective, i.e. it will retrieve a larger number of relevant ontologies.
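The entity-type matching planned for the advanced mode could look roughly like the following Java sketch. It is only a possible design, not an existing OntoSearch feature: the `Field` categories, the `Entry` type and the `matches` method are hypothetical names, and the sample entries are modeled on the “animal food vegetarian” example from section 2.3. The idea is that each piece of text taken from an ontology file is tagged with where it occurred, so a keyword can be matched against concepts only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of keyword matching restricted to chosen entity types.
class FieldMatcher {
    enum Field { CONCEPT, ATTRIBUTE, COMMENT }

    // One piece of text from an ontology file, tagged with where it occurred.
    static final class Entry {
        final Field field;
        final String text;

        Entry(Field field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // True if the keyword occurs (case-insensitively) in an entry of an allowed field.
    static boolean matches(List<Entry> entries, String keyword, Field... allowed) {
        List<Field> fields = Arrays.asList(allowed);
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (Entry e : entries) {
            if (fields.contains(e.field) && e.text.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An animal ontology that mentions "food" only in a comment.
        List<Entry> animalOntology = Arrays.asList(
                new Entry(Field.CONCEPT, "Animal"),
                new Entry(Field.COMMENT, "animal food vegetarian"));
        System.out.println(matches(animalOntology, "Food", Field.CONCEPT));  // concepts only
        System.out.println(matches(animalOntology, "Food", Field.values())); // any field
    }
}
```

With matching restricted to concepts, the animal ontology is no longer returned for the keyword “Food”, whereas indiscriminate matching over all fields still returns it.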

References

1. Advanced Knowledge Technology (AKT project) http://www.aktors.org/akt/

2. Sleeman D, Potter S, Robertson D, and Schorlemmer W.M. Enabling Services for Distributed Environments: Ontology Extraction and Knowledge Base Characterisation, ECAI-2002 workshop, 2002

3. Sleeman D, Zhang Yi, Vasconcelos W. Characterisation of Knowledge Bases, Proceedings of AI-2003 (the twenty-third Annual International Conference of the British Computer Society's Specialist Group on Artificial Intelligence (SGAI)), 2003

4. Schorlemmer M, Potter S, and Robertson D. Automated Support for Composition of Transformational Components in Knowledge Engineering. Informatics Research Report EDI-INF-RR-0137, June, 2002.
