A Multi-Agent System to Support Exploiting an XML-based Corporate Memory
INRIA, ACACIA Project, 2004 Route des Lucioles, 06902 Sophia Antipolis, France
A corporate memory and the World Wide Web are both heterogeneous and distributed information landscapes. They also share the same problem of relevance of results when one searches them. However, compared to the Web, a corporate memory has a delimited and better defined context, infrastructure and scope: the corporation. Taking into account the characteristics of a corporate memory, we show in this paper the assets of an approach combining XML technology designed for the Web with the distributed nature of multi-agent systems. In particular, we consider the heterogeneity and distribution of the multi-agent system as a solution to the heterogeneity and distribution of the corporate memory.
Information overload and the inefficiency of keyword-based search engines on the Web are widely acknowledged problems. The "Semantic Web" is a promising approach in which the semantics of documents is made explicit through metadata and annotations to guide later exploitation. Ontobroker [Dec99], Shoe [Hef99], WebKB [Mar99] and OSIRIX [Rab00] are examples of this metadata technique, relying on annotation based on ontologies. In parallel, there is an increasing industrial interest in the capitalization of corporate knowledge, leading to the development and deployment of knowledge management techniques in more and more companies. The coherent integration of this dispersed knowledge in a corporation is called a corporate memory. Its objective is to "promote knowledge growth, promote knowledge communication and in general preserve knowledge within an organization" [Ste93]. Corporate memory projects face the same problem of relevance as Web search engines when retrieving documents, because the information landscape of a company is also a distributed and heterogeneous set of resources. It therefore seems interesting to explore and exploit this information landscape with a system that is itself distributed and heterogeneous, such as a Multi-Agent System (MAS). The purpose is to allow the information sources to remain localized and heterogeneous in terms of storage and maintenance, while enabling the company to capitalize an integrated and global view of its corporate memory. The MAS approach allows users to be assisted by software agents, usually distributed over the network. These agents have different skills and roles and try to support or automate some tasks: they may be dedicated to interfacing the user with the system, managing communities, processing or archiving data, etc. Our objective is to build and organize a corporate memory so as to ease searching it and using its content by members of the organization.
This memory contains unstructured, semi-structured or fully-structured data. The importance of relying on widely accepted standards led us to use XML technology for exchanges and storage [Rab00]. XML technology enables us to build a structure around the data, and RDF (Resource Description Framework) allows us to improve search mechanisms using the semantics of annotations. In this paper we show that an approach combining XML and MAS technologies offers many advantages for corporate memory management. In the first section we will introduce the specificity of a corporate memory project and present the CoMMA project we are involved in, which led us to study agent systems. The second part will describe the aspects of XML we are interested in and the prototype CORESE [Cor00] we developed to search annotation bases. The third section will present in detail the current results of our investigations on multi-agent systems applied to corporate memory, with the architecture and the roles we have identified so far.
2 Context of Intervention
2.1 Stakes and Specificity of a Corporate Memory Management System
We define a corporate memory (CM) as an explicit, disembodied and persistent representation of knowledge and information in an organization, in order to facilitate their access and reuse by members of the organization, for their tasks [Rab00]. Compared to the World Wide Web, a corporate memory has a delimited scope: the corporation. We can therefore precisely identify the stakeholders (e.g. information providers); moreover, this community shares some common global views of the world (e.g. company policy, best practices), and thus an ontological commitment is conceivable. The corporation also has its own organization and infrastructure. From a knowledge engineering point of view, this means that besides the user's model, an enterprise model can be obtained through a data-collection phase, both models being based on an ontology specific to the corporate memory management task. The user models characterize the different roles and profiles of the stakeholders and are used to customize the interactions and the behavior of the system. The enterprise model presents organizational aspects such as organization charts, processes, documents, and so on. The two models are obviously linked and tangled. They will be used to annotate and search the corporate memory in a user-friendly and efficient fashion. Some organizational aspects are hidden but important for the system. For example, the fact that the organization chart and the acquaintance network do not take into account transversal groups such as "communities of interest" may lead to a functionality that supports the emergence of such communities when they are known to exist but are not precisely identified. Another example is the fact that the intranet infrastructure and network resources policy result in a heterogeneous and distributed set of information sources that changes from one company to another; the system therefore has to be modular enough to cope with this constraint.
Figure 1. XML example (surviving content: address "2004 Route des Lucioles", phone number "04 92 38 77 00"; element markup not preserved)
2.2 The CoMMA Project
The ACACIA research team, which we belong to, is part of the CoMMA consortium. CoMMA (Corporate Memory Management through Agents) is an IST project [CoM00] funded by the European Commission, which started in February 2000. The main objective of the project is to implement and test a Corporate Memory management framework integrating several emerging technologies: agent technology, knowledge modeling, XML technology, information retrieval and machine learning techniques. The project intends to implement the system in the context of two scenarios:
The insertion of new employees in the company.
The support of technology monitoring processes.
The solution proposed in CoMMA is based on a MAS architecture of cooperating agents that are able to adapt to the user and to the context, and that support retrieval of relevant information in the CM. These agents will be able to communicate with one another to delegate tasks, and to perform elementary reasoning and make decisions, supporting choices between several documents. They will have inference mechanisms exploiting ontologies. They may help authors to annotate documents, to perform technological monitoring on the Internet and to circulate the acquired innovative ideas to the interested employees of the company. The project focuses on the case where the corporate memory is materialized by XML documents and annotated by meta-information in RDF, in order to offer intelligent search functionalities and improve document retrieval. We also intend to exploit machine learning techniques in order to make agents adaptive to their users and context. In CoMMA, the realization of the MAS will be simplified by using JADE [Berg00], a pre-existing software framework for the development of agent applications that is compliant with the FIPA specifications [FIP97]. Integrating these technologies in one system is already a challenge; another is the definition of the methodology supporting the whole design process. In the process of proposing an architecture for the MAS, we have been led to think about the characteristics of a multi-agent system applied to the exploitation of corporate memory from a general point of view; Section 4 presents our first results.
3 Principles and Motivations of this New Approach to Corporate Memory
3.1 XML and MAS: Metadata Approach
The eXtensible Markup Language (XML) is a description language recommended by the World Wide Web Consortium for creating and accessing structured data and documents in text format over internet-based networks. The XML syntax uses start and end tags to mark up information elements (see Figure 1). Elements may be further enriched by attaching name-value pairs called attributes (for example, country="FR" in Figure 1). Its simple syntax is easy to process by machine, and has the attraction of remaining understandable to humans. XML makes it possible to deliver information to agents in a form that allows automatic processing after receipt, and therefore to distribute the processing load over the MAS. It is also a standard, and therefore a good candidate for exchanging data and building cooperation between heterogeneous and distributed sources, which is exactly the type of problem tackled by multi-agent information systems adopting, for instance, the wrapper agents approach. XML is extensible: one can define new tags and attribute names to parameterize or semantically qualify data and documents. Structures can be nested to any level of complexity, so database schemas or object-oriented hierarchies can be represented. Moreover, the set of elements, attributes, entities and notations that can be used within an XML document instance can optionally be formally defined in a document type definition (DTD) embedded, or referenced, within the document. The DTD gives the names of the elements and attributes, the allowed sequence and nesting of tags, the attribute values and their types and defaults, etc. The main reason to explicitly define the language is that documents can be checked for conformance to it. Therefore, once a template has been issued, one can establish a common format and check whether or not the documents placed in the corporate memory are valid. Figure 2 presents a DTD corresponding to the XML example of Figure 1.
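As a minimal sketch of the element, attribute and conformance ideas described above, the following Python fragment parses a small XML document and checks its structure. The element names contact, name, address and phone, and the value of the name element, are assumptions made for illustration; only the country="FR" attribute, the address and the phone number come from the text. The REQUIRED_CHILDREN check is a hand-written stand-in for DTD validation, which the standard library parser used here does not perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names and the <name> value are assumptions.
DOC = """<contact>
  <name>Example Name</name>
  <address country="FR">2004 Route des Lucioles</address>
  <phone>04 92 38 77 00</phone>
</contact>"""

# Stand-in for a DTD content model: the children <contact> must have, in order.
REQUIRED_CHILDREN = ["name", "address", "phone"]


def conforms(text):
    """Return True if text is well-formed and matches the expected structure."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed XML
    children = [child.tag for child in root]
    return root.tag == "contact" and children == REQUIRED_CHILDREN


def parse_contact(text):
    """Extract element contents and the country attribute into a dict."""
    if not conforms(text):
        raise ValueError("document does not match the expected structure")
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "address": root.findtext("address"),
        "country": root.find("address").get("country"),
        "phone": root.findtext("phone"),
    }


if __name__ == "__main__":
    print(parse_contact(DOC))
```

An agent receiving such a document could run a check of this kind before accepting it into the memory, and then work on the extracted fields rather than on raw text.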
Unfortunately, the semantics of the tags cannot be described in a DTD. However, if an agent knows the semantics, it can use the metadata and infer from it to help the users of the corporate memory. The semantics must be shared to allow cooperation among the agents and unambiguous exchanges; ontologies are a keystone of multi-agent systems. By describing the meaning of the actual content, structure description will help an agent find relevant information and enable matchmaking between producer and consumer agents. Unlike HTML, XML tags describe the structure of the data rather than the presentation. Content structure and display format are completely independent. The eXtensible Stylesheet Language (XSL) can be used for expressing style sheets, which have document manipulation capabilities beyond styling. Thus a document of the corporate memory can be viewed differently and transformed into other documents to adapt to the needs and profiles of the agents and the users, while being stored and transferred in a unique format. Figure 3 presents a style sheet extracting the name and the phone number from the document given in Figure 1. The output of this style sheet is an HTML file given in Figure 4. The ability to dissociate structure, content and presentation enables the corporate memory documents to be used and viewed in different ways. XML therefore has many assets for materializing company documents, and forthcoming features of XML will complement this aspect:
Figure 2. DTD example