jCOLIBRI2 Tutorial - GAIA – Group of Artificial Intelligence Applications
Transcription
jCOLIBRI2 Tutorial - GAIA – Group of Artificial Intelligence Applications
jCOLIBRI2 Tutorial Juan A. Recio-García Belén Díaz-Agudo Pedro González-Calero September 16, 2008 Document version 1.2 GROUP FOR ARTIFICIAL INTELLIGENCE APPLICATIONS UNIVERSIDAD COMPLUTENSE DE MADRID This work is supported by the MID-CBR project of the Spanish Ministry of Education & Science TIN2006-15140-C03-02 and the G.D. of Universities and Research of the Community of Madrid (UCM-CAM-910494 research group grant). jCOLIBRI2 Tutorial Juan A. Recio-García Belén Díaz-Agudo Pedro González Calero Technical Report IT/2007/02 Department of Software Engineering and Artificial Intelligence. University Complutense of Madrid ISBN: 978-84-691-6204-0 Contents 3 Contents 1 Introduction 8 2 What is Case-Based Reasoning? 2.1 The CBR cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 9 3 The jCOLIBRI CBR framework 11 3.1 jCOLIBRI2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 11 4 Getting started with jCOLIBRI2 14 5 The Travel Recommender CBR application 17 6 Importing jCOLIBRI2 into Eclipse 24 6.1 Preparing the Travel Recommender files . . . . . . . . . . . . . . . . . 25 7 Creating the CBR application 28 8 The Travel Recommender Case Base 33 9 Representing the Cases 9.1 Case Components . . . . . . . . . . . . . . . . 9.2 Cases and Queries in jCOLIBRI2 . . . . . . . . 9.3 Case Components of the Travel Recommender . 9.4 Defining new attribute types . . . . . . . . . . 35 35 36 39 41 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 The two layers persistence architecture of jCOLIBRI2 42 10.1 The Connectors of jCOLIBRI2 . . . . . . . . . . . . . . . . . . . . . . 42 10.2 In-memory organization of the cases . . . . . . . . . . . . . . . . . . . 44 11 Loading the Case Base 46 11.1 Configuring the connector . . . . . . . . . . . . . . . . . . . . . . . . 47 11.1.1 The Hibernate configuration file . . . . . . . . . . . . . . . . . 48 11.1.2 Creating the mapping files . . . . . . . . . . . . . . . . . . . . 50 12 Using Ontologies in CBR applications 12.1 Case Base persistence in ontologies . . . . . . 12.2 Computing similarities using ontologies . . . . 12.3 Case and Query vocabulary . . . . . . . . . . . 12.4 Using an ontology in the Travel Recommender . . . . 13 Retrieval 13.1 Similarity functions . . . . . . . . . . . . . . . . 13.2 Cases selection . . . . . . . . . . . . . . . . . . 13.3 Using the k-NN retrieval . . . . . . . . . . . . . 13.4 Retrieval in the Travel Recommender application Group for Artificial Intelligence Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 55 55 56 58 . . . . 61 62 63 64 66 jCOLIBRI2 Tutorial Contents 4 14 Reuse 67 14.1 Adapting the Travel Recommender retrieved trips . . . . . . . . . . . . 67 15 Revise 69 15.1 Travel Recommender revision . . . . . . . . . . . . . . . . . . . . . . 69 16 Retain 70 16.1 Saving the new trips of the Travel Recommender . . . . . . . . . . . . 70 17 Shutting down a CBR application 71 18 Textual CBR 18.1 Semantic retrieval . . . . . . . . . . . . . . . . . . . . . . . . . 18.1.1 Representation of the texts . . . . . . . . . . . . . . . . 18.1.2 IE methods implementation . . . . . . . . . . . . . . . 18.1.3 Computing similarity . . . . . . . . . . . . . . . . . . . 18.1.4 The Restaurant Recommender example . . . . . . . . . 18.2 Statistical retrieval . . . . . . . . . . . . . . . . . . . . . . . . 18.2.1 The Restaurant Recommender using statistical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 72 73 75 76 77 81 82 19 Evaluation of CBR applications 83 20 Recommenders 20.1 Templates guided design of recommendation systems 20.1.1 One-Off Preference Elicitation . . . . . . . . 20.1.2 Retrieval . . . . . . . . . . . . . . . . . . . 20.1.3 Iterated Preference Elicitation . . . . . . . . 20.2 Methods for recommender systems . . . . . . . . . . 20.2.1 User interaction methods . . . . . . . . . . . 20.2.2 New Nearest Neighbor similarity measures . 20.2.3 Conditional methods . . . . . . . . . . . . . 20.2.4 Navigation by Asking methods . . . . . . . . 20.2.5 Navigation by Proposing . . . . . . . . . . . 20.2.6 Profile management methods . . . . . . . . . 20.2.7 Collaborative Recommendations . . . . . . . 20.2.8 Retrieval methods . . . . . . . . . . . . . . 20.2.9 Cases selection . . . . . . . . . . . . . . . . 86 86 87 89 89 90 90 91 92 92 93 94 94 95 97 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Other Features 99 21.1 Visualization of a Case Base . . . . . . . . . . . . . . . . . . . . . . . 99 21.2 Classification and Maintenance . . . . . . . . . . . . . . . . . . . . . . 99 22 Getting support 101 23 Contributing to jCOLIBRI 102 23.1 Required elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 23.2 Example applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial Contents 5 23.3 How to submit a contribution . . . . . . . . . . . . . . . . . . . . . . . 103 24 Versions ChangeLog 104 24.1 Version 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 24.1.1 What’s new in Version 2.1 . . . . . . . . . . . . . . . . . . . . 104 24.2 Version 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial List of Figures 6 List of Figures 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 CBR cycle by Aamodt and Plaza . . . . . . . . . . . . . . . . Two layers architecture of jCOLIBRI2 . . . . . . . . . . . . . jCOLIBRI2 short-cuts under windows . . . . . . . . . . . . . jCOLIBRI2 Tester . . . . . . . . . . . . . . . . . . . . . . . jCOLIBRI2 Tests . . . . . . . . . . . . . . . . . . . . . . . . Define Query step . . . . . . . . . . . . . . . . . . . . . . . . Configure Similarity step . . . . . . . . . . . . . . . . . . . . Retrieved Cases step . . . . . . . . . . . . . . . . . . . . . . Adaptation step . . . . . . . . . . . . . . . . . . . . . . . . . Revise Cases step . . . . . . . . . . . . . . . . . . . . . . . . Retain Cases step . . . . . . . . . . . . . . . . . . . . . . . . Importing the jCOLIBRI 2 project into Eclipse . . . . . . . . . The jCOLIBRI 2 project in Eclipse . . . . . . . . . . . . . . . . Files to delete in the project . . . . . . . . . . . . . . . . . . . Project state to begin the tutorial . . . . . . . . . . . . . . . . Cases representation UML diagram . . . . . . . . . . . . . . Case Base management in jCOLIBRI2 . . . . . . . . . . . . . Travel Recommender components mapping . . . . . . . . . . Case mapping in a ontology . . . . . . . . . . . . . . . . . . OntologyConnector behavior example . . . . . . . . . . . . . Concept based similarity functions in jCOLIBRI2 . . . . . . Example of application of the similarity functions . . . . . . . Defining the Travel Recommender query through an ontology Mapping between data base and ontology . . . . . . . . . . . Representation of texts for IE. . . . . . . . . . . . . . . . . . Global view of the representation of texts for IE. . . . . . . . . Common organization of textual cases . . . . . . . . . . . . . Evaluation of a CBR application . . . . . . . . . . . . . . . . Single Shot template . . . . . . . . . . . . . . . . . . . . . . Conversational templates . . . . . . . . . . . . . . . . . . . . Preference Elicitation decompositions . . . . . . . . . . . . . Visualization of a Case Base . . . . . . . . . . . . . . . . . . Group for Artificial Intelligence Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 . 12 . 15 . 15 . 16 . 18 . 19 . 20 . 21 . 22 . 23 . 24 . 25 . 26 . 27 . 39 . 42 . 50 . 55 . 56 . 57 . 58 . 59 . 60 . 74 . 75 . 76 . 84 . 87 . 88 . 88 . 100 jCOLIBRI2 Tutorial Listings 7 Listings 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 StandardCBRApplication interface . . . . . . . . . . . . . . . . . . . . 28 TravelRecommender initial code . . . . . . . . . . . . . . . . . . . . . 29 TravelRecommender singleton . . . . . . . . . . . . . . . . . . . . . . 29 TravelRecommender GUI code . . . . . . . . . . . . . . . . . . . . . . 30 TravelRecommender main() method . . . . . . . . . . . . . . . . . . . 31 Travel Recommender data base schema . . . . . . . . . . . . . . . . . 33 TravelRecommender configure() method (version 1) . . . . . . . . . . . 33 CaseComponent interface . . . . . . . . . . . . . . . . . . . . . . . . . 35 Bean example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Using Attribute example code . . . . . . . . . . . . . . . . . . . . . . 36 CBRQuery code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 CBRCase code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 TravelDescription initial code . . . . . . . . . . . . . . . . . . . . . . . 39 TravelSolution initial code . . . . . . . . . . . . . . . . . . . . . . . . 40 TypeAdaptor interface . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Connector interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 CBRCaseBase interface . . . . . . . . . . . . . . . . . . . . . . . . . . 44 TravelRecommender configure() (version 2) and precycle() . . . . . . . 46 databaseconfig.xml . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 hibernate.cfg.xml . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 TravelSolution.hbm.xml . . . . . . . . . . . . . . . . . . . . . . . . . 50 TravelDescription.hbm.xml . . . . . . . . . . . . . . . . . . . . . . . . 51 Mapping template for user-defined types . . . . . . . . . . . . . . . . . 52 Mapping template for enumerates . . . . . . . . . . . . . . . . . . . . 52 TravelRecommender configure() method (final version) . . . . . . . . . 58 NNScoringMethod signature . . . . . . . . . . . . . . . . . . . . . . . 61 Local and Global similarity interfaces . . . . . . . . . . . . . . . . . . 62 Cases selection methods . . . . . . . . . . . . . . . . . . . . . . . . . 63 Example code for the k-NN Retrieval . . . . . . . . . . . . . . . . . . 64 TravelRecommender.cycle() code (step1) . . . . . . . . . . . . . . . . 66 TravelRecommender.cycle() code (step2) . . . . . . . . . . . . . . . . 68 TravelRecommender.cycle() code (step3) . . . . . . . . . . . . . . . . 69 TravelRecommender.cycle() code (step4) . . . . . . . . . . . . . . . . 70 TravelRecommender.postCycle() code . . . . . . . . . . . . . . . . . . 71 The Restaurant Recommender precycle using semantic TCBR methods 78 The Restaurant Recommender cycle using semantic TCBR methods . . 79 The Restaurant Recommender using statistical TCBR methods . . . . . 82 Evaluation code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 Example of an evaluable application . . . . . . . . . . . . . . . . . . . 85 Visualization code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Config file for the Examples application . . . . . . . . . . . . . . . . . 103 Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 1. Introduction 8 1 Introduction Case-based reasoning (CBR) has became a mature and established subfield of Artificial Intelligence (AI), both as a mean for addressing AI problems and as a basis for fielded AI technology. Now that CBR fundamental principles have been established and numerous applications have demonstrated CBR is an useful technology, many researchers agree about the increasing necessity to formalise this kind of reasoning, define application analysis methodologies, and provide a design and implementation assistance with software engineering tools [5, 4, 27, 16, 20]. While the underlying ideas of CBR can be applied consistently across application domains, the specific implementation of the CBR methods –in particular retrieval and similarity functions– is highly customised to the application at hand. Two factors have became critical: the availability of tools to build CBR systems, and the accumulated practical experience of applying CBR techniques to real-world problems. Our work goes along all these increasing necessities. We have designed a tool to help application designers to develop and quickly prototyping CBR systems. Besides we want to provide a software tool useful for students who have little experience with the development of different types of CBR systems. jCOLIBRI has been designed as a wide spectrum framework able to support several types of CBR systems from the simple nearest-neighbor approaches based on flat or simple structures to more complex Knowledge Intensive ones. It also supports the development of textual and conversational CBR applications [41, 21]. Other features of the framework like: ontology integration, visualization of case bases, evaluation of CBR applications, classification and maintenance methods, ... will be explained in this tutorial. The framework implementation is evolving as new methods are included. Our (ambitious) goal is to provide a reference framework for CBR development that would grow with contributions from the community. We invite the jCOLIBRI developers to send us their methods to enrich the functionality of the framework and make them available to the whole CBR community. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 2. What is Case-Based Reasoning? 9 2 What is Case-Based Reasoning? Case-based reasoning is a paradigm for combining problem-solving and learning that has became one of the most successful applied subfield of AI of recent years. CBR is based on the intuition that problems tend to recur. It means that new problems are often similar to previously encountered problems and, therefore, that past solutions may be of use in the current situation [12]. CBR is rooted in the works of Roger Schank on dynamic memory and the central role that a reminding of earlier episodes (cases) and scripts (situation patterns) has in problem solving and learning [48]. CBR is particularly applicable to problems where earlier cases are available, even when the domain is not understood well enough for a deep domain model. Helpdesks, diagnosis or classification systems have been the most successful areas of application, e.g., to determine a fault or diagnostic an illness from observed attributes, or to determine whether or not a certain treatment or repair is necessary given a set of past solved cases [54]. Central tasks that all CBR methods have to deal with are [2]: "to identify the current problem situation, find a past case similar to the new one, use that case to suggest a solution to the current problem, evaluate the proposed solution, and update the system by learning from this experience. How this is done, what part of the process that is focused, what type of problems that drives the methods, etc. varies considerably, however". For a detailed description on these and other CBR related aspects, we address the interested reader to detailed surveys about the CBR field ([2],[13],[32], [12]) and to the last european and international conferences in the field (ECCBR and ICCBR). 2.1 The CBR cycle At the highest level of generality, a general CBR application can be described by a cycle composed of the following four processes[2]: • RETRIEVE the most similar case or cases. • REUSE the information and knowledge in that case to solve the problem. • REVISE the proposed solution. • RETAIN the parts of this experience likely to be useful for future problem solving. A new problem is solved by retrieving one or more previously experienced cases, reusing the case in one way or another, revising the solution based on reusing a previous case, and retaining the new experience by incorporating it into the existing knowledgebase (case-base). In Figure 1, this cycle is illustrated. An initial description of a problem (top of figure) defines the query. This new case is used to RETRIEVE a case from the collection of previous cases. The retrieved case is combined with the new case - through REUSE - into a solved case, i.e. a proposed solution to the initial problem. Through the REVISE process this solution is tested for Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 2.1 The CBR cycle 10 Figure 1: CBR cycle by Aamodt and Plaza success, e.g. by being applied to the real world environment or evaluated by a teacher, and repaired if failed. During RETAIN (or REMEMBER), useful experience is retained for future reuse, and the case base is updated by a new learned case, or by modification of some existing cases. As indicated in the figure, general knowledge usually plays a part in this cycle, by supporting the CBR processes. This support may range from very weak (or none) to very strong, depending on the type of CBR method. By general knowledge we mean general domain-dependent knowledge, as opposed to specific knowledge embodied by cases. For example, in diagnosing a patient by retrieving and reusing the case of a previous patient, a model of anatomy together with causal relationships between pathological states may constitute the general knowledge used by a CBR system. A set of rules may have the same role. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 3. The jCOLIBRI CBR framework 11 3 The jCOLIBRI CBR framework jCOLIBRI is an object-oriented framework in Java for building CBR systems that is an evolution of previous work on knowledge intensive CBR [16, 18]. The first version of our software (COLIBRI1 )was prototyped in LISP and was far from being usable outside of our own research group. However we learnt good lessons from it, and we have designed jCOLIBRI as a technological evolution of COLIBRI that incorporates the advantages of the software technologies developed during the last few years. The jCOLIBRI framework has two major releases: jCOLIBRI version 1 : jCOLIBRI version 1 is the first release of the framework. It includes a complete Graphical User Interface that guides the user in the design of a CBR system. This version is recommended for non-developer users that want to create CBR systems without programming any code. jCOLIBRI version 2 : jCOLIBRI version 2 is a new implementation that follows a new and clear architecture divided into two layers: one oriented to developers (finished) and other oriented to designers (future work). This new design is a complete white box framework (see below) open to java developers that want to include the features of jCOLIBRI in their CBR applications. This tutorial shows how to develop a CBR application using the jCOLIBRI version 2 (jCOLIBRI 2 ahead). For a reference guide of the first version read [45]. 3.1 jCOLIBRI2 Architecture jCOLIBRI 2 is the result of the experience acquired during the development of the first version. It solves many drawbacks like case representation, management of metadata, development problems, etc. But the architecture of this new version is very different (although compatible) as is based on a complete revision of CBR systems and frameworks. There are many definitions of framework, but the one that is probably most referenced is found in [28]: A framework is a set of classes that embodies an abstract design for solutions to a family of related problems. In other words, a framework is a partial design and implementation for an application in a given problem domain. 1 Cases and Ontology Libraries Integration for Building Reasoning Infrastructures Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 3.1 jCOLIBRI2 Architecture 12 For classifying a framework we can distinguish two types. A white-box framework is reused mostly by subclassing and a black-box framework is reused through parametrization. The usual development of a framework begins with a design as a white-box architecture that evolves into a black-box one. The resulting black-box framework has an associated builder that will generate the application’s code. The visual builder allows the software designer to connect the framework objects and activate them. The first version of jCOLIBRIis closer to a black-box framework with visual builder and lacks of a clear white- box structure. The new design of jCOLIBRI 2 attempts to remodel the architecture into a clear whitebox system oriented to programmers, and a black-box with builder layer that is oriented to designers. The key idea in the new design consists of separating core classes and user interface. That separation will give us the two layers architecture shown in the following figure: Figure 2: Two layers architecture of jCOLIBRI2 The bottom layer contains the basic components of the framework with well defined and clear interfaces. This layer does not contain any kind of graphical tool for developing CBR applications; it is simply a white-box object-oriented framework that must be used by programmers. The top layer contains semantic descriptions of the components and several tools that aid users in the development of CBR applications (black-box with visual builder framework). The bottom layer has new features that solve most of the problems identified in the first version. It takes advantage of the new possibilities offered by the newest versions of the Java language. The most important change is the representation of cases as Java Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 3.1 jCOLIBRI2 Architecture 13 Beans. The persistence of cases is managed by the Hibernate package. Hibernate is a Java Data Objects (JDO) implementation, so it can automatically store Java Beans in a relational data base, using one or more tables. Java Beans and Hibernate are core technologies in the Java 2 Enterprise Edition platform that is oriented to business applications. Using these technologies in jCOLIBRI 2 we guarantee the future development of commercial CBR applications using this framework. All these features will be explained in this tutorial. Regarding the top layer, it is oriented to designers and includes several tools with graphical interfaces. This layer is the black-box version of the framework, helping users to develop complete CBR applications guiding the configuration process. The top layer is still under development, so this tutorial explains how to use the bottom layer of jCOLIBRI 2. A detailed explanation of the features and motivations of the jCOLIBRI 2 architecture can be found in [44]. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 4. Getting started with jCOLIBRI2 14 4 Getting started with jCOLIBRI2 The framework can be downloaded from the web page: http://gaia.fdi.ucm.es/projects/jcolibri Or directly from the sourceforge.net project page: http://sourceforge.net/projects/jcolibri-cbr/ There are three main versions of the framework: v1.1, v2.0 and v2.1. This tutorial covers versions 2.0 and 2.1. These two versions are very similar. jCOLIBRI 2.1 extends jCOLIBRI 2.0 including the recommenders package explained in Section 20. There are also a few changes detailed in Section 24. In this tutorial we will refer to both versions as jCOLIBRI2. Make sure that you download the jCOLIBRI 2.1 version (not the previous ones). If you download the windows version, a wizard will guide you during the installation process. Or if you use any other operative system just unzip the file in any folder. The jCOLIBRI 2 release contains the following files: • ./jCOLIBRI2-Tester.bat : The tester application for Windows. • ./jCOLIBRI2-Tester.sh : The tester application for Unix. • ./README.txt : This file. • ./.project : Eclipse project file that can be imported into that IDE. • ./.classpath : Eclipse classpath file with the libraries used in jCOLIBRI. • ./build.xml : Apache Ant build file to compile the sources. • ./jcolibri2.LICENSE : License agreement. • ./src/jcolibri : The souce code of the jCOLIBRI package. • ./doc/api/index.html : The index to the javadoc of the jCOLIBRI package. • ./doc/uml/ : UML class and sequence diagrams. • ./doc/configfilesSchemas/ : XML schemas for the configuration files of the connectors used to stores persistent data • ./lib : jCOLIBRI as well as third-party libraries included in this release. • ./lib/jcolibri2.jar : Compiled jCOLIBRI jar file. • ./TravelRecommender : Folder for the Travel Recommender example. • ./TravelRecommender.bat : The Travel Recommender for Windows. • ./TravelRecommender.sh : The Travel Recommender for Unix. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 4. Getting started with jCOLIBRI2 15 jCOLIBRI 2 requires that Java (1.6 or later) is installed on your computer. Once the framework is installed, you can launch the framework examples through the start menu (only windows O.S.). Figure 3: jCOLIBRI2 short-cuts under windows These short cuts link to the files located in the installation directory. There is an application tester that can be launched using the following scripts: • In Windows: jCOLIBRI2-Tester.bat • In UNIX: jCOLIBRI2-Tester.sh The tester let you run each one of the 30 test applications included in the release to exemplify the main features of jCOLIBRI. In the Tester you can access to the javadocs of the tests which are the main source of documentation as shown in Figure 4. Figure 4: jCOLIBRI2 Tester In the Code Examples section of the jCOLIBRI web page (http://gaia.fdi.ucm. es/projects/jcolibri/jcolibri2/examples.html), a table summarizes Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 4. Getting started with jCOLIBRI2 16 which features are explained by each test (see Figure 5). This table is also shown through the Map button of the Tester application. The first 16 examples illustrate the behavior of general CBR application meanwhile the following 14 show how to implement recommender systems. The reading of these tests is recommended while studying this tutorial as many sections cite them for details. Figure 5: jCOLIBRI2 Tests There is also an example of using the jCOLIBRI 2 jar library in a stand-alone application named "Travel Recommender". It can be launched using the following scripts: • In Windows: TravelRecommender.bat • In UNIX: TravelRecommender.sh This tutorial will show you how to develop the Travel Recommender application following some simple steps. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 17 5 The Travel Recommender CBR application The Travel Recommender is a well known application example in the CBR community. Therefore, in this tutorial we have chosen this application to show how to implement CBR applications using jCOLIBRI 2. Following sections guide the readers in the development process of this application in an incremental way. Before starting with the implementation, this section describes the features and behavior of the Travel Recommender application. In this application, the case base is composed of several trips. The system receives a desired trip as a query, compares the query with the trips in the case base using a similarity function, and returns the most similar ones. After the retrieval, these most similar cases can be adapted according with the restrictions of the query. Each case is represented by several attributes: Id, holiday type, number of persons, region, transportation, duration, season, accommodation type, price and hotel. In our application, the price and the hotel will be considered as the solution of the case meanwhile the remaining attributes will be the case description. This way, our cases are divided into: a description used to retrieve similar cases given a query, and a solution that is adapted depending on the values of the query. Lets look at the different steps of our CBR application. You can execute it through the short-cuts in the windows installation (see Figure 3 or the TravelRecommender.bat and TravelRecommender.sh scripts located in the installation folder. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 18 Define Query. In this step the user defines her query to the system. She has to define which are the values of the different attributes of a trip: duration, season, transportation, etc. Figure 6: Define Query step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 19 Configure Similarity. Here the user configures the similarity measure used to retrieve the cases most similar to the query. jCOLIBRI 2 implements several similarity functions that can be used depending on the type of the attribute (integers, strings, etc.). Moreover, developers can define their own similarity measures. You can also assign a weight to each attribute of the query that will be taken into account when computing the average of all attributes. Also, some similarity functions can have parameters used to configure the similarity measure. Finally, the k value indicates how many cases must be retrieved. We are using an algorithm named K Nearest Neighbor (k-NN) that computes the similarity of the query with all the cases and then orders the result depending on that similarity value. Then the first k most similar cases are returned. Figure 7: Configure Similarity step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 20 Retrieved Cases. This step shows the documents retrieved by the k-NN method. It shows each case and its similarity to the query. Figure 8: Retrieved Cases step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 21 Adaptation. In this step the system adapts the retrieved cases to the requirements of the user depending on the values defined in the query. This stage use to be very domain dependent and will be different in other CBR systems. In our Travel Recommender application we will adapt the price of the trips depending on the number of persons and duration defined in the query. Our system will use simple direct proportions to perform the adaptation. For example, imagine that the retrieved case has a duration of 7 days and costs 1000. If the user is looking for a 14 days trip we have to adapt the price using a direct proportion: if a 7 days trip costs 1000, then a 14 days trip costs 2000. This process is also repeated for the number of persons in an analogous way. After adapting the solution of the cases, their description can be substituted by the description of the query. At this point, the system will manage a list of working cases that are different from the cases in the case base. These working cases represent possible solutions to the problem described in the query. Figure 9: Adaptation step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 22 Revise Cases. Once cases have been adapted, the user (or a domain expert, in this case the trip agent) would adjust the values of the working cases in a manual way. For example, imagine that the hotel has not available rooms and clients have to go to a similar one. Another situation is that retrieved cases are not similar enough to the requirements of the query and the trip agent has to define manually the solution of the case. Figure 10: Revise Cases step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 5. The Travel Recommender CBR application 23 Retain Cases. Finally, the trip agent would save the new trip case into the case base for being used in future queries. If it is done, a new Id must be assigned to the trip. Figure 11: Retain Cases step Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 6. Importing jCOLIBRI2 into Eclipse 24 6 Importing jCOLIBRI2 into Eclipse To develop a new CBR application with jCOLIBRI 2 we recommend the Eclipse IDE (www.eclipse.org). jCOLIBRI 2 includes the required files to import the framework project into Eclipse: .classpath and .project. But if you import the whole project, you will have the source files of the framework into your Eclipse project. The smart solution consists on loading only the jcolibri2.jar file (and related libraries) into your project. That way, you will have only your source files in the Eclipse project. For teaching purposes, in this tutorial we will load the whole jCOLIBRI project into Eclipse to allow an easier navigation through the source code of the framework. To avoid modifications of the source files of your installation, we will make a copy of the project into the Eclipse workspace when importing the project To import jCOLIBRI 2 into Eclipse use the File - Import menu. Then choose the “General - Existing projects into workspace” option and finally select the installation folder of the framework into the “Select root directory” field. Make sure that the “Copy projects into workspace” option is checked and click on “Finish”. Figure 12: Importing the jCOLIBRI 2 project into Eclipse Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 6.1 Preparing the Travel Recommender files 25 Once the project is imported, you can navigate through its contents and source files using the "‘Package Explorer"’ Figure 13: The jCOLIBRI 2 project in Eclipse 6.1 Preparing the Travel Recommender files The complete source code of the travel recommender application is located into the “example” subfolder of the framework and is compressed as travelrecommender-source.zip. In the following sections we will create the most important source files from scratch. However, we will not create the code related with the GUI of the application because it is out of the scope of this tutorial. So, you must decompress the contents of the zip file into the “src” subfolder of your jCOLIBRI 2 Eclipse project (not the original installation folder). This project will be located into your Eclipse workspace directory. Make sure that you preserve the folder names when unziping the file. Once done, come back to Eclipse, select the jCOLIBRI 2 project into the “Package Explorer”, and refresh it using the contextual menu option or the F5 key. Then a new subpackage named jcolibri.examples.TravelRecommender will appear the source folder. As explained before we only keep the GUI files. During the tutorial, we are doing the rest of the files step by step. So, select the following files and delete them: TravelDescription.java, TravelRecommender.java, TravelSolution.java, databaseconfig.xml, hibernate.cfg.xml, TravelDescription.hbm.xml, and TravelSolution.hbm.xml. (They are shown in Figure 14). Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 6.1 Preparing the Travel Recommender files 26 Figure 14: Files to delete in the project The contents of your project before beginning with the tutorial is also shown in Figure 15. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 6.1 Preparing the Travel Recommender files 27 Figure 15: Project state to begin the tutorial Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 7. Creating the CBR application 28 7 Creating the CBR application A CBR application in jCOLIBRI must implement or extend jcolibri.cbrapplications.StandardCBRApplication interface: the Listing 1: StandardCBRApplication interface public interface StandardCBRApplication { /∗ ∗ ∗ C o n f i g u r e s t h e a p p l i c a t i o n : c a s e base , c o n n e c t o r s , e t c . ∗ @throws E x e c u t i o n E x c e p t i o n ∗/ p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n ; /∗ ∗ ∗ Runs t h e p r e c y l e where t y p i c a l l y c a s e s a r e r e a d and ∗ organized i n t o a case base . ∗ @return The c r e a t e d c a s e b a s e w i t h t h e c a s e s i n t h e ∗ storage . ∗ @throws E x e c u t i o n E x c e p t i o n ∗/ p u b l i c CBRCaseBase p r e C y c l e ( ) throws E x e c u t i o n E x c e p t i o n ; /∗ ∗ ∗ E x e c u t e s a CBR c y c l e w i t h t h e g i v e n q u e r y . ∗ @throws E x e c u t i o n E x c e p t i o n ∗/ p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n ; /∗ ∗ ∗ Runs t h e c o d e t o s h u t d o w n t h e a p p l i c a t i o n . T y p i c a l l y ∗ i t closes the connector . ∗ @throws E x e c u t i o n E x c e p t i o n ∗/ p u b l i c v o i d p o s t C y c l e ( ) throws E x e c u t i o n E x c e p t i o n ; } This interface divides the CBR application behavior into 3 steps: • Precycle: Initializes the CBR application, usually loading the case base and precomputing expensive algorithms (really useful when working with texts). It is executed only once. • Cycle: Executes the CBR cycle. It is executed many times. • Postcycle: Post-execution or maintenance code. Now, lets create the main class of our Travel Recommender. The name of the class must be TravelRecommender.java of the package Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 7. Creating the CBR application jcolibri.examples.TravelRecommender. method in the class: 29 Include also the main() Listing 2: TravelRecommender initial code p u b l i c c l a s s TravelRecommender implements StandardCBRApplication { p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n { } p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { } p u b l i c v o i d p o s t C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { } p u b l i c CBRCaseBase p r e C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { } p u b l i c s t a t i c v o i d main ( S t r i n g [ ] a r g s ) { } } To ease the access to this class from the different GUI frames and to ensure that there is only one instance of this class we are going to implement a singleton pattern. Include the following code into the class: Listing 3: TravelRecommender singleton p r i v a t e s t a t i c TravelRecommender _ i n s t a n c e = n u l l ; public s t a t i c TravelRecommender g e t I n s t a n c e ( ) { i f ( _ i n s t a n c e == n u l l ) _ i n s t a n c e = new TravelRecommender ( ) ; return _ i n s t a n c e ; } p r i v a t e TravelRecommender ( ) { } This way, the unique instance of the TravelRecommender class must be accessed through the getInstance() method. As the application uses several frames we need a main frame that acts as the parent of all of them. We are going to create a simple main frame that only contains the jCOLIBRI 2 logo. Then we will have a separated frame for each step of the application. Include the following code in the class: Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 7. Creating the CBR application 30 Listing 4: TravelRecommender GUI code ... SimilarityDialog similarityDialog ; ResultDialog resultDialog ; AutoAdaptationDialog autoAdaptDialog ; RevisionDialog revisionDialog ; RetainDialog retainDialog ; ... p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n { try { / / Create the dialogs s i m i l a r i t y D i a l o g = new S i m i l a r i t y D i a l o g ( main ) ; resultDialog = new R e s u l t D i a l o g ( main ) ; a u t o A d a p t D i a l o g = new A u t o A d a p t a t i o n D i a l o g ( main ) ; revisionDialog = new R e v i s i o n D i a l o g ( main ) ; retainDialog = new R e t a i n D i a l o g ( main ) ; } catch ( Exception e ) { throw new E x e c u t i o n E x c e p t i o n ( e ) ; } } ... s t a t i c JFrame main ; v o i d showMainFrame ( ) { main = new JFrame ( " T r a v e l Recommender " ) ; main . s e t R e s i z a b l e ( f a l s e ) ; main . s e t U n d e c o r a t e d ( t r u e ) ; J L a b e l l a b e l = new J L a b e l ( new I m a g e I c o n ( j c o l i b r i . u t i l . F i l e I O . f i n d F i l e ( " / j c o l i b r i / t e s t / main / j c o l i b r i 2 . j p g " ) )); main . g e t C o n t e n t P a n e ( ) . add ( l a b e l ) ; main . p a c k ( ) ; D i m e n s i o n s c r e e n S i z e = j a v a . awt . T o o l k i t . getDefaultToolkit () . getScreenSize () ; main . s e t B o u n d s ( ( s c r e e n S i z e . w i d t h − main . g e t W i d t h ( ) ) / 2, ( s c r e e n S i z e . h e i g h t − main . g e t H e i g h t ( ) ) / 2 , main . g e t W i d t h ( ) , main . g e t H e i g h t ( ) ) ; main . s e t V i s i b l e ( t r u e ) ; } This code shows an useful class of jCOLIBRI 2 named jcolibri.util.FileIO. This class finds a file in the src, bin or main folders of the project even if it is packaged into a jar file. This is very useful if your project must access to external files. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 7. Creating the CBR application 31 Now lets create the main method of the Travel Recommender application. It configures the application, executes the precycle (only once), executes the cycle several times and finally calls the postcycle code: Listing 5: TravelRecommender main() method p u b l i c s t a t i c v o i d main ( S t r i n g [ ] a r g s ) { / / Obtain TravelRecommender o b j e c t TravelRecommender recommender = g e t I n s t a n c e ( ) ; / / Show t h e main f r a m e recommender . showMainFrame ( ) ; try { / / Configure the application recommender . c o n f i g u r e ( ) ; / / Execute the Precycle recommender . p r e C y c l e ( ) ; / / Create th e frame t h a t o b t a i n s th e query Q u e r y D i a l o g q f = new Q u e r y D i a l o g ( main ) ; / / Main CBR c y c l e boolean cont = true ; while ( cont ) { / / Show t h e q u e r y f r a m e qf . s e t V i s i b l e ( true ) ; / / Obtain the query CBRQuery q u e r y = q f . g e t Q u e r y ( ) ; / / Call the cycle recommender . c y c l e ( q u e r y ) ; / / Ask i f c o n t i n u e i n t ans = j a v a x . swing . JOptionPane . showConfirmDialog ( null , "CBR c y c l e f i n i s h e d , q u e r y a g a i n ? " , " C y c l e f i n i s h e d " , j a v a x . s w i n g . J O p t i o n P a n e . YES_NO_OPTION ) ; c o n t = ( a n s == j a v a x . s w i n g . J O p t i o n P a n e . YES_OPTION ) ; } / / Execute p o s t c y c le recommender . p o s t C y c l e ( ) ; } catch ( Exception e ) { / / Errors o r g . a p a c h e . commons . l o g g i n g . L o g F a c t o r y . g e t L o g ( TravelRecommender . c l a s s ) . e r r o r ( e ) ; j a v a x . swing . JOptionPane . showMessageDialog ( null , e . getMessage ( ) ) ; } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 7. Creating the CBR application 32 System . e x i t ( 0 ) ; } This code shows how to access the unique instance of the TravelRecommender class through the getInstance() method. Then the main frame is shown. After that the CBR application is configured and the precycle is executed. That code is executed only once. Then we create the frame that obtains the query: QueryDialog. As explained before, this class must receive the main frame in the constructor. This frame (shown in Figure 6) returns the query defined by the user. jCOLIBRI 2 defines a class to store the queries named jcolibri.cbrcore.CBRQuery. It will be explained later in Section 9.2. Finally, the code shows that jCOLIBRI 2 uses the Apache Log4j library to manage the logging messages. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 8. The Travel Recommender Case Base 33 8 The Travel Recommender Case Base The case base of the application is stored in a Data Base. jCOLIBRI 2 can manage any Data Base Manager (DBM) because it uses internally the Hibernate (www.hibernate.org) middleware. Hibernate is a powerful, high performance object/relational persistence and query service. This project and a critical component of broadly used JBoss J2EE server supporting persistence in Data Bases or XML files among many other features like its own object-oriented query language. Using Hibernate, jCOLIBRI 2 allows the use of any DBM like Oracle, MySQL, etc. This way, you can reuse any data base to create your case base for a CBR application. The Travel Recommender application uses the data base defined in the travel.sql file located in the jcolibri.example.TravelRecommender package. You can use that file to generate the data base in your favorite DBM. This file generates a data base named travel that contains a table also named travel. The table contains 1024 trips and is defined with the following schema: Listing 6: Travel Recommender data base schema create table t r a v e l ( c a s e I d VARCHAR( 1 5 ) , H o l i d a y T y p e VARCHAR( 2 0 ) , P r i c e INTEGER, NumberOfPersons INTEGER, R e g i o n VARCHAR( 3 0 ) , T r a n s p o r t a t i o n VARCHAR( 3 0 ) , D u r a t i o n INTEGER, S e a s o n VARCHAR( 3 0 ) , Accommodation VARCHAR( 3 0 ) , H o t e l VARCHAR( 5 0 ) ) ; To simplify the use of the examples, jCOLIBRI 2 includes the HSQLDB Data Base Manager (www.hsqldb.org). This DBM is completely implemented in Java and can be easily included in any other Java project. This way, HSQLDB is used by the examples of the framework and by the Travel Recommender application. To launch the DBM you can use the jcolibri.test.database.HSQLDBserver class. Its init() method initializes the DBM loading the data bases used by the examples. This initialization also includes the tables used by our Travel Recommender defined in travel.sql. Now you have to add the following code into the TravelRecommender class to lauch and stop the DBM: Listing 7: TravelRecommender configure() method (version 1) p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n { Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 8. The Travel Recommender Case Base 34 try { / / Emulate data base s e r v e r j c o l i b r i . t e s t . d a t a b a s e . HSQLDBserver . i n i t ( ) ; / / Create the dialogs ... } catch ( Exception e ) { throw new E x e c u t i o n E x c e p t i o n ( e ) ; } } p u b l i c v o i d p o s t C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { _connector . close () ; j c o l i b r i . t e s t . d a t a b a s e . HSQLDBserver . shutDown ( ) ; } Note that this code is not the final code for those methods. They will be extended in following sections. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9. Representing the Cases 35 9 Representing the Cases Now that we have the data base that contains the cases, we have to represent them in our system. This section details how cases are represented inside a jCOLIBRI 2 CBR application. jCOLIBRI 2 represents the cases using Java Beans. A Java Bean is any class that has a get() and set() method for each public attribute. Its modification and management can be performed automatically using a Java technology called Introspection (that is completely transparent for the developer). Using Java Beans in jCOLIBRI 2, developers can design their cases as normal Java classes, choosing the most natural design. This simplifies programming and debugging the CBR applications, and the configuration files became simpler because most of the metadata of the cases can be extracted using Introspection. Java Beans also offer automatically generated user interfaces that allow the modification of their attributes and automatic persistence into data bases and XML files. It is important to note that every Java web application uses Java Beans as a base technology, so the development of web interfaces is very straightforward. Moreover, Hibernate uses Java Beans to store the information into a data base. Java Beans and Hibernate are core technologies in the Java 2 Enterprise Edition platform that is oriented to business applications. Using these technologies in jCOLIBRI 2 we guarantee the future development of commercial CBR applications using this framework. 9.1 Case Components The unique restriction of the Java Beans that compose a case is that they must define an Id that identifies them in the table of the DBM. This restriction is defined in the jcolibri.cbrcore.CaseComponent interface: Listing 8: CaseComponent interface p u b l i c i n t e r f a c e CaseComponent { /∗ ∗ ∗ R e t u r n s t h e a t t r i b u t e t h a t i d e n t i f i e s t h e component . ∗ An i d − a t t r i b u t e m u s t be u n i q u e f o r e a c h c o m p o n e n t . ∗/ j c o l i b r i . cbrcore . Attribute getIdAttribute () ; } The jcolibri.cbrcore.Attribute class represents an attribute of a Java Bean. For example, if you have a Java Bean like this: Listing 9: Bean example code p u b l i c c l a s s MyBean{ int data ; int getData ( ) { return data ; } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.2 Cases and Queries in jCOLIBRI2 36 void s e t D a t a ( i n t _data ) { this . data = _data ; } } You can create an Attribute object that represent the data attribute of MyBean. It is done with the following code: Listing 10: Using Attribute example code A t t r i b u t e a t = new A t t r i b u t e ( " d a t a " , MyBean . c l a s s ) ; The Attribute class allows jCOLIBRI 2 to manage the contents of the cases (Java Beans). 9.2 Cases and Queries in jCOLIBRI2 In jCOLIBRI 2 cases and queries are composed of CaseComponents. A case will be divided into four components: • Description of the problem. • Solution to the problem. • Result of applying the solution. • Justification of the Solution (why this solution was chosen). These components were chosen after reviewing the applications detailed in [54], but they are not required in some applications, so the developer can leave them empty (only the description component is mandatory). Every query has always a description, so the jcolibri.cbrcore.CBRQuery object defines the queries as: Listing 11: CBRQuery code p u b l i c c l a s s CBRQuery{ CaseComponent d e s c r i p t i o n ; /∗ ∗ ∗ R e t u r n s t h e d e s c r i p t i o n component . ∗/ p u b l i c CaseComponent g e t D e s c r i p t i o n ( ) { return d e s c r i p t i o n ; } /∗ ∗ ∗ S e t s t h e d e s c r i p t i o n component . ∗/ p u b l i c v o i d s e t D e s c r i p t i o n ( CaseComponent d e s c r i p t i o n ) { Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.2 Cases and Queries in jCOLIBRI2 37 this . description = description ; } /∗ ∗ ∗ R e t u r n s t h e ID a t t r i b u t e o f t h e Query / Case t h a t i s ∗ t h e ID a t t r i b u t e o f i t s d e s c r i p t i o n c o m p o n e n t . ∗/ public Object getID ( ) { i f ( t h i s . d e s c r i p t i o n == n u l l ) return null ; else try { return d e s c r i p t i o n . g e t I d A t t r i b u t e ( ) . getValue ( description ) ; } catch ( A t t r i b u t e A c c e s s E x c e p t i o n e ) { return null ; } } public String t o S t r i n g ( ) { return " [ D e s c r i p t i o n : "+ d e s c r i p t i o n +" ] " ; } } If a query contains a description, we can say that a case is a query plus solution, result and justification of the solution. This way, the jcolibri.cbrcore.CBRCase extends jcolibri.cbrcore.CBRQuery adding those components: Listing 12: CBRCase code p u b l i c c l a s s CBRCase e x t e n d s CBRQuery { CaseComponent s o l u t i o n ; CaseComponent j u s t i f i c a t i o n O f S o l u t i o n ; CaseComponent r e s u l t ; /∗ ∗ ∗ Returns the j u s t i f i c a t i o n O f S o l u t i o n . ∗/ p u b l i c CaseComponent g e t J u s t i f i c a t i o n O f S o l u t i o n ( ) { return j u s t i f i c a t i o n O f S o l u t i o n ; } /∗ ∗ ∗ S e t s t h e J u s t i f i c a t i o n o f S o l u t i o n component . ∗ @param j u s t i f i c a t i o n O f S o l u t i o n t o s e t . ∗/ Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.2 Cases and Queries in jCOLIBRI2 38 p u b l i c v o i d s e t J u s t i f i c a t i o n O f S o l u t i o n ( CaseComponent justificationOfSolution ) { this . justificationOfSolution = justificationOfSolution ; } /∗ ∗ ∗ Returns the r e s u l t . ∗/ p u b l i c CaseComponent g e t R e s u l t ( ) { return r e s u l t ; } /∗ ∗ ∗ S e t s t h e R e s u l t component ∗/ p u b l i c v o i d s e t R e s u l t ( CaseComponent r e s u l t ) { this . result = result ; } /∗ ∗ ∗ Returns the solution . ∗/ p u b l i c CaseComponent g e t S o l u t i o n ( ) { return s o l u t i o n ; } /∗ ∗ ∗ S e t s t h e s o l u t i o n component ∗/ p u b l i c v o i d s e t S o l u t i o n ( CaseComponent s o l u t i o n ) { this . solution = solution ; } public String t o S t r i n g ( ) { return super . t o S t r i n g ( ) +" [ S o l u t i o n : "+ s o l u t i o n +" ] [ Sol . J u s t . : "+ j u s t i f i c a t i o n O f S o l u t i o n +" ] [ R e s u l t : "+ r e s u l t +" ] " ; } } Each CaseComponent bean can have attributes that also are CaseComponents beans. This way, developers can create case structures with nested attributes. This feature is shown in the Test 3 of the examples. The UML diagram in Figure 16 shows the relationship between cases, queries and casecomponents. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.3 Case Components of the Travel Recommender 39 Figure 16: Cases representation UML diagram 9.3 Case Components of the Travel Recommender The cases of the Travel Recommender application are composed by a description and a solution. The description contains the Id, holiday type, number of persons, region, transportation, duration, season and accommodation type attributes. And the solution contains the Id, hotel and price attributes. To create the description of the case you have to implement a Java Bean that contains those attributes and obeys the CaseComponent interface. This class is named TravelDescription and must be located under jcolibri.example.TravelRecommender. This is a portion of the code: Listing 13: TravelDescription initial code p u b l i c c l a s s T r a v e l D e s c r i p t i o n implements j c o l i b r i . c b r c o r e . CaseComponent { p u b l i c enum AccommodationTypes { O n e S t a r , TwoStars , ThreeStars , HolidayFlat , FourStars , FiveStars }; p u b l i c enum S e a s o n s { J a n u a r y , F e b r u a r y , March , A p r i l , May , June , J u l y , August , S e p t e m b e r , O c t o b e r , November , December } ; String caseId ; String HolidayType ; I n t e g e r NumberOfPersons ; I n s t a n c e Region ; String Transportation ; Integer Duration ; Seasons Season ; AccommodationTypes Accommodation ; Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.3 Case Components of the Travel Recommender 40 public A t t r i b u t e g e t I d A t t r i b u t e ( ) { r e t u r n new A t t r i b u t e ( " c a s e I d " , t h i s . g e t C l a s s ( ) ) ; } public String t o S t r i n g ( ) { r e t u r n " ( " + c a s e I d + " ; " + H o l i d a y T y p e + " ; " + NumberOfPersons + " ; "+Region+" ; "+ T r a n s p o r t a t i o n +" ; "+ D u r a t i o n +" ; "+Season + " ; " +Accommodation+ " ) " ; } ... Now you have to add a get() and set() method for each attribute. But, don’t worry because Eclipse does it automatically selecting the menu item: “Source - Generate Getters and Setters...”. In this description we are using two enumerate types to define the accommodation and season. Also, there is an strange type named Instance that defines the region. This type is defined in jcolibri.datatypes.Instance and represents an instance of an ontology. Now, we are not going into detail with this because it will be explained in Section 12.4. The solution bean must be created in a similar way. Its name is TravelSolution and this is the code without the getters and setters (that you have to generate): Listing 14: TravelSolution initial code public class TravelSolution CaseComponent { implements j c o l i b r i . c b r c o r e . String id ; Integer price ; String hotel ; public String t o S t r i n g ( ) { return " ( "+ id +" ; "+ p r i c e +" ; "+ h o t e l +" ) " ; } public A t t r i b u t e g e t I d A t t r i b u t e ( ) { r e t u r n new A t t r i b u t e ( " i d " , t h i s . g e t C l a s s ( ) ) ; } ... Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 9.4 Defining new attribute types 41 9.4 Defining new attribute types Sometimes, the Java built-in types are not enough and the developer has to define her own ones. This is the case of the Region attribute of the TravelDescription bean that uses a custom data type named Instance. When defining a new type the developer has to specify how it is going to be stored and in the persistence media. This is done by implementing the jcolibri.connectors.TypeAdaptor interface: Listing 15: TypeAdaptor interface public i n t e r f a c e TypeAdaptor { /∗ ∗ ∗ Returns a s t r i n g representation of the type . ∗/ public abstract String t o S t r i n g ( ) ; /∗ ∗ ∗ Reads t h e t y p e f r o m a s t r i n g . ∗/ p u b l i c a b s t r a c t v o i d f r o m S t r i n g ( S t r i n g c o n t e n t ) throws Exception ; /∗ ∗ ∗ You m u s t d e f i n e t h i s method t o a v o i d p r o b l e m s w i t h t h e data base connector ( Hibernate ) ∗/ public a b s t r a c t boolean e q u a l s ( Object o ) ; } jCOLIBRI 2 includes two examples of user-defined data types in the jcolibri.datatypes package. Moreover, the Test 2 of the examples explains this mechanism in detail. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 10. The two layers persistence architecture of jCOLIBRI2 42 10 The two layers persistence architecture of jCOLIBRI2 The CaseComponents defined in the previous section are used to fill the attributes of the CBRCase objects when loading the cases. Before explaining this mechanism we have to depict the two layers persistence architecture of jCOLIBRI 2. CBR systems must access the stored cases in an efficient way, a problem that becomes more relevant as the size of the Case Base grows. jCOLIBRI (both versions 1 and 2) splits the problem of Case Base management in two separate although related concerns: persistence mechanism and in-memory organization. This architecture is illustrated in Figure 17. Figure 17: Case Base management in jCOLIBRI2 10.1 The Connectors of jCOLIBRI2 Persistence is built around connectors. A connector represents the first layer of jCOLIBRI on top of the physical storage. Connectors are objects that know how to access and retrieve cases from the medium and return those cases to the CBR system in a uniform way. The use of connectors give jCOLIBRI flexibility against the physical storage so the system designer can choose the most appropriate one for the system at hand. jCOLIBRI 2 includes the following connectors: • jcolibri.connectors.DataBaseConnector. Manages the persistence of cases into data bases. Internally, it uses the Hibernate library. • jcolibri.connectors.PlainTextConnector. Manages the persistence of the cases into textual files. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 10.1 The Connectors of jCOLIBRI2 • jcolibri.connectors.OntologyConnector. bases stored into ontologies. 43 It uses OntoBridge2 to manage case The obvious interface for a connector must include methods to read the Case Base into memory and update it back into persistent media. More specifically, jCOLIBRI 2 includes an interface named Connector that belongs to package jcolibri.cbrcore. Every connector is supposed to implement the methods defined by this interface: Listing 16: Connector interface public i n t e r f a c e Connector { /∗ ∗ ∗ I n i t i a l i c e s t h e c o n n e c t o r w i t h t h e g i v e n XML f i l e ∗/ p u b l i c v o i d i n i t F r o m X M L f i l e ( j a v a . n e t . URL f i l e ) throws InitializingException ; /∗ ∗ ∗ C l e a n u p any r e s o u r c e t h a t t h e c o n n e c t o r m i g h t be u s i n g , ∗ and s u s p e n d s t h e s e r v i c e ∗/ public void c l o s e ( ) ; /∗ ∗ ∗ S t o r e s g i v e n c l a s s e s on t h e s t o r a g e media ∗/ p u b l i c v o i d s t o r e C a s e s ( C o l l e c t i o n <CBRCase> c a s e s ) ; /∗ ∗ ∗ D e l e t e s g i v e n c a s e s f o r t h e s t o r a g e media ∗/ p u b l i c v o i d d e l e t e C a s e s ( C o l l e c t i o n <CBRCase> c a s e s ) ; /∗ ∗ ∗ R e t u r n s a l l t h e c a s e s i n t h e s t o r a g e media ∗/ p u b l i c C o l l e c t i o n <CBRCase> r e t r i e v e A l l C a s e s ( ) ; /∗ ∗ ∗ R e t r i e v e s some c a s e s d e p e n d i n g on t h e f i l t e r . TODO . ∗/ p u b l i c C o l l e c t i o n <CBRCase> r e t r i e v e S o m e C a s e s ( C a s e B a s e F i l t e r filter ); } 2 OntoBridge (http://gaia.fdi.ucm.es/projects/ontobridge/) is a library developed by the GAIA team that eases the management of ontologies. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 10.2 In-memory organization of the cases 44 Connectors are configured through XML configuration files. Each jCOLIBRI 2 connector defines the XML schema of its configuration file. These schemes can be found in the documentation. An interface such that assumes that the whole Case Base can be read into memory for the CBR processes to work with it. However, in a real sized CBR application this approach may not be feasible. For that reason, we are working to extend connector interface to retrieve those cases that satisfy a query expressed in a subset of SQL (retrieveSomeCases(CaseBaseFilter)). This way the designer can decide what part of the Case Base is loaded into memory. If a developer requires a specific connector, she can create her own one extending the Connector interface. This is shown in the Test 13 of the code examples of the framework. 10.2 In-memory organization of the cases The second layer of Case Base management is the data structure used to organize the cases once loaded into memory. The organization of the case base (linear, k-d trees, case retrieval nets, etc.) may have a big influence on the CBR processes, so the framework leaves open the election of the data structure that is going to be used. In the same way connectors offer a common interface to the Case Base, the Case Base also implements a common interface for the CBR methods to access the cases. This way the organization and indexation chosen for the Case Base will not affect the implementation of the methods. The interface that defines the in-memory organization of the cases is jcolibri.cbrcore.CBRCaseBase: Listing 17: CBRCaseBase interface p u b l i c i n t e r f a c e CBRCaseBase { /∗ ∗ ∗ I n i t i a l i z e s t h e case base . This methods r e c i e v e s ∗ t h e c o n n e c t o r t h a t manages t h e p e r s i s t e n c e media . ∗/ p u b l i c v o i d i n i t ( C o n n e c t o r c o n n e c t o r ) throws j c o l i b r i . exception . InitializingException ; /∗ ∗ ∗ De− I n i t i a l i z e s t h e c a s e b a s e . ∗/ public void c l o s e ( ) ; /∗ ∗ ∗ R e t u r n s a l l t h e c a s e s a v a i l a b l e on t h i s c a s e b a s e ∗/ p u b l i c C o l l e c t i o n <CBRCase> g e t C a s e s ( ) ; Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 10.2 In-memory organization of the cases 45 /∗ ∗ ∗ R e t u r n s some c a s e s d e p e n d i n g on t h e f i l t e r ∗/ p u b l i c C o l l e c t i o n <CBRCase> g e t C a s e s ( C a s e B a s e F i l t e r f i l t e r ) ; /∗ ∗ ∗ Adds a c o l l e c t i o n o f new CBRCase o b j e c t s t o ∗ the c u r r e n t case base ∗/ p u b l i c v o i d l e a r n C a s e s ( C o l l e c t i o n <CBRCase> c a s e s ) ; /∗ ∗ ∗ Removes a c o l l e c t i o n o f new CBRCase o b j e c t s t o t h e c u r r e n t case base ∗/ p u b l i c v o i d f o r g e t C a s e s ( C o l l e c t i o n <CBRCase> c a s e s ) ; Analogous to the Connector interface, developers can create their in-memory organizations of cases implementing the CBRCaseBase interface. jCOLIBRI 2 includes the following Case Bases: • jcolibri.casebase.LinealCaseBase: Basic Lineal Case Base that stores cases into a List. • jcolibri.casebase.CachedLinealCaseBase: Cached case base that only persists cases when closing the application. • jcolibri.casebase.IDIndexedLinealCaseBase: Extension of LinealCaseBase that also keeps an index of cases using their IDs. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11. Loading the Case Base 46 11 Loading the Case Base Now, let’s add the code for loading the cases of our Travel Recommender application. We are using the Data Base connector (implemented with Hibernate) and the Lineal Case Base. The first step consists on modifying our TravelRecommender class with the following code: Listing 18: TravelRecommender configure() (version 2) and precycle() ... /∗ ∗ Connector o b j e c t ∗/ Connector _connector ; / ∗ ∗ CaseBase o b j e c t ∗ / CBRCaseBase _ c a s e B a s e ; ... p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n { try { / / Emulate data base s e r v e r j c o l i b r i . t e s t . d a t a b a s e . HSQLDBserver . i n i t ( ) ; / / Create a data base connector _ c o n n e c t o r = new D a t a B a s e C o n n e c t o r ( ) ; / / I n i t t h e ddbb c o n n e c t o r w i t h t h e c o n f i g f i l e _connector . initFromXMLfile ( j c o l i b r i . u t i l . FileIO . f i n d F i l e ( " j c o l i b r i / examples / TravelRecommender / d a t a b a s e c o n f i g . xml " ) ) ; / / C r e a t e a L i n e a l c a s e b a s e f o r i n −memory organization _ c a s e B a s e = new L i n e a l C a s e B a s e ( ) ; / / Create the dialogs ... } catch ( Exception e ) { throw new E x e c u t i o n E x c e p t i o n ( e ) ; } p u b l i c CBRCaseBase p r e C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { / / Load c a s e s f r o m c o n n e c t o r i n t o t h e c a s e b a s e _caseBase . i n i t ( _connector ) ; / / Print the cases j a v a . u t i l . C o l l e c t i o n <CBRCase> c a s e s = _ c a s e B a s e . getCases () ; f o r ( CBRCase c : c a s e s ) System . o u t . p r i n t l n ( c ) ; return _caseBase ; } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 47 Firstly, we are creating the two variables that will contain the connector and case base: _connector and _caseBase. They are defined with the type of the interfaces Connector and CBRCaseBase, but in the configure() method we will assign an instance of DataBaseConnector and LinealCaseBase that implement these interfaces. As explained before, each connector can use a xml file that defines its configuration. In this case, we are using the file jcolibri/examples/TravelRecommender/databaseconfig.xml to configure the Data Base connector. This file is explained in the following subsection. In the preCycle() method we initializes the case base object with the connector through the init() method. This action will load the cases from the persistence into the memory. Then we can access the cases in the case base object (here to print them to console). 11.1 Configuring the connector The data base connector is configured with the databaseconfig.xml file. So, next step consists on creating this file in the jcolibri.examples.TravelRecommender package. The schema of this file is explained in the documentation of the connector. Here you should write this: Listing 19: databaseconfig.xml <DataBaseConfiguration> <HibernateConfigFile> j c o l i b r i / e x a m p l e s / TravelRecommender / h i b e r n a t e . c f g . xml </ HibernateConfigFile> <DescriptionMappingFile> j c o l i b r i / e x a m p l e s / TravelRecommender / T r a v e l D e s c r i p t i o n . hbm . xml </ DescriptionMappingFile> <DescriptionClassName> j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l D e s c r i p t i o n < / DescriptionClassName> <SolutionMappingFile> j c o l i b r i / e x a m p l e s / TravelRecommender / T r a v e l S o l u t i o n . hbm . xml </ SolutionMappingFile> <SolutionClassName> j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l S o l u t i o n < / SolutionClassName> </ DataBaseConfiguration> This file defines the following information: • HibernateConfigFile: locates the configuration file of Hibernate. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 48 • DescriptionMappingFile: locates the mapping file of the description. • DescriptionClassName: class that represents the solution. • SolutionMappingFile: locates the mapping file of the solution. • SolutionClassName: class that represents the description. The solution and description classes are the ones developed in Section 9.3. Using these fields, the connector knows which are the CaseComponents that fill the description, solution, justificationOfSolution and result attributes of the CBRCase. The mapping files define how to map those classes into the tables of the database. And finally, the Hibernate configuration file configures the connection with the DBMS. 11.1.1 The Hibernate configuration file This file tells Hibernate how to access the DBMS. If you need special requirements read the Hibernate documentation located here: http://www.hibernate.org/hib_ docs/v3/reference/en/html/session-configuration.html To configure Hibernate to work with our HSQLDB Data Base Manager you must create the file hibernate.cfg.xml into the jcolibri.examples.TravelRecommender package. Following listing shows the content: Listing 20: hibernate.cfg.xml <? xml v e r s i o n = " 1 . 0 " e n c o d i n g = "UTF−8" ? > < !DOCTYPE h i b e r n a t e −c o n f i g u r a t i o n PUBLIC " −// H i b e r n a t e / H i b e r n a t e C o n f i g u r a t i o n DTD 3 . 0 / / EN" " h t t p : / / h i b e r n a t e . s o u r c e f o r g e . n e t / h i b e r n a t e −c o n f i g u r a t i o n − 3 . 0 . dtd "> < h i b e r n a t e −c o n f i g u r a t i o n > < s e s s i o n −f a c t o r y > < !−− D a t a b a s e c o n n e c t i o n s e t t i n g s −−> < p r o p e r t y name= " c o n n e c t i o n . d r i v e r _ c l a s s " > org . hsqldb . j d b c D r i v e r </ property> < p r o p e r t y name= " c o n n e c t i o n . u r l " > jdbc:hsqldb:hsql: // localhost / travel </ property> < p r o p e r t y name= " c o n n e c t i o n . u s e r n a m e " > sa </ property> < p r o p e r t y name= " c o n n e c t i o n . p a s s w o r d " > </ property> Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 49 < !−− JDBC c o n n e c t i o n p o o l ( u s e t h e b u i l t −i n ) −−> < p r o p e r t y name= " c o n n e c t i o n . p o o l _ s i z e " > 1 </ property> < !−− SQL d i a l e c t −−> < p r o p e r t y name= " d i a l e c t " > o r g . h i b e r n a t e . d i a l e c t . HSQLDialect </ property> < !−− E n a b l e H i b e r n a t e a u t o m a t i c s e s s i o n c o n t e x t management −−> < p r o p e r t y name= " c u r r e n t _ s e s s i o n _ c o n t e x t _ c l a s s " > thread </ property> < !−− D i s a b l e t h e s e c o n d −l e v e l c a c h e −−> < p r o p e r t y name= " c a c h e . p r o v i d e r _ c l a s s " > org . h i b e r n a t e . cache . NoCacheProvider </ property> < !−− Echo a l l e x e c u t e d SQL t o s t d o u t −−> < p r o p e r t y name= " s h o w _ s q l " > true </ property> < / s e s s i o n −f a c t o r y > < / h i b e r n a t e −c o n f i g u r a t i o n > To use Hibernate with other DBMs, developers should modify the following properties (described in the Hibernate documentation at http://www.hibernate.org/ hib_docs/v3/reference/en/html/session-configuration.html# configuration-hibernatejdbc): • connection.driver_class: jdbc driver class of your DBMs (must be included in the classpath). • connection.url: jdbc connection url. • username and password. • dialect: choose one from the table at: http://www.hibernate.org/hib_ docs/v3/reference/en/html/session-configuration.html# configuration-optional-dialects These are the small changes required to use the Hibernate connector with other DBM. Now, we must define the mapping files. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 50 11.1.2 Creating the mapping files The mapping files define how to map a Java Bean with a table of the data base. Firstly, you define which table is used to store the bean. Then you configure which column of the table contains each attribute of the bean. Our cases were composed by a description represented in the TravelDescription class, and a solution represented in the TravelSolution class. We are going to map both components to the same table. Some columns will store the attributes of the description and other will store the attributes of the solution. Figure 18 illustrates this mapping (the caseId column is in blue because it is used by both components). Figure 18: Travel Recommender components mapping Let’s begin with the mapping file of the solution. You must create the file TravelSolution.hbm.xml into jcolibri.example.travelrecommender with the following content: Listing 21: TravelSolution.hbm.xml <? xml v e r s i o n = " 1 . 0 " ? > < !DOCTYPE h i b e r n a t e −mapping PUBLIC " −// H i b e r n a t e / H i b e r n a t e Mapping DTD / / EN" " h t t p : / / h i b e r n a t e . s o u r c e f o r g e . n e t / h i b e r n a t e −mapping − 3 . 0 . d t d " > < h i b e r n a t e −mapping d e f a u l t −l a z y = " f a l s e " > <class name= " j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l S o l u t i o n " t a b l e =" t r a v e l "> < i d name= " i d " column = " c a s e I d " > </ id> < p r o p e r t y name= " p r i c e " column = " P r i c e " / > < p r o p e r t y name= " h o t e l " column = " H o t e l " / > </ class> < / h i b e r n a t e −mapping > Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 51 The class tag sets the jcolibri.examples.TravelRecommender. TravelSolution class to be stored into the travel table. The id attribute of TravelSolution is mapped to the caseId column. It is done through the id tag that defines the attribute as the primary key of the table. Finally, the remaining attributes are mapped to the other columns of the table using the property tag. Now, we have to create the mapping file of the description. file name is TravelDescription.hbm.xml and must be located jcolibri.example.travelrecommender: The into Listing 22: TravelDescription.hbm.xml <? xml v e r s i o n = " 1 . 0 " ? > < !DOCTYPE h i b e r n a t e −mapping PUBLIC " −// H i b e r n a t e / H i b e r n a t e Mapping DTD / / EN" " h t t p : / / h i b e r n a t e . s o u r c e f o r g e . n e t / h i b e r n a t e −mapping − 3 . 0 . d t d " > < h i b e r n a t e −mapping d e f a u l t −l a z y = " f a l s e " > <class name= " j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l D e s c r i p t i o n " t a b l e =" Travel "> < i d name= " c a s e I d " column = " c a s e I d " > < g e n e r a t o r c l a s s =" n a t i v e " / > </ id> < p r o p e r t y name= " H o l i d a y T y p e " column = " H o l i d a y T y p e " / > < p r o p e r t y name= " NumberOfPersons " column = " NumberOfPersons " / > < p r o p e r t y name= " R e g i o n " column = " R e g i o n " > < t y p e name= " j c o l i b r i . c o n n e c t o r . d a t a b a s e u t i l s . GenericUserType "> <param name= " c l a s s N a m e " > j c o l i b r i . d a t a t y p e s . I n s t a n c e < / param > </ type> </ property> < p r o p e r t y name= " T r a n s p o r t a t i o n " column = " T r a n s p o r t a t i o n " / > < p r o p e r t y name= " D u r a t i o n " column = " D u r a t i o n " / > < p r o p e r t y name= " S e a s o n " column = " S e a s o n " > < t y p e name= " j c o l i b r i . c o n n e c t o r . d a t a b a s e u t i l s . EnumUserType "> <param name= " enumClassName " > j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l D e s c r i p t i o n $ Seasons < / param > </ type> </ property> < p r o p e r t y name= " Accommodation " column = " Accommodation " > < t y p e name= " j c o l i b r i . c o n n e c t o r . d a t a b a s e u t i l s . EnumUserType "> Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 52 <param name= " enumClassName " > j c o l i b r i . e x a m p l e s . TravelRecommender . T r a v e l D e s c r i p t i o n $ AccommodationTypes < / param > </ type> </ property> </ class> < / h i b e r n a t e −mapping > Here, we are mapping again the jcolibri.examples.TravelRecommender. TravelDescription class into the travel table using the class tag. Then we map the caseId attribute to the column with the same name caseId. This attribute is the primary key of the table because we are mapping it using the id tag. Inside this tag we found the generator tag that defines how to define unique ids for new Beans that will be inserted in the table. Defining this property as native, Hibernate delegates this task to the underlaying DBM. After this tag we continue with the property tags that map normal attributes to the columns of the database. However there are some of them that have the type child tag. Those attributes have a special mapping because are user-defined types like the region attribute or because they are enumerates like season or accommodation. To map a user-defined type with Hibernate we have to use the type tag with the jcolibri.connector.databaseutils.GenericUserType value. This class allows to map any user-defined type indicated by the param tag. As explained in Section 9.4, the user-defined types in jCOLIBRI 2 must implement the jcolibri.connector.TypeAdaptor interface. This way, the general template to map user-defined types is: Listing 23: Mapping template for user-defined types < p r o p e r t y name= " b e a n A t t r i b u t e " column = " t a b l e C o l u m n " > < t y p e name= " j c o l i b r i . c o n n e c t o r . d a t a b a s e u t i l s . G e n e r i c U s e r T y p e " > <param name= " c l a s s N a m e " > implementation_of_TypeAdaptor_that_defines_attribute_type < / param > </ type> </ property> The other special configuration is the mapping of enumerates. To do this, we have to use the jcolibri.connector.databaseutils.EnumUserType class in the type tag and then indicate the class of the enumerate into the param tag. The general template is: Listing 24: Mapping template for enumerates < p r o p e r t y name= " b e a n A t t r i b u t e " column = " t a b l e C o l u m n " > < t y p e name= " j c o l i b r i . c o n n e c t o r . d a t a b a s e u t i l s . EnumUserType " > <param name= " enumClassName " > Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 11.1 Configuring the connector 53 enumerate_class_that_defines_attribute_type < / param > </ type> </ property> The complete Hibernate mapping documentation can be found at: http://www. hibernate.org/hib_docs/v3/reference/en/html/mapping.html. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12. Using Ontologies in CBR applications 54 12 Using Ontologies in CBR applications For the last few years our research group has been working in Knowledge Intensive CBR using ontologies [23, 16, 17, 19]. We claim that ontologies are useful for designing KI-CBR applications because they allow the knowledge engineer to use knowledge already acquired, conceptualized and implemented in a formal language, like DLs based languages, reducing considerably the knowledge acquisition bottleneck. Our approach proposes the use of ontologies to build models of general domain knowledge. Although in a CBR system the main source of knowledge is the set of previous experiences, our approach to CBR is towards integrated applications that combine case specific knowledge with models of general domain knowledge. The more knowledge is embedded into the system, the more effective is expected to be. Semantic CBR processes can take advantage of this domain knowledge and obtain more accurate results. We state that the formalization of ontologies is useful for the CBR community regarding different purposes, namely: 1. Persistence of cases and/or indexes using individuals or concepts that are embedded in the ontology itself. 2. As the vocabulary to define the case structure, either if the cases are embedded as individuals in the ontology itself, or if the cases are stored in a different persistence media as a data base. 3. As the terminology to define the query vocabulary. The user can express better his requirements if he can use a richer vocabulary to define the query. During the similarity computation the ontology allows to bridge the gap between the query terminology and the case base terminology. 4. Retrieval and similarity [22, 46, 38], adaptation [23] and learning [1]. 5. Knowledge reuse between different CBR systems. jCOLIBRI 2 allows to implement CBR systems with all these features or, at least, offers an ontology management architecture that is the basis of this kind of CBR applications. The ontology support of jCOLIBRI 2 is built arround the OntoBridge library (http: //gaia.fdi.ucm.es/grupo/projects/ontobridge/). This library was developed by the GAIA research group to easily manage ontologies and DLs reasoners. It is base on the JENA3 library and allows to reason with several DL reasoners like Pellet 4 . 3 4 http://jena.sourceforge.net/ http://pellet.owldl.com Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.1 Case Base persistence in ontologies 55 12.1 Case Base persistence in ontologies jCOLIBRI 2 allows to use ontologies as a persistence media for the cases. The connectors architecture of the framework eases the incorporation of this feature as the single requirement is the implementation of a new connector. This connector is jcolibri.connector.OntologyConnector. To learn how to use it, read the documentation of the Test 10 in the examples. This example shows how to map a case (similar to the Travel Recommender case of this tutorial) into an ontology. Figure 19 illustrates this process. Figure 19: Case mapping in a ontology To explain in detail how works the OntologyConnector lets use the example shown in Figure 20. A concept of the ontology will define the cases (VACATION_CASE) and the other related concepts define the attributes (CATEGORY, PRICE and HOLIDAY_TYPE). This way, the OntologyConnector obtains the instances of the VACATION_CASE concept (case1, and case2) and follows their relationships to obtain the values of the attributes (That will be instances of the concepts that define the attributes: CATEGORY, PRICE and HOLIDAY_TYPE). 12.2 Computing similarities using ontologies There are different approaches for computing similarity measures based on ontologies that are provided in jCOLIBRI 2. It initially offers four functions to compute the concept Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.3 Case and Query vocabulary 56 Figure 20: OntologyConnector behavior example based similarity that depends on the location of the cases in the ontology. Of course, other similarity functions can easily be included. These similarity functions are shown in Figure 21 whereas Figure 22 explains how they work with a small ontology taken as example. These similarity measures are implemented in the package: jcolibri. method.retrieve.NNretrieval.similarity.local.ontology. The use of these measures will be explained in Section 13. 12.3 Case and Query vocabulary Regarding the case vocabulary, there is a direct approach consisting on the use of a domain ontology in an object-oriented way: concepts are types, or classes, individuals are allowed values, or objects, and relations are the attributes describing the objects. There are also simple types like string or numbers that are considered in the traditional way. The case structure is defined using types from the ontology even if the cases are not stored as individuals in the ontology. For example, the Region concept of an ontology can be used as a type where every one of its instances are the type values: Madrid, Barcelona, Sevilla,... Regarding the query vocabulary we have two options to define the queries: • Using exactly the same vocabulary used in the cases, i.e, the same types used in the case structure definition. • Using the ontology as the query vocabulary, what allows richer queries. The user can express better his requirements if he can use a richer vocabulary to define the query. During the similarity computation the ontology allows to bridge the gap between the query terminology and the case base terminology. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.3 Case and Query vocabulary fdeep_basic(i1 , i2 ) = max(prof(LCS(i1 , i2 ))) max (prof(Ci )) Ci ∈CN 57 fdeep(i1 , i2 ) = max(prof(LCS(i1 , i2 ))) max(prof(i1 ), prof(i2 )) cosine(i1 , i2 ) = [ \ [ (super(d , CN )) (super(d , CN )) i i di ∈t(i1 ) di ∈t(i2 ) v sim(t(i1 ), t(i2 )) = v u u u [ u [ u u t (super(di , CN )) · t (super(di , CN )) di ∈t(i1 ) di ∈t(i2 ) detail(i1 , i2 ) = detail(t(i1 ), t(i2 )) = 1 − 1 [ \ [ 2 · (super(di , CN )) (super(di , CN )) di ∈t(i1 ) di ∈t(i2 ) Where: CN is the set of all the concepts in the current knowledge base super(c, C) is the subset of concepts in C which are superconcepts of c LCS(i1 , i2 ) is the set of the least common subsumer concepts of the two given individuals prof(c) is the depth of concept c t(i) is the set of concepts the individual i is instance of Figure 21: Concept based similarity functions in jCOLIBRI2 Example: In the travel domain, lets suppose we have an existing case base where it is defined an enumerated type for the Region attribute where the allowed values are countries: Spain, France, Italy and others. Suppose that casei is a case whose destination is Spain. We do not want to restrict the query vocabulary to the same type but allow broader queries, for example: • Query1: "I want to go to Madrid" • Query2: "My favorite destination is Europe" • Query3: "I would like to travel to Spain" In the three queries and using the ontology of Figure 22 we could find casei as a suitable candidate. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.4 Using an ontology in the Travel Recommender Sim(Mad,Bcn) Sim(Mad,Paris) Sim(NY,Seattle Sim(Seattle, Vanc) Sim(Bcn,Bogotá) Sim(Mad,Bogotá) Sim(Vanc,Bogotá) fdeep_basic 3/4 1/2 1 3/4 0 0 1/2 fdeep 1 1/2 1 3/4 0 0 1/2 58 cosine 1 2/3 1 3/4 √ 1/√12 1/ 12 1/2 detail 5/6 3/4 7/8 5/6 1/2 1/2 3/4 Figure 22: Example of application of the similarity functions 12.4 Using an ontology in the Travel Recommender Our Travel Recommender application uses an ontology of regions used to define the type or the Region attribute. It also defines the query vocabulary of that attribute. The ontology stores information about travel destinations, is located at http: //gaia.fdi.ucm.es/ontologies/travel-destinations.owl and was developed using the PROTEGE5 ontology editor. Figure 23 shows how this ontology is used to define the query in the Travel Recommender application. When the user clicks on the Region button, a dialog (generated by the OntoBridge library) shows the ontology allowing to choose the desired destination. To use an ontology in a CBR application, developers only have to initialize OntoBridge. Then, the ontology connector and similarity measures will use that library to operate with the ontology. In the Travel Recommender application, we only have to add the code shown in the following listing to the configure() method of the TravelRecommender class (this listing is the final state of that method): Listing 25: TravelRecommender configure() method (final version) p u b l i c v o i d c o n f i g u r e ( ) throws E x e c u t i o n E x c e p t i o n { try { / / Emulate data base s e r v e r j c o l i b r i . t e s t . d a t a b a s e . HSQLDBserver . i n i t ( ) ; / / Create a data base connector _ c o n n e c t o r = new D a t a B a s e C o n n e c t o r ( ) ; / / I n i t t h e ddbb c o n n e c t o r w i t h t h e c o n f i g f i l e _connector . initFromXMLfile ( j c o l i b r i . u t i l . FileIO 5 http://protege.stanford.edu/ Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.4 Using an ontology in the Travel Recommender 59 Figure 23: Defining the Travel Recommender query through an ontology . f i n d F i l e ( " j c o l i b r i / examples / TravelRecommender / d a t a b a s e c o n f i g . xml " ) ) ; / / C r e a t e a L i n e a l c a s e b a s e f o r i n −memory o r g a n i z a t i o n _ c a s e B a s e = new L i n e a l C a s e B a s e ( ) ; / / Obtain a r e f e r e n c e to OntoBridge O n t o B r i d g e ob = j c o l i b r i . u t i l . O n t o B r i d g e S i n g l e t o n . getOntoBridge ( ) ; / / C o n f i g u r e i t t o work w i t h t h e P e l l e t r e a s o n e r ob . i n i t W i t h P e l l e t R e a s o n e r ( ) ; / / S e t u p t h e main o n t o l o g y OntologyDocument mainOnto = new OntologyDocument ( " h t t p : / / g a i a . f d i . ucm . e s / o n t o l o g i e s / t r a v e l −d e s t i n a t i o n s . owl " , FileIO . findFile ( " j c o l i b r i / examples / TravelRecommender / t r a v e l −d e s t i n a t i o n s . owl " ) . toExternalForm () ) ; / / There are not s u b o n t o l o g i e s A r r a y L i s t < OntologyDocument > s u b O n t o l o g i e s = new A r r a y L i s t < OntologyDocument > ( ) ; / / Load t h e o n t o l o g y ob . l o a d O n t o l o g y ( mainOnto , s u b O n t o l o g i e s , f a l s e ) ; / / Create the dialogs s i m i l a r i t y D i a l o g = new S i m i l a r i t y D i a l o g ( main ) ; Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 12.4 Using an ontology in the Travel Recommender resultDialog autoAdaptDialog revisionDialog retainDialog = = = = new new new new 60 R e s u l t D i a l o g ( main ) ; A u t o A d a p t a t i o n D i a l o g ( main ) ; R e v i s i o n D i a l o g ( main ) ; R e t a i n D i a l o g ( main ) ; } catch ( Exception e ) { throw new E x e c u t i o n E x c e p t i o n ( e ) ; } } The remaining steps to use the ontology have been already implemented. In the TravelDescription code (Listing 13) we defined the type of the Region attribute as Instance. This way, the jCOLIBRI 2 methods will connect to OntoBridge to manage that attribute. Moreover, we need to indicate the connector how to manage this attribute, but we did that when defining the mapping file of TravelDescription in TravelDescription.hbm.xml. Then, our connector will use OntoBridge to link the values in the data base with the instances of the loaded ontology. In this case we are not storing the whole case base into the ontology. The ontology only defines the type and values of the Region attribute. The data base connector will read the values in the table and look for an instance with the same name in the ontology (through OntoBridge). Once found, the connector will fill the value of the Region attribute of each case with the corresponding instance of the ontology. This is shown in Figure 24 and the code required to map the data base with the ontology appears in Listing 22. Figure 24: Mapping between data base and ontology Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13. Retrieval 61 13 Retrieval At this point, we have completed the configure() and precycle() methods of our Travel Recommender application. Those methods allow us to load the cases from the persistence into memory. The precycle() method loads the cases and stores them into the _caseBase object of our main application (review Listing 18). This section and the following ones show how to fill the cycle() method to perform the 4R’s tasks (retrieval, reuse, revise, and retain). The retrieval step obtains the most similar cases given a query. The main method of jCOLIBRI 2 for computing the retrieval is the jcolibri.method.retrieve.NNretrieval.NNScoringMethod class. This method performs a Nearest Neighbor numeric scoring comparing attributes. It uses global similarity functions to compare compound attributes (CaseComponents) and local similarity functions to compare simple attributes. For example, the TravelDescription case component of our cases is a compound attribute composed by several simple attributes (season, duration, etc.). So, we assign a global similarity function to the description like the average function. Then, simple similarity functions like equal, numeric interval, enumerate distance, ... are configured for each simple attribute. The NNScoringMethod will compute the similarity of each simple attribute and then compute the global similarity: the average of the simple similarities. The configuration of those similarity functions is stored in jcolibri.method.retrieve.NNretrieval.NNConfig object. values configured in this object are: the The • Global similarity function for the description. • Global similarity functions for each compound attribute (CaseComponents excepting the description). • Local similarity functions for each simple attribute. • Weight for each attribute. Following listing shows the signature of the NNScoringMethod: Listing 26: NNScoringMethod signature /∗ ∗ ∗ P e r f o r m s t h e NN s c o r i n g o v e r a c o l l e c t i o n o f c a s e s c o m p a r i n g them w i t h a q u e r y . ∗ T h i s method i s c o n f i g u r e d t h r o u g h t h e NNConfig o b j e c t . ∗/ public s t a t i c Collection < RetrievalResult > e v a l u a t e S i m i l a r i t y ( C o l l e c t i o n <CBRCase> c a s e s , CBRQuery q u e r y , NNConfig simConfig ) Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13.1 Similarity functions 62 The parameters of this method are very intuitive: the cases to compare with the query, the query and the similarity configuration. The method returns a collection of jcolibri.method.retrieve.RetrievalResult objects. These objects contain the retrieved case and a double that represents the similarity of that case to the query. 13.1 Similarity functions There are many similarity functions in the jcolibri.method.retrieve. NNretrieval.similarity package. Each local similarity measure can only be applied to some datatypes. This way there are local similarities applicable to integers, strings, instances, ... Developers can define their own similarity measures implementing the interfaces: jcolibri.method.retrieve.NNretrieval.similarity. GlobalSimilarityFunction and jcolibri.method.retrieve. NNretrieval.similarity.LocalSimilarityFunction. The signatures of both interfaces are: Listing 27: Local and Global similarity interfaces public interface GlobalSimilarityFunction { /∗ ∗ ∗ Computes t h e g l o b a l s i m l i a r i t y b e t w e e n two compound attributes . ∗ I t r e q u i r e s t h e NNConfig o b j e c t t h a t s t o r e s t h e configuration of i t s contained a t t r i b u t e s . ∗ @param c o m p o n e n t O f C a s e compound a t t r i b u t e o f t h e c a s e ∗ @param c o m p o n e n t O f Q u e r y compound a t t r i b u t e o f t h e q u e r y ∗ @param _ c a s e c a s e b e i n g compared ∗ @param _ q u e r y q u e r y b e i n g compared ∗ @param numSimConfig S i m i l a r i t y f u n c t i o n s c o n f i g u r a t i o n ∗ @return a v a l u e b e t w e e n [ 0 . . 1 ] ∗/ p u b l i c d o u b l e compute ( CaseComponent componentOfCase , CaseComponent componentOfQuery , CBRCase _ c a s e , CBRQuery _ q u e r y , NNConfig numSimConfig ) ; } public interface LocalSimilarityFunction { /∗ ∗ ∗ Computes t h e s i m i l a r i t y b e t w e e n two o b j e c t s ∗ @param c a s e O b j e c t o b j e c t o f t h e c a s e ∗ @param q u e r y O b j e c t o b j e c t o f t h e q u e r y ∗ @return a v a l u e b e t w e e n [ 0 . . 1 ] ∗/ Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13.2 Cases selection 63 p u b l i c d o u b l e compute ( O b j e c t c a s e O b j e c t , O b j e c t q u e r y O b j e c t ) throws j c o l i b r i . e x c e p t i o n . NoApplicableSimilarityFunctionException ; /∗ ∗ ∗ I n d i c a t e s i f t h e f u n c t i o n i s a p p l i c a b l e t o two o b j e c t s ∗/ public boolean i s A p p l i c a b l e ( Object caseObject , Object queryObject ) ; } As the previous listing shows, the local similarity functions cannot access to the other attributes of the case to compute their measures. This can be a problem in some applications where the similarity of two attributes depends on the value of other attributes in the same case component. To solve this drawback, jCOLIBRI 2 includes the jcolibri.method.retrieve.NNretrieval.similarity. InContextLocalSimilarityFunction abstract class that incorporates information about the context (CaseComponent) of the attribute. Extended information about this class can be found in its documentation. 13.2 Cases selection Most similar cases must be selected once they have been scored according to their similarity with the query. Usually, only the top k most similar cases are selected. This retrieval process that combines Nearest Neighbor scoring and top k selection is commonly called k-NN retrieval. The jcolibri.method.retrieve.selection.SelectCases class includes basic methods to select cases: Listing 28: Cases selection methods /∗ ∗ ∗ Selects a l l cases ∗ @param c a s e s t o s e l e c t ∗ @return a l l c a s e s ∗/ p u b l i c s t a t i c C o l l e c t i o n <CBRCase> s e l e c t A l l ( C o l l e c t i o n < RetrievalResult > cases ) /∗ ∗ ∗ S e l e c t s top K cases ∗ @param c a s e s t o s e l e c t ∗ @param k i s t h e number o f c a s e s t o s e l e c t ∗ @return t o p k c a s e s ∗/ p u b l i c s t a t i c C o l l e c t i o n <CBRCase> s e l e c t T o p K ( C o l l e c t i o n < RetrievalResult > cases , int k ) Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13.3 Using the k-NN retrieval 64 /∗ ∗ ∗ S e l e c t s a l l c a s e s b u t r e t u r n s them i n t o R e t r i e v a l R e s u l t objects ∗ @param c a s e s t o s e l e c t ∗ @return a l l c a s e s i n t o R e t r i e v a l R e s u l t o b j e c t s ∗/ public s t a t i c Collection < RetrievalResult > selectAllRR ( Collection <RetrievalResult > cases ) /∗ ∗ ∗ S e l e c t s t o p k c a s e s b u t r e t u r n s them i n t o RetrievalResult objects ∗ @param c a s e s t o s e l e c t ∗ @return t o p k c a s e s i n t o R e t r i e v a l R e s u l t o b j e c t s ∗/ p u b l i c s t a t i c C o l l e c t i o n < R e t r i e v a l R e s u l t > selectTopKRR ( Collection < RetrievalResult > cases , int k ) Previous listing shows that there are two versions of the methods: one returning CBRCase objects and other returning RetrievalResult object. There are another more sophisticated ways to select cases. Sometimes it is important to select cases that are similar to the query but also diverse among them. jCOLIBRI2 includes some of these methods in the jcolibri.method.retrieve.selection. As these methods were included in the recommenders extension of the 2.1 version, they are detailed in Section 20.2.9. Note that there are important changes in the k-NN retrieval implementation from version 2.0 to version 2.1. These changes are detailed in Section 24. 13.3 Using the k-NN retrieval There are many examples in the documentation of jCOLIBRI 2 that show how to configure the NNScoringMethod. In the Travel Recommender example this code is included in the jcolibri.examples.TravelRecommender.gui.SimilarityDialog. getSimilarityConfig() method. As this code obtains the similarity values from the graphical widgets, here we will show how to configure and execute the NN method in a direct way. The following code is extracted from the first Test of the documentation: Listing 29: Example code for the k-NN Retrieval p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { / / F i r s t c o n f i g u r e t h e NN s c o r i n g Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13.3 Using the k-NN retrieval 65 NNConfig s i m C o n f i g = new NNConfig ( ) ; / / Set the average ( ) global s i m i l a r i t y f u n c t i o n for the d e s c r i p t i o n of the case s i m C o n f i g . s e t D e s c r i p t i o n S i m F u n c t i o n ( new A v e r a g e ( ) ) ; / / The a c c o m o d a t i o n a t t r i b u t e u s e s t h e e q u a l ( ) l o c a l similarity function s i m C o n f i g . addMapping ( new A t t r i b u t e ( " A c c o m o d a t i o n " , T r a v e l D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; / / For t h e d u r a t i o n a t t r i b u t e we a r e g o i n g t o s e t i t s l o c a l s i m i l a r i t y f u n c t i o n and t h e w e i g h t A t t r i b u t e d u r a t i o n = new A t t r i b u t e ( " D u r a t i o n " , TravelDescription . class ) ; s i m C o n f i g . addMapping ( d u r a t i o n , new I n t e r v a l ( 3 1 ) ) ; simConfig . setWeight ( duration , 0 . 5 ) ; / / H o l i d a y T y p e −−> e q u a l ( ) s i m C o n f i g . addMapping ( new A t t r i b u t e ( " H o l i d a y T y p e " , T r a v e l D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; / / NumberOfPersons −−> e q u a l ( ) s i m C o n f i g . addMapping ( new A t t r i b u t e ( " NumberOfPersons " , T r a v e l D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; / / P r i c e −−> I n t e r v a l ( ) s i m C o n f i g . addMapping ( new A t t r i b u t e ( " P r i c e " , T r a v e l D e s c r i p t i o n . c l a s s ) , new I n t e r v a l ( 4 0 0 0 ) ) ; / / E x e c u t e NN C o l l e c t i o n < R e t r i e v a l R e s u l t > e v a l = NNScoringMethod . e v a l u a t e S i m i l a r i t y ( _caseBase . g e t C a s e s ( ) , query , simConfig ) ; / / Select k cases e v a l = S e l e c t C a s e s . selectTopKRR ( e v a l , 5 ) ; / / Print the r e t r i e v a l System . o u t . p r i n t l n ( " R e t r i e v e d c a s e s : " ) ; for ( R e t r i e v a l R e s u l t nse : eval ) System . o u t . p r i n t l n ( n s e ) ; / / F i n a l l y remove t h e r e t r i e v a l i n f o ( t h e s i m i l a r i t y v a l u e ) and g e t o n l y t h e c a s e s C o l l e c t i o n < j c o l i b r i . c b r c o r e . CBRCase> r e t r i e v e d C a s e s = RemoveRetrievalEvaluation . removeRetrievalEvaluation ( eval ) ; } The code is very simple: the NNConfig class has methods to set the similarity function and weight for the attributes. The attributes of a bean are represented using the jcolibri.cbrcore.Attribute class as explained in Section 9.1. Once configured, NNScoringMethod.evaluateSimilarity() is executed obtaining a list of RetrievalResult objects that contain the most similar cases to the query. Fi- Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 13.4 Retrieval in the Travel Recommender application 66 nally, we obtain the 5 most similar cases using SelectCases.selectTopKRR(). This method returns the top k cases as RetrievalResult objects (keeping the similarity value) although there is another method that removes that information and only returns CBRCase objects (SelectCases.selectTopK()). 13.4 Retrieval in the Travel Recommender application The retrieval code in the Travel Recommender application is very simple because the SimilarityDialog window gives us the NNConfig object and the k value. So, we only have to compute the k-NN, show the result using the ResultDialog window and finally, remove the retrieval evaluation to obtain the list of cases. Add this code into the cycle() method of the TravelRecommender class: Listing 30: TravelRecommender.cycle() code (step1) p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { / / O b t a i n c o n f i g u r a t i o n f o r k−NN s i m i l a r i t y D i a l o g . s e t V i s i b l e ( true ) ; NNConfig s i m C o n f i g = s i m i l a r i t y D i a l o g . g e t S i m i l a r i t y C o n f i g ( ) ; s i m C o n f i g . s e t D e s c r i p t i o n S i m F u n c t i o n ( new A v e r a g e ( ) ) ; / / E x e c u t e NN C o l l e c t i o n < R e t r i e v a l R e s u l t > e v a l = NNScoringMethod . e v a l u a t e S i m i l a r i t y ( _caseBase . g e t C a s e s ( ) , query , simConfig ) ; / / Select k cases C o l l e c t i o n <CBRCase> s e l e c t e d c a s e s = S e l e c t C a s e s . s e l e c t T o p K ( e v a l , s i m i l a r i t y D i a l o g . getK ( ) ) ; / / Show r e s u l t r e s u l t D i a l o g . showCases ( e v a l , s e l e c t e d c a s e s ) ; r e s u l t D i a l o g . s e t V i s i b l e ( true ) ; ... } With this code we have implemented the functionality shown in Figures 7 and 8. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 14. Reuse 67 14 Reuse The reuse step (also named adaptation step) adapts the solution of the retrieved cases to the requirements of the query. This step is very domain dependent and use to vary depending on the application. jCOLIBRI 2 leaves this step open to developers. They should create their own adaptation methods customized for the application. Anyway, jCOLIBRI 2 offers two basic adaptation methods: • jcolibri.method.reuse.DirectAttributeCopyMethod. This method copies the value of an attribute in the query to an attribute of a case. • jcolibri.method.reuse.NumericDirectProportionMethod. Performs a numerical direct proportion among attributes of the query and the case. Besides these methods, the jcolibri.method.reuse.classification package includes some classification reuse methods implemented by Lisa Cummins & Derek Bridge. (University College Cork, Ireland). Ontologies can be used to guide the adaptation of the cases. There are some experimental methods (not included in the framework) that are explained in [43]. Once the retrieved cases are adapted, their description can be substituted by the description of the query. This way, we obtain a list of cases with the same description than the query. This step is performed by the jcolibri.method.reuse.CombineQueryAndCasesMethod method, and is optional depending on the application. After adapting the cases, the CBR system proposes them as a suggested solution of the problem (review Figure 1). Then, the system user (or a domain expert) could revise these solutions in the following step. 14.1 Adapting the Travel Recommender retrieved trips In this tutorial we are using the NumericDirectProportionMethod method to adapt the solution of the trips according with the description of the query. We are going to use this method to compute the following proportions: • NumberOfPersons and Price. Imagine that the number of persons configured in the query is 4 and the value of this attribute in the case is 2. In the solution of the case, the price is 1000, so we have to compute a direct proportion among these attributes to adapt the solution to the requirements of the query. After executing the method, the new value of the price in the case solution will be 2000. • Duration and Price. This adaptation is analogous to the previous one, but using the Duration attribute instead of NumberOfPersons. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 14.1 Adapting the Travel Recommender retrieved trips 68 To perform these adaptations in the Travel Recommender application append the following code to the cycle() method: Listing 31: TravelRecommender.cycle() code (step2) ... / / Show a d a p t a t i o n d i a l o g autoAdaptDialog . s e t V i s i b l e ( true ) ; / / A d a p t d e p e n d i n g on u s e r s e l e c t i o n i f ( autoAdaptDialog . adapt_Duration_Price ( ) ) { / / Compute a d i r e c t p r o p o r t i o n b e t w e e n t h e " D u r a t i o n " and " Price " a t t r i b u t e s . N u m e r i c D i r e c t P r o p o r t i o n M e t h o d . d i r e c t P r o p o r t i o n ( new A t t r i b u t e ( " D u r a t i o n " , T r a v e l D e s c r i p t i o n . c l a s s ) , new A t t r i b u t e ( " p r i c e " , T r a v e l S o l u t i o n . c l a s s ) , query , selectedcases ) ; } i f ( autoAdaptDialog . adapt_NumberOfPersons_Price ( ) ) { / / Compute a d i r e c t p r o p o r t i o n b e t w e e n t h e " D u r a t i o n " and " Price " a t t r i b u t e s . N u m e r i c D i r e c t P r o p o r t i o n M e t h o d . d i r e c t P r o p o r t i o n ( new A t t r i b u t e ( " NumberOfPersons " , T r a v e l D e s c r i p t i o n . c l a s s ) , new A t t r i b u t e ( " p r i c e " , T r a v e l S o l u t i o n . c l a s s ) , q u e r y , selectedcases ) ; } ... This code shows the dialog in Figure 9 and performs the required adaptations. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 15. Revise 69 15 Revise In the revise step the proposed solution is tested for success, e.g. by being applied to the real world environment or evaluated by a domain expert, and repaired if failed. This step is also very domain dependent and may change among applications. jCOLIBRI 2 only includes a method to define new Ids to the cases as they will be stored into the data base during the following step and they cannot use the original Id of the retrieved cases. This method is jcolibri.method.revise.DefineNewIdsMethod. There are also some classification revise methods implemented by Lisa Cummins & Derek Bridge. (University College Cork, Ireland) in the jcolibri.method.revise.classification package. 15.1 Travel Recommender revision The revise step in the Travel Recommender is extremely simple because the jcolibri.examples.TravelRecommender.gui.RevisionDialog class is in charge of presenting the cases for edition and modify them according with the user selections. The code is: Listing 32: TravelRecommender.cycle() code (step3) ... / / Revise r e v i s i o n D i a l o g . showCases ( s e l e c t e d c a s e s ) ; r e v i s i o n D i a l o g . s e t V i s i b l e ( true ) ; ... These methods will show the window illustrated in Figure 10. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 16. Retain 70 16 Retain In the retain step useful new cases are stored in the case base for future reuse. This way the CBR system has learned a new experience. jCOLIBRI 2 includes the jcolibri.method.retain.StoreCasesMethod to include new cases into the case base. The new added cases will be stored in the persistence media depending on the chosen implementation of CBRCaseBase. The LinealCaseBase class will store the cases directly into the persistence layer, but the CachedLinealCaseBase will keep the new cases into memory and save them only when closing the CBR application (this is, when the postCycle() method is invoked). 16.1 Saving the new trips of the Travel Recommender The following code shows the retain window in the application and calls the StoreCasesMethod method. The retain window allows the user to select which cases must be added to the case base and to define a new Id for each one. Here, this retain dialog is in charge of defining the Ids instead of using the jcolibri.method.revise.DefineNewIdsMethod method. Listing 33: TravelRecommender.cycle() code (step4) / / Retain r e t a i n D i a l o g . showCases ( s e l e c t e d c a s e s , _ c a s e B a s e . g e t C a s e s ( ) . size () ) ; r e t a i n D i a l o g . s e t V i s i b l e ( true ) ; C o l l e c t i o n <CBRCase> c a s e s T o R e t a i n = r e t a i n D i a l o g . getCasestoRetain () ; _caseBase . learnCases ( casesToRetain ) ; This code code shows the the window illustrated in Figure 11. With this retain step we have completed the code of the cycle() method of the Travel Recommender application. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 17. Shutting down a CBR application 71 17 Shutting down a CBR application The postcycle() method of the CBR applications in jCOLIBRI 2 is in charge of shutting down the CBR system or invoke maintenance tasks. This method usually calls the close() method of the connector to store (if required) the learned cases into the persistence media. In the Travel Recommender application we call the close() method of the connector and also the shutdown() method of the data base manager. Listing 34: TravelRecommender.postCycle() code p u b l i c v o i d p o s t C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { _connector . close () ; j c o l i b r i . t e s t . d a t a b a s e . HSQLDBserver . shutDown ( ) ; } This method completes the tutorial. Now the Travel Recommender code should compile and work properly. If you find some problems read the complete source code of the located into the “example” subfolder of the framework and compressed as travelrecommender-source.zip. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18. Textual CBR 72 18 Textual CBR Textual Case-Based Reasoning (TCBR) is a subfield of CBR the cases are available in textual format. This type of CBR systems is very interesting because in most of the domains where CBR can be applied the available experiences are in textual format. Some examples are: laws, medicine, help-desk, etc. There does not appear to be a standard or consensus about the structure of a textual CBR system. This is mainly due to the different knowledge requirements in application domains. For classification applications typically only a basic stemmer algorithm and a cosine similarity function is needed, while with other applications more intense NLP derived structures are employed (see [9] and [10]). Although a common functionality for TCBR systems is difficult to establish, several researchers have attempted to define the different knowledge requirements for TCBR([24],[55]). To support the development of TCBR systems, jCOLIBRI 2 includes a complete extension with useful methods for this kind of applications. As adaptation continues being a domain specific task, the framework only includes methods for the retrieval step. There are two big groups of retrieval algorithms that can be used in TCBR: • The first group is based on IE methods to capture features from the text and then perform standard Nearest Neighbor similarity based computation over those features. With these methods developers extract the information contained in the texts and represent them in a structured way that allows to apply the typical retrieval and reuse methods in common CBR applications. • The second one is composed by the broadly applied IR algorithms used by search engines and based in the Vector Space Model. These methods can achieve very good results in the retrieval step, but make difficult the adaptation because there is not a structure representation of the cases. We could refer to the first group as “semantic” retrieval because it tries to capture the semantics of the texts and the second as “statistical” retrieval because it only takes into account the frequency of terms. 18.1 Semantic retrieval The first set of textual methods included in jCOLIBRI 2 belong to the semantic retrieval group, and are based in the Lenz layered model [33]. This model divides the text processing into several stages: Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 73 1. Keyword Layer. This layer separates texts into terms, removes stop-words, stem terms and calculates statistics about frequency of terms. It also proposes a partof-speech tagger in this layer that could be useful by the following ones. This layer is domain-independent, so it can be shared between applications. 2. Phrase Layer. Recognises domain-specific phrases using a dictionary. Here, the problems are that some parts of the phrase can be separated and that the dictionary must be built manually. 3. Thesaurus Layer. This layer identifies synonyms and related terms. Methods implemented in this layer must be reusable in the query stage of the CBR cycle. WordNet can be used as an english thesaurus. This phase is domain-independent. 4. Glossary Layer. Is the domain-specific version of the thesaurus layer. So it is desirable to define a common interface for both layers. The main difficulty with this layer resides in the glossary acquisition. 5. Feature Value Layer. With semi-structured cases, this layer extracts features about the case and stores it as <attribute,value> pairs in the case representation. It is also domain-specific. 6. Domain Structure Layer. Uses the previous layer to classify documents in a high level. It assigns "topic" features to the cases that can be useful in the indexing process. 7. Information Extraction Layer. Some parts of the texts can be better represented with a structured approximation. This layer accomplish this task. (note that this functionality can overlap with the two previous layers). The last IE layer applies user defined rules to extract the features using the information obtained in previous layers. This way, developers obtain a structured representation of cases that can be managed by standard similarity matching techniques from CBR. 18.1.1 Representation of the texts jCOLIBRI 2has the generic jcolibri.datatypes.Text object to store texts into cases. These objects are managed by the methods of the jcolibri.extensions.textual package. This object is not enough to manage the required information defined by the Lenz steps. So, there is a subclass named jcolibri.extensions.textual.IE.representation.IEText (texts for Information Extraction) that allows it. An IE text receives its content as String and later a method will organize this content. This way, a text is composed by paragraphs, paragraphs by sentences and sentences by tokens as shown in Figure 25 Tokens represent a word in the text. These objects store information like: Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 74 Figure 25: Representation of texts for IE. • If the token is a stop word (word without sense). • If the token is a main name inside the sentence. • The stemed word. • The Part-Of-Speech tag of the token. • A list of relations with other similar tokens. The organization in paragraphs, sentences and tokens is performed by specific methods depending on the chosen implementation. The information extracted from the text is stored in the IEtext object. There are several types of information that will be obtained by dedicated methods: • Phrases identified in the text. • Features: identifier-value pairs extracted from the text. • Topics: combining phrases and features a topic can be associated to a text. A topic is a classification of the text. Phrases and Features are stored using the objects implemented in the jcolibri.extensions.textual.IE.representation.info subpackage. That package stores three objects that aid in the representation of the extracted information: • PhraseInfo: stores extracted phrases. • FeatureInfo: stores extracted features. • WeightedRelation: represents a weighted relation between two tokens. These relations are found by the glossary and thesaurus methods. Figure 26 illustrates the complete organization: Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 75 Figure 26: Global view of the representation of texts for IE. 18.1.2 IE methods implementation jCOLIBRI includes several implementations of the Lenz layers. Some methods have been implemented in a general way. Other methods use the Maximum Entropy algorithms implemented in the OpenNLP package. And finally, another group of methods use the GATE library for text processing. The OpenNLP6 library includes methods that use Maximum Entropy algorithms that have been implemented and applied to Natural Language tasks. OpenNLP is divided into independent layers that can be used separately, providing a stop-word remover, sentence detector, part-of-speech tagger, grammar layer, ... GATE7 is an infrastructure for developing software components that process human language. This library is broadly used for IE applications and includes many components that are used in jCOLIBRI 2 to implement the Lenz layers: POS tagger, stemmer, features and phrases extractors, etc. Each group of methods can only work with certain textual objects. The OpenNLP implementation has its own specialization of IEText named IETextOpenNLP, and the GATE implementation has the IETextGate object: Following table summarizes the jCOLIBRI 2 methods for TCBR using Information Extraction: 6 7 http://opennlp.sourceforge.net http://gate.ac.uk Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 76 Implementation OpenNLP GATE Generic Textual Object IETextOpenNLP IETextGate IETextOpenNLP, IETextGate, IEText Package IE.opennlp IE.gate IE.common Layers Organize Text OpennlpSplitter GateSplitter Keyword: StopWords StopWordsDetector Keyword: Stemmer TextStemmer Keyword: POS tagging OpennlpPOStagger GatePOStagger Keyword: Main Names OpennlpMainNamesExtractor Phrase GatePhrasesExtractor PhrasesExtractor Glossary GlossaryLinker Thesaurus ThesaurusLinker Feature Value GateFeaturesExtractor FeaturesExtractor Domain Structure DomainTopicClassifier Information Extraction BasicInformationExtractor 18.1.3 Computing similarity The methods of the IE extension extract information from texts and store it into the other attributes of the case. These attributes can be compared using normal similarity functions as shown in Figure 27 Figure 27: Common organization of textual cases Textual attributes can also be compared using specific similarity functions located in the package jcolibri.method. retrieve.NNretrieval.similarity.local.textual. Most of them are applicable to any Text attribute, but some few are only applicable to IEText objects (or its subclasses) because require information stored in the tokens. Cosine (similarity.local.textual.CosineCoefficient) |(t1 ∩ t2 )| cosine(t1 , t2 ) = p |t1 |, |t2 | Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 77 Dice (similarity.local.textual.DiceCoefficient) dice(t1 , t2 ) = 2 ∗ |(t1 ∩ t2 )| (|t1 | + |t2 |) Jaccard (similarity.local.textual.JaccardCoefficient) jaccard(t1 , t2 ) = |(t1 ∩ t2 )| (|t1 | ∪ |t2 |) Overlap (similarity.local.textual.OverlapCoefficient) overlap(t1 , t2 ) = |(t1 ∩ t2 )| min (|t1 |, |t2 |) Compression (similarity.local.textual.compressionbased. CompressionBased) CDM (x, y) = C(xy) C(x) + C(y) where C(x) is the size of string x after compression (and C(y) similarly) and C(xy) is the size, after compression, of the string that comprises y concatenated to the end of x. Developed and implemented by Derek Bridge. See following papers [30, 14] and framework documentation. Normalised Compression (similarity.local.textual. compressionbased.NormalisedCompression) N CD(x, y) = C(xy) − min (C(X), C(y)) max (C(x), C(y)) where C(x) is the size of string x after compression (and C(y) similarly) and C(xy) is the size, after compression, of the string that comprises y concatenated to the end of x. Developed and implemented by Derek Bridge. See following papers [34, 15] and framework documentation. 18.1.4 The Restaurant Recommender example To illustrate the textual methods of jCOLIBRI 2, we have developed a restaurant adviser system. The entire case base contains roughly 100 different restaurants extracted from an online magazine. This recommender is implemented using both the semantic and statistical methods. The complete implementation of the Restaurant Recommender application can be found in the Tests 13a and 13b of the framework examples. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 78 Following listings contains the most important code of the Restaurant Recommender implementation that uses the semantic textual methods of jCOLIBRI 2. In this case we are using the OpenNLP version of the methods. In the precycle, the recommender application executes the textual methods over the complete case base. It illustrates the advantages of having a precycle in the CBR system because this computation is an very takes a long time but is only executed once. Listing 35: The Restaurant Recommender precycle using semantic TCBR methods p u b l i c CBRCaseBase p r e C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { / / I n t h e p r e c y c l e we pre−c o m p u t e t h e i n f o r m a t i o n e x t r a c t i o n in the case base / / I n i t i a l i z e Wordnet T h e s a u r u s L i n k e r . loadWordNet ( ) ; / / Load u s e r − s p e c i f i c g l o s s a r y GlossaryLinker . loadGlossary ( " j c o l i b r i / t e s t txt ") ; / / Load p h r a s e s r u l e s PhrasesExtractor . loadRules ( " j c o l i b r i / t e s t / phrasesRules . txt " ) ; / / Load f e a t u r e s r u l e s FeaturesExtractor . loadRules ( " j c o l i b r i / t e s t featuresRules . txt " ) ; / / Load t o p i c r u l e s DomainTopicClassifier . loadRules ( " j c o l i b r i / domainRules . t x t " ) ; / test13 / glossary . test13 / / test13 / test / test13 / / / Obtain cases _caseBase . i n i t ( _connector ) ; C o l l e c t i o n <CBRCase> c a s e s = _ c a s e B a s e . g e t C a s e s ( ) ; / / P e r f o r m IE m e t h o d s i n t h e c a s e s / / O r g a n i z e c a s e s i n t o p a r a g r a p h s , s e n t e n c e s and t o k e n s OpennlpSplitter . s p l i t ( cases ) ; / / Detect stopwords StopWordsDetector . detectStopWords ( cases ) ; / / Stem t e x t TextStemmer . s t e m ( c a s e s ) ; / / P e r f o r m POS t a g g i n g OpennlpPOStagger . t a g ( c a s e s ) ; / / E x t r a c t main names OpennlpMainNamesExtractor . extractMainNames ( c a s e s ) ; / / Extract phrases PhrasesExtractor . extractPhrases ( cases ) ; / / Extract features FeaturesExtractor . extractFeatures ( cases ) ; Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 79 / / Class ify with a topic DomainTopicClassifier . classifyWithTopic ( cases ) ; / / P e r f o r m IE c o p y i n g e x t r a c t e d f e a t u r e s o r p h r a s e s i n t o other a t t r i b u t e s of the case BasicInformationExtractor . extractInformation ( cases ) ; return _caseBase ; } In the cycle(), the application only has to execute the TCBR methods over the query. However, there are some methods that relate query and case terms that can only be applied in this cycle instead of the precycle. The semantic TCBR methods extract the information from the text into the case (or query). This way, we can compute a typical k-NN similarity measure to obtain the most suitable restaurant for the query. Following listing shows the code: Listing 36: The Restaurant Recommender cycle using semantic TCBR methods p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { C o l l e c t i o n <CBRCase> c a s e s = _ c a s e B a s e . g e t C a s e s ( ) ; / / P e r f o r m IE m e t h o d s i n t h e c a s e s / / O r g a n i z e t h e q u e r y i n t o p a r a g r a p h s , s e n t e n c e s and t o k e n s O p e n n l p S p l i t t e r . s p l i t ( query ) ; / / Detect stopwords StopWordsDetector . detectStopWords ( query ) ; / / Stem q u e r y TextStemmer . s t e m ( q u e r y ) ; / / P e r f o r m POS t a g g i n g i n t h e q u e r y OpennlpPOStagger . t a g ( query ) ; / / E x t r a c t main names OpennlpMainNamesExtractor . extractMainNames ( query ) ; / / Now t h a t we h a v e t h e q u e r y we r e l a t e c a s e s t o k e n s w i t h the query tokens / / U s i n g t h e u s e r −d e f i n e d g l o s s a r y GlossaryLinker . LinkWithGlossary ( cases , query ) ; / / Using wordnet ThesaurusLinker . linkWithWordNet ( cases , query ) ; / / Extract phrases P h r a s e s E x t r a c t o r . e x t r a c t P h r a s e s ( query ) ; / / Extract features F e a t u r e s E x t r a c t o r . e x t r a c t F e a t u r e s ( query ) ; / / Class ify with a topic DomainTopicClassifier . c lassi fyWi thTo pic ( query ) ; Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.1 Semantic retrieval 80 / / P e r f o r m IE c o p y i n g e x t r a c t e d f e a t u r e s o r p h r a s e s i n t o other a t t r i b u t e s of the query B a s i c I n f o r m a t i o n E x t r a c t o r . e x t r a c t I n f o r m a t i o n ( query ) ; / / Now we c o n f i g u r e t h e k−NN r e t r i e v a l w i t h some u s e r − d e f i n e d s i m i l a r i t y measures NNConfig n n C o n f i g = new NNConfig ( ) ; n n C o n f i g . s e t D e s c r i p t i o n S i m F u n c t i o n ( new A v e r a g e ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " l o c a t i o n " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; / / To compare t e x t we u s e t h e O v e r l a p C o f f i c i e n t n n C o n f i g . addMapping ( new A t t r i b u t e ( " d e s c r i p t i o n " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new O v e r l a p C o e f f i c i e n t ( ) ) ; / / This function takes a s t r i n g with several numerical v a l u e s and c o m p u t e s t h e a v e r a g e n n C o n f i g . addMapping ( new A t t r i b u t e ( " p r i c e " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new AverageMultipleTextValues (1000) ) ; / / T h i s f u n c t i o n t a k e s a s t r i n g w i t h s e v e r a l words s e p a r a t e d by w h i t e s p a c e s , c o n v e r t s i t t o a s e t o f t o k e n s and / / computes the s i z e of the i n t e r s e c t i o n of the query s e t and t h e c a s e s e t n o r m a l i z e d w i t h t h e c a s e s e t n n C o n f i g . addMapping ( new A t t r i b u t e ( " f o o d T y p e " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new T o k e n s C o n t a i n e d ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " f o o d " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new T o k e n s C o n t a i n e d ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " a l c o h o l " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " t a k e o u t " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " d e l i v e r y " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " p a r k i n g " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; n n C o n f i g . addMapping ( new A t t r i b u t e ( " c a t e r i n g " , R e s t a u r a n t D e s c r i p t i o n . c l a s s ) , new E q u a l ( ) ) ; C o l l e c t i o n < R e t r i e v a l R e s u l t > r e s = NNScoringMethod . e v a l u a t e S i m i l a r i t y ( cases , query , nnConfig ) ; r e s = S e l e c t C a s e s . selectTopKRR ( r e s , 5 ) ; / / Show t h e r e s u l t ... } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.2 Statistical retrieval 81 18.2 Statistical retrieval The second set of textual methods, added to jCOLIBRI 2, belong to the statistical approach that has given such good results in the IR field. jCOLIBRI 2 methods are based on the Apache Lucene search engine [25]. Lucene uses a combination of the Vector Space Model (VSM) of IR and the Boolean model to determine how relevant a given document is to a user’s query. In general, the idea behind the VSM is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. It uses the Boolean model to first narrow down the documents that need to be scored based on the use of boolean logic in the query specification. Lucene also adds some capabilities and refinements onto this model to support boolean and fuzzy searching, but it essentially remains a VSM based system at its heart. The main advantages of this search method is the good results and the applicability to non-structured texts. The big drawback is the lack of knowledge about the semantics of the texts that complicates the adaptation stage. The Lucene retrieval method of jCOLIBRI 2 is jcolibri.method.retrieve. LuceneRetrieval. This function can be applied to any Text subclass as it does not require any kind of extracted information. This method retrieves cases computing only the similarity of a given attribute. This way, the other attributes of the case are not used. To compute a complete k-NN retrieval using all the attributes of the case, there is a similarity function that uses Lucene to compare textual attributes: jcolibri.method.retrieve.NNretrieval.similarity.local. textual.LuceneTextSimilarity. These Lucene methods requiere an index to be created in the pre-cycle by the method jcolibri.method.precycle.LuceneIndexCreator. Although statistical IR methods give good retrieval results they do not provide any kind of explanation about the returned documents. One way for solving this problem is to cluster the retrieval results into groups of documents with common information. Usually, clustering algorithms like hierarchical clustering or K-means [57] group the documents but they don’t provide a comprehensive description of the resulting clusters. Lingo [39] is a clustering algorithm implemented in the Carrot28 framework that allows the grouping of search results but also gives the user a brief textual description of each cluster. Lingo is based on the Vector Space Model, Latent Semantic Indexing and Singular Value Decomposition to ensure that there are human readable descriptions of the clusters and then assign documents to each one. jCOLIBRI 2 provides wrapper methods to hide the complexity of the algorithm and allows a simple way for managing Carrot2. 8 http://www.carrot2.org Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 18.2 Statistical retrieval 82 Read the documentation of the jcolibri.extensions.textual.carrot2 package for details. This labeled-clustering algorithm could be applied to TCBR in the retrieval step to make it easier to choose the most similar document. Moreover, in [42] we present an alternative approach that uses the labels of the clusters to guide the adaptation of the texts. 18.2.1 The Restaurant Recommender using statistical methods Test 13b shows how to implement the Restaurant Recommender application using statistical methods. Following listing shows an excerpt of the code: Listing 37: The Restaurant Recommender using statistical TCBR methods p u b l i c CBRCaseBase p r e C y c l e ( ) throws E x e c u t i o n E x c e p t i o n { _caseBase . i n i t ( _connector ) ; / / Here we c r e a t e t h e L u c e n e i n d e x l u c e n e I n d e x = j c o l i b r i . method . p r e c y c l e . L u c e n e I n d e x C r e a t o r . createLuceneIndex ( _caseBase ) ; return _caseBase ; } p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { C o l l e c t i o n <CBRCase> c a s e s = _ c a s e B a s e . g e t C a s e s ( ) ; NNConfig n n C o n f i g = new NNConfig ( ) ; n n C o n f i g . s e t D e s c r i p t i o n S i m F u n c t i o n ( new A v e r a g e ( ) ) ; / / We o n l y compare t h e " d e s c r i p t i o n " a t t r i b u t e u s i n g L u c e n e A t t r i b u t e t e x t u a l A t t r i b u t e = new A t t r i b u t e ( " d e s c r i p t i o n " , RestaurantDescription . class ) ; n n C o n f i g . addMapping ( t e x t u a l A t t r i b u t e , new L u c e n e T e x t S i m i l a r i t y ( luceneIndex , query , t e x t u a l A t t r i b u t e , true ) ) ; C o l l e c t i o n < R e t r i e v a l R e s u l t > r e s = NNScoringMethod . e v a l u a t e S i m i l a r i t y ( cases , query , nnConfig ) ; r e s = S e l e c t C a s e s . selectTopKRR ( r e s , 5 ) ; / / Show t h e r e s u l t ... } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 19. Evaluation of CBR applications 83 19 Evaluation of CBR applications jCOLIBRI 2 includes an extension for evaluating CBR applications. This functionality is found in the jcolibri.evaluation package and is composed by the following components: jcolibri.evaluation.Evaluator . This abstract class defines the common behavior of an evaluator. It is initialized with the CBRApplication to be evaluated and allows to access the EvaluationReport object that contains the result of the evaluation. This abstract class is extended by the following evaluators: jcolibri.evaluation.evaluators.LeaveOneOutEvaluator . This method uses all the cases as queries. It executes so many cycles as cases in the case base. In each cycle one case is used as query. jcolibri.evaluation.evaluators.HoldOutEvaluator . This method splits the case base in two sets: one used for testing where each case is used as query, and another that acts as normal case base. This process is performed serveral times. jcolibri.evaluation.evaluators.NFoldEvaluator . This evaluation method divides the case base into several random folds (indicated by the user). For each fold, their cases are used as queries and the remaining folds are used together as case base. This process is performed several times. jcolibri.evaluation.evaluators.SameSplitEvaluator . This method splits the case base in two sets: one used for testing where each case is used as query, and another that acts as normal case base. This method is different of the other evaluators beacuse the split is stored in a file that can be used in following evaluations. This way, the same set is used as queries for each evaluation. Developers can extend the Evaluator class to create their own evaluators. jcolibri.evaluation.EvaluationReport . This class stores the result of an evaluation. It is configured and filled by an Evaluator. This info is also used to represent graphically the result of an evaluation. The stored information can be: • Several series of data. The length of the series is the number of executed cycles. And several series can be stored using different labels. • Any other information. The put/getOtherData() methods allow to store any other kind of data. • Number of cycles. • Total evaluation time. • Time per cycle. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 19. Evaluation of CBR applications 84 jcolibri.evaluation.tools.EvaluationResultGUI .Class that visualizes the result of an evaluation in a Swing frame. It generates a chart with the evaluation result an other information returned by the evaluator. The evaluation of a CBR application is shown by the test 8 of the examples. The following code is an excerpt of the code: Listing 38: Evaluation code ... H o l d O u t E v a l u a t o r e v a l = new H o l d O u t E v a l u a t o r ( ) ; e v a l . i n i t ( new E v a l u a b l e A p p ( ) ) ; e v a l . HoldOut ( 5 , 1 ) ; System . o u t . p r i n t l n ( E v a l u a t o r . g e t E v a l u a t i o n R e p o r t ( ) ) ; j c o l i b r i . e v a l u a t i o n . t o o l s . E v a l u a t i o n R e s u l t G U I . show ( E v a l u a t o r . getEvaluationReport ( ) , " Test8 − Evaluation " , f a l s e ) ; ... As the listing shows, the HoldOutEvaluator is initialized with a common CBRApplication (EvaluableApp) through the init() method inherited from the Evaluator class. Then, the evaluator is invoked with the corresponding parameters. Finally, the EvaluationResultGUI class visualizes the result as shown in 28. Figure 28: Evaluation of a CBR application The application to be evaluated is in charge of storing the data in the EvaluationReport. Each time that the cycle() method of the evaluated application is invoked, this method must store the evaluation of that cycle into the report. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 19. Evaluation of CBR applications 85 For example, in the Test 8, we are evaluating the similarity value of the most similar case: Listing 39: Example of an evaluable application p u b l i c c l a s s E v a l u a b l e A p p implements S t a n d a r d C B R A p p l i c a t i o n { ... p u b l i c v o i d c y c l e ( CBRQuery q u e r y ) throws E x e c u t i o n E x c e p t i o n { NNConfig s i m C o n f i g = new NNConfig ( ) ; ... C o l l e c t i o n < R e t r i e v a l R e s u l t > e v a l = NNScoringMethod . e v a l u a t e S i m i l a r i t y ( _caseBase . g e t C a s e s ( ) , query , simConfig ) ; / / Now we add t h e s i m i l a r i t y o f t h e m o s t s i m i l a r c a s e to the serie " S i m i l a r i t y ". Evaluator . getEvaluationReport ( ) . addDataToSeries ( " S i m i l a r i t y " , new Double ( e v a l . i t e r a t o r ( ) . n e x t ( ) . getEval () ) ) ; } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20. Recommenders 86 20 Recommenders jCOLIBRI 2 includes an extension to implement recommendation systems. This extension has been designed having in mind the future design process of CBR systems in jCOLIBRI. We proposes a flexible way to design CBR systems in future versions of jCOLIBRI using a library of templates obtained from a previously designed set of CBR systems. In case-based fashion, jCOLIBRI will retrieve templates from a library of templates (i.e. a case base of CBR design experience); the designer will choose one, and adapt it. We represent templates graphically as shown in following Figures 29 and 30. Each rectangle in the template is a subtask. Simple tasks (shown as blue or pale grey rectangles) can be solved directly by a method included in this extension. Complex tasks (shown as red or dark grey rectangles) are solved by decomposition methods having other associated templates. There may be multiple alternative methods to solve any given task. These methods are usual java methods of the classes in the framework. Before implementing this extension, we developed templates for recommender systems and then generated the methods that solve every task. That allowed us to develop many recommender systems that are included in the jcolibri.test.recommenders package. By now, templates are only a graphical representation of CBR systems, although in a short future they will be used to generate them. The jCOLIBRI team thanks to Derek Bridge his collaboration and supervision during the development of this extension. 20.1 Templates guided design of recommendation systems This section introduces the templates guided design of recommendation systems. Following text is an extract of [40]. In [40] we have done an analysis of recommender systems that is based in part on the conceptual framework described in the review paper by Bridge et al. [8]. The framework distinguishes between collaborative and case-based, reactive and proactive, single-shot and conversational, and asking and proposing. Within this framework, the authors review a selection of papers from the case-based recommender systems literature, covering the development of these systems over the last ten years. We take the systems’ interaction behaviour as the fundamental distinction from which we construct recommenders: • Single-Shot Systems make a suggestion and finish. Figure 29 shows the template for this kind of system, where One-Off Preference Elicitation and Retrieval are complex tasks that are solved by decomposition methods having other associated templates. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.1 Templates guided design of recommendation systems 87 Figure 29: Single Shot template • After retrieving items, Conversational Systems (Figure 30) may invite or allow the user to refine his/her current preferences, typically based on the recommended items. Iterated Preference Elicitation might be done by allowing the user to select and critique a recommended item thereby producing a modified query, which requires that one or more retrieved items be displayed (Figure 30 left). Alternatively, it might be done by asking the user a further question or questions thereby refining the query, in which case the retrieved items might be displayed every time (Figure 30 left) or might be displayed only when some criterion is satisfied (e.g. when the size of the set is “small enough”) (Figure 30 right). Note that both templates share the One-Off Preference Elicitation and Retrieval tasks with single-shot systems. 20.1.1 One-Off Preference Elicitation We can identify three templates by which the user’s initial preferences may be elicited: • One possibility is Profile Identification where the user identifies him/herself, e.g. by logging in, enabling retrieval of a user profile from a profile database. This profile might be a content-based profile (e.g. keywords describing the user’s longterm interests, or descriptions of items consumed previously by this user, or descriptions of sessions this user has engaged in previously with this system); or it might be a collaborative filtering style of profile (e.g. the user’s ratings for items). The template allows for the possibility that the user is a new user, in which case there may be some registration process followed by a complex task that elicits an initial profile. • An alternative is Initial Query Elicitation. This is itself a complex task with multiple alternative decompositions. The decompositions include: Form-Filling (Figure 31 left) and Navigation-by-Asking (i.e. choosing and asking a question) (Fig- Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.1 Templates guided design of recommendation systems 88 Figure 30: Conversational templates ure 31 center). Various versions of the Entrée system [11] offered interesting further methods: Identify an Item (where the user gives enough information to identify an item that s/he likes, e.g. a restaurant in his/her home town, whose description forms the basis of a query, e.g. for a restaurant in the town being visited); and Select an Exemplar (where a small set of contrasting items is selected and displayed, the user chooses the one most similar to what s/he is seeking, and its description forms the basis of a query). • The third possibility is Profile Identification & Query Elicitation, in which the previous two tasks are combined. Figure 31: Preference Elicitation decompositions Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.1 Templates guided design of recommendation systems 89 20.1.2 Retrieval Because we are focussing on case-based recommender systems (and related memorybased recommenders including collaborative filters), Retrieval is common to all our recommender systems. Retrieval is a complex task, with many alternative decompositions. The choice of decomposition is, of course, not independent of the choice of decomposition for One-Off Preference Elicitation and Iterated Preference Elicitation. For example, if One-Off Preference Elicitation delivers a ratings profile, then the method chosen for achieving the Retrieval task must be some form of collaborative recommendation. The following is a non-exhaustive list of papers that define methods that can achieve the Retrieval task: Wilke et al. 1998 [56] (similarity-based retrieval using a query of preferred values); Smyth & McClave 2001 [53] (diversity-enhanced similarity-based retrieval); McSherry 2002 [36] (diversity-conscious retrieval); Bridge & Fergsuon 2002 [7] (order-based retrieval); McSherry 2003 [37] (compromise-driven retrieval); Bradley & Smyth 2003 [6] (where user profiles are mined and used); Herlocker et al. 1999 [26] (user-based collaborative filtering); Sarwar et al. 2001 [47] (item-based collaborative filtering). In all these ways of achieving Retrieval, a scoring process is followed by a selection process. For example, in similarity-based retrieval (k-NN), items are scored by their similarity to the user’s preferences and then the k highest-scoring items are selected for display; in diversity-enhanced similarity-based retrieval, items are scored in the same way and then a diverse set is selected from the highest-scoring items; and so on. Note also that there are alternative decompositions of the Retrieval task that would not have this two-step character. For example, filter-based retrieval, where the user’s preferences are treated as hard constraints, conventionally does not decompose into two such steps. On the other hand, there are recommender systems in which Retrieval decomposes into more than two steps. For example, in some forms of Navigation-by-Proposing (see below), first a set of items that satisfy the user’s critique is obtained by filter-based retrieval, then these are scored for similarity to the user’s selected item, and finally a subset is chosen for display to the user. 20.1.3 Iterated Preference Elicitation In Iterated Preference Elicitation the user, who may or may not have just been shown some products (Figure 30), may, either voluntarily or at the system’s behest, provide further information about his/her preferences. Alternative decompositions of this task include: • Form-Filling where the user enters values into a form that usually has the same structure as items in the database (Figure 31 left). We have seen that Form-Filling is also used for One-Off Preference Elicitation. When it is used in Iterated Preference Elicitation, it is most likely that the user edits values s/he previously entered into the form. • Navigation-by-Asking is another method that can be used for both One-Off Preference Elicitation and for Iterated Preference Elicitation. The system refines the Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 90 query with the user’s answer to a question about his/her preferences. The system uses a heuristic to choose the next best question to ask. Bergmann [3] reviews some of the methods that have been used to choose this question. • Navigation-by-Proposing (also known as tweaking and as critiquing) requires that the user has been shown a set of candidate items. S/he selects the one that comes closest to satisfying his/her requirements but then offers a critique (e.g. “like this but cheaper”). A complex query is constructed that is intended to retrieve items that are similar to the selected item but which also satisfy the critique. The selection of the candidate item and its critiques must be performed during the Display Item List task. Therefore, the Create Complex Query task will receive that information and modify the query according to the user selection. Burke reviews early work on this topic [11]. We note that there has been a body of new work since then, some of it cited in [8] and [52]. (Although we describe Navigation-byProposing only as a decomposition of Iterated Preference Elicitation, this need not be so. We could additionally, for example, define a template for One-Off Preference Elicitation that uses Select an Exemplar followed by Navigation-byProposing.) Note that the templates for Conversational Systems do not preclude the possibility that the system uses a different method for Iterated Preference Elicitation on different iterations. ExpertClerk is an example of such a recommender. It uses Navigation-by-Asking for One-Off Preference Elicitation, and then it chooses between Navigation-by-Asking and Navigation-by-Proposing for Iterated Preference Elicitation, preferring the former while the set of candidate items remains large [51]. In fact, we have confirmed that we can build a version of ExpertClerk based on the template in Figure 30 right and using methods included in the recommenders extension. This informally illustrates the promise of our templates approach: complex systems (such as ExpertClerk) can be constructed by adapting existing templates. 20.2 Methods for recommender systems This section describes the new methods included in jCOLIBRI 2.1. There are several examples that illustrate how to use these methods in the jcolibri.test.recommenders package (read the documentation of the package for details). 20.2.1 User interaction methods ObtainQueryMethod : Obtains the query showing a form. There are two different versions: showing initial values (either default or from a previous iteration) and showing empty form. Optional parameters (both defined by the designer): Hidden Attributes . These attributes are not shown to the user. This is useful when using the McSherry’s similarity functions [37] that don’t take into Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 91 account the user preferences (see following subsection for details). Attributes labels . Labels to be shown when asking for an attribute. Is useful when using filter based similarity because instead asking for the price the designer may ask for the maximum price. DisplayCasesMethod : Shows the case in a panel. Useful when working with composed cases. This method returns a jcolibri.method.gui.casesDisplay.UserChoice object that encapsulates three different user answers: Quit, Buy and Refine Query. The Refine Query result is used in conversational systems that redefine the query (usually in Navigation by Proposing). It is completely useless in one-shot systems. So this method has three different display options: • In BASIC mode the dialog only shows the buy and quit buttons. • The REFINE_QUERY option returns a UserChoice.REFINE_QUERY value. • The SELECT_CASE option returns a UserChoice.REFINE_QUERY value together with a case selected from the list. DisplayCasesInTableMethod : Similar to the previous one. This method shows the cases in a table. This way is not very suitable for composed cases. UserChoice : Object that encapsulates the user answer when cases are shown. This object keeps an internal integer with posible values: QUIT, BUY and REFINE_QUERY. It also contains the chosen case from the list. If the answer is BUY, the selected case is the final result. If the answer is REFINE QUERY, the selected case can be used in Navigation by Proposing to elicit the query. The subclass CriticalUserChoice is an extension that also contains the critiques to the chosen case. 20.2.2 New Nearest Neighbor similarity measures In following formulas c.a represents an attribute of the case and q.a an attribute of the query. Inreca Less is Better : sim(c.a, q.a) = if (c.a < q.a) then 1 else jump ∗(max(a)−c.a)/(max(a)−q.a) Inreca More is Better : sim(c.a, q.a) = if (c.a > q.a) then 1 else jump ∗ (1 − (q.a − c.a)/q.a) McSherry Less is Better : (note that the user query value is ignored) sim(c.a, q.a) = (max(a) − c.a)/(max(a) − min(a)) Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 92 McSherry More is Better : (note that the user query value is ignored) sim(c.a, q.a) = 1 − (max(a) − c.a)/(max(a) − min(a)) Table : reads a table data from a csv file. Axes values must be strings or enumerations. 20.2.3 Conditional methods Continue : Receives an UserChoice and returns true if the value is Edit Query. BuyOrQuit : Receives an UserChoice object and returns true or false depending on its value (Quit or Buy). DisplayCasesIfNumber : Returns true if the number of cases received is inside a range. Optionally this method can show a message. Useful in conversational B systems when it is used with FilterBased retrieval (with k-NN it has no sense). DisplayCasesIfSimil : Returns true if the retrieved cases have a minimum similarity. Useful only with k-NN. 20.2.4 Navigation by Asking methods ObtainQueryWithAttributeQuestionMethod Asks the user for the value of an attribute. This method is used in Navigation by asking. It only shows the available options, removing the values that don’t appear in the working cases set. Customs labels are allowed. Information Gain Returns the attribute with more information gain in a set of cases [3][50]. Used in Navigation by Asking with ObtainQueryWithAttributeQuestionMethod. m X |C j | |C j | · log2 Gain(A) = − |C| |C| j=1 where C is partitioned according to the attribute A into m subsets C = C 1 ∪ · · · ∪ C m such that the attribute value of all cases in C j is vj . Similarity Influence Measure Selects the attribute that has the highest influence on the k-NN similarity. The influence on the similarity can be measured by the expected variance of the similarities of a set of selected cases. See [3][31][49] for details. X SimV ar(q, A, C) = pv · V ar(qA←v , c) v where: • v are the possible values of the attribute A. • qA←v is the query once assigned v to the attribute A. • pv = |C v |/|C|. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 1 • V ar(q, C) = |C| · for the cases in C. P c∈C (sim(q, c) 93 − µ)2 . where µ is the average similarity When using Attribute Selection and Filter based retrieval, the numerical attributes must be transformed into ranges, and the Filter based retrieval only uses EqualTo() predicates. 20.2.5 Navigation by Proposing DisplayCasesTableWithCritiquesMethod : Displays the cases in a table allowing the user to buy one item, finsh, or critique the selected item. Critiques are configured by the designer providing a list of emphCritiqueOptions. This method returns a CriticalUserChoice that is an extension of UserChoice storing also the critiques of the selected item. This method enables or disables the critiques depending on the values of the working cases. For example, it has no sense to show a “cheaper” button if there are not cheaper cases. Usually, displayed cases are the same than working cases, but when using diversity algorithms only three of the working cases are displayed. CritiqueOption : Object that encapsulates the possible critiques of a case. It stores the label of the button, the criticized attribute, and a Filter-Based Retrieval predicate to perform the critique. CriticalUserChoice : Extension of UserChoice that also stores the critiques over the selected case. In the Iterated preference elicitation of Navigation by Proposing there are several methods to modify the query (see [35]: More Like This replaces current query with the description of the selected case. Partial More Like This partially replaces current query with the description of the selected case. It only transfers a feature value from the selected case if none of the rejected cases have the same feature value. Weighted More Like This transfers all attributes from the selected case to the query but weights them given preference to diverse attributes among the proposed cases. The new weights are stored into a NNConfig object, so this strategy should be used with NN retrieval. Less Like This is a simple one: if all the rejected cases have the same featurevalue combination, which is different from the preferred case then this combination can be added as a negative condition. This negative condition is coded as a NotEqualTo(value) predicate in a FilterConfig object. The query is not modified. That way, this method should be used together with FilterBasedRetrieval. More + Less Like This combines both More Like This and Less Like This. It copies the values of the selected case into the query and returns a FilterConfig Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 94 object with the negative conditions. This method should be used together with Filtered and NN retrieval. 20.2.6 Profile management methods CreateProfile : Obtains a user profile (query) using the FormFilling method and stores it in a xml file. This method is not part of the CBR cycle. It is executed in the PreCycle or in a separate application. ObtainQueryFromProfile : Obtains the query form the xml file generated by CreateProfile. 20.2.7 Collaborative Recommendations These kind of recommendations are based on past experiences of other users. They need a specific case base implementation that manages cases composed by a description, a solution and a result. The description usually contains the information about the user that made the recommendation, the solution contains the information about the recommended item, and the result stores the rating value for the item (the value that the user assigns to an item after evaluating it). This way, this case base implementation stores cases as a table: Item1 Item2 rating12 User1 User2 rating21 User3 ... UserN ratingN2 Item3 Item4 rating14 Item5 ... ItemM rating23 rating33 rating34 rating2N rating3N ratinN5 ratingNN The values of the first column and row contain the ids of the description and solution components of the case. These ids must be integer values. The ratings are obtained from an attribute of the result component. Note that these cases base allows to have different cases with the same description (because each user can make several recommendations). The behavior of collaborative recommenders can be split into three steps [26]: 1. Weight all users with respect to similarity with the active user. 2. Select a subset of users as a set of predictors. 3. Normalize ratings and compute a prediction from a weighted combination of selected neighbors’ ratings. The case base implementation of the collaborative package performs the first step. The other two final steps are performed by the collaborative retrieval method. See [29] for further details. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 95 MatrixCaseBase : Specific implementation of CBRCaseBase to allow collaborative recommendations. As there are several ways to compute the first step of the behavior of collaborative recommenders. This class provides most of the code, but it must be specialized as in PearsonMatrixCaseBase. The subclasses must implement the abstract methods defined here. These methods return the similarity among neighbors. This similarity value is stored into SimilarityTuple objects. Ratings must be sorted by neighbors to allow an efficient comparison. Looking to the previous table this means that cases are organized by rows. This process is performed internally in this class. There are also two internal classes (CommonRatingTuple and CommonRatingsIterator) that allow to efficiently obtain the common ratings of two users. This will be used by subclasses when computing the neighbors similarity. The code of these classes is an adaptation of the one developed by Jerome Kelleher and Derek Bridge for the Collaborative Movie Recommender project at University College Cork (Ireland). PearsonMatrixCaseBase : Extension of the MatrixCaseBase that computes similarities among neighbors using the Pearson Correlation: Pm (ra,i − r̄a )(ru,i − r̄u ) s × sim(a, u) = i=1 σa σb f where: a and u are the compared neighbors, m the number of common items. r̄ denotes a mean value and σ denotes a standard derivation, and these are computed on co-rated items only. The Pearson correlation is weighted by a factor fs where s is the number of co-rated items and f is defined by the designer. This decreases the similarity between users who have fewer than f co-rated items. CollaborativeRetrievalMethod : This method returns cases depending on the recommendations of other users. It uses a PearsonMatrix case base to compute the similarity among neighbors. Then, cases are scored according to a rating that is estimated using the following formula: Pn (ru,i − r̄u )(sim(a, u)) pa,i = r̄a + u=1 Pn u=1 sim(a, u) where n is the number of users. 20.2.8 Retrieval methods Filter Based Retrieval : Retrieves cases which attributes comply some conditions. It computes the boolean AND operator over the condition of each attribute. The evaluation of each attribute is configured with predicates: • Equal. • NotEqual. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 96 • QueryLessOrEqual. • QueryLess. • QueryMoreOrEqual. • QueryMore. • OntologyCompatible: To use with instances. Returns true if the case instance is under the query instance. Expert Clerk Retrieval : This is the method of the ExpertClerk system [51]. This algorithm chooses the first case that is closed to the median of all cases. Then the remaining are selected taking into account negative and positive characteristics. A characteristic is an attribute that exceeds a predefined threshold with respect to the median case. It is positive if is greater than the value of the median. And negative otherwise. The number of positive plus the negative characteristics is used to rank the cases and obtain the retrieved cases. The first sample product (1st-SP) is the case closest to the median of the cases. Let C = c1 , c2 , ..., ck be the set of cases and let ci = vi1 , vi2 , ..., vin be the set of attribute values of a case ci . Then, the median cmed of C is calculated by: ! k k k 1X 1X 1X vj1 , vj2 , ..., vjn cmed = k j=1 k j=1 k j=1 The distance (D) between cmed and a case ci is given by: Pn j=1 Wj × d(cmed.j , ci.j)) Pn D(cmed , ci ) = j=1 Wj where cmed.j is the j-th attribute of cmed , vi.j) is the j-th attribute value of ci , and Wj is the j-th attribute weight. In case an attribute is an enumerative type, the most dominant value among attribute values is chosen as a median. 1st-SP is a record whose distance D(cmed , ci ) from cmed is the shortest among cj (1 ≤ j ≤ k). Then positive and negative characteristics of retrieved records are generated, and following cases are selected. For each retrieved record, attribute distance ADj between each attribute value and that of cmed is calculated by: ADj (ci .j, cmed .j) = Wj × (ci .j − cmed .j) If the absolute value of ADj (ci .j, cmed .j) exceeds a predefined threshold, ci .j is regarded as a characteristic of the record ci , and is called a positive characteristic if the value of ci .j is more highly ranked than that of cmed .j, otherwise it is called a negative characteristic. The total number of characteristics is the sum of positive characteristics and negative characteristic. The k-1 records having the maximum number of characteristics are also retrieved together with 1st-SP. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 97 20.2.9 Cases selection Select All or top-K cases These are the most basic methods that select all cases or only the k cases with highest similarity. They are implemented in jcolibri.method.retrieve.selection.SelectCases. Compromise Driven Selection Cases selection approach that chooses cases according to their compromises with the user’s query. The compromises are those attributes that do not satisfy the user’s requirement. That way cases are selected according to the number of compromises and their similarity. See [37] for details. Q: query, Candidates: cases CDR(Q, Candidates){ RS = {} while (|Candidates|>0) { C1 = first(Candidates); RS += {C}; like(C1,Q) += {C1}; covered(C1) += {C1}; for (C2 : Candidates){ if ( compromises(C1,Q) subset_or_equal compromises(C2,Q) ) covered(C1) += {C2}; if (compromises(C1,Q) equal compromises(C2,Q)) like(C1) += {C2}; } Candidates -= covered(C1); } return RS; } The += and -= operations are set addition and subtractions. The compromises set of a case is defined by: compromises(C, Q) = {a ∈ A : C.a f ails to satisf y the pref erence of the user} where A is the set of attributes of the case and C.a is the value of the attribute a for a case C. BoundedRandomSelection Is the fist algorithm proposed in [53]. It is the simplest diversity strategy: select the k cases at random from a larger set of the b·k most similar cases to the query. t: query, C: case-base, k: #results, b: bound BoundedRandomSelection(t,C,k,b){ C’ = bk cases in C that are most similar to t R = k random cases from C’ return R Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 20.2 Methods for recommender systems 98 } GreedySelection This method (see [53]) incrementally builds a retrieval set, R. During each step the remaining cases are ordered according to their quality with the highest quality case added to R. The quality metric combines diversity and similarity. The quality of a case c is proportional to the similarity between c and the query, and to the diversity of c relative to those cases so far selected in R = r1 , ..., rm . Quality(t, c, R) = Similarity(t, c) ∗ RelDiversity(c, R) RelDiversity(c, R) = 0 if R = {} Pm (1 − Similarity(c, ri )) , otherwise = i=1 m The pseudocode of the algorithm is: t: query, C: case-base, k: #results, b: bound GreedySelection(t,C,k){ R = {} for(i=1 to k){ Sort C by Quality(t,c,R) for each c in C R = R + First(C) C = C - First(C) } return R } This algorithm is very expensive. It should be applied to small case bases. BoundedGreedySelection Tries to reduce the complexity of the greedy selection algorithm first selecting the best bk cases according to their similarity to the query and then applies the greedy selection method to these cases (See [53]). t: query, C: case-base, k: #results, b: bound BoundedGreedySelection(t,C,k,b){ C’ = bk cases in C that are most similar to t R = {} for(i=1 to k){ Sort C’ by Quality(t,c,R) for each c in C’ R = R + First(C’) C’ = C’ - First(C’) } return R } Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 21. Other Features 99 21 Other Features jCOLIBRI 2 includes several other features implemented by external contributors. This section briefly describes these features. 21.1 Visualization of a Case Base Located in jcolibri.extensions.visualization.CasesVisualization, this class is a wrapper to the InfoVisual library develped by Josep Lluis Arcos (IIIACSIC). This visualization tool computes the distance among all cases and visualizes them according to that distance. Cases must be identified by a “class” tag used to identify the cases (assign the same color). This tag is obtained from the Id attribute of the solution of the cases. The visualization behavior is explained in the Test 9 of the examples. Following listing shows an excerpt of the code and Figure 32 is an snapshot of the visualization. Listing 40: Visualization code ... / / C o n f i g u r e c o n n e c t o r and c a s e b a s e C o n n e c t o r _ c o n n e c t o r = new P l a i n T e x t C o n n e c t o r ( ) ; _connector . initFromXMLfile ( j c o l i b r i . u t i l . FileIO . f i n d F i l e ( " j c o l i b r i / t e s t / t e s t 9 / p l a i n t e x t c o n f i g . xml " ) ) ; CBRCaseBase _ c a s e B a s e = new L i n e a l C a s e B a s e ( ) ; / / Load c a s e s _caseBase . i n i t ( _connector ) ; / / C o n f i g u r e NN NNConfig s i m C o n f i g = new NNConfig ( ) ; ... / / V i s u a l i z e the case base j c o l i b r i . extensions . visualization . CasesVisualization . v i s u a l i z e ( _caseBase . getCases ( ) , simConfig ) ; ... 21.2 Classification and Maintenance jCOLIBRI 2 includes several classification and maintenance methods developed by Lisa Cummins and Derek Bridge (University College Cork, Ireland). The classification methods are located in Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 21.2 Classification and Maintenance 100 Figure 32: Visualization of a Case Base the packages jcolibri.method.reuse.classification jcolibri.method.revise.classification. and Regarding the maintenance, the jcolibri.method.maintenance package includes methods that decide which cases should be removed from a case base to improve the accuracy of the CBR application. Besides these methods, there are other classes that evaluate the maintenance process. They are located in the jcolibri.extensions.maintenance_evaluation package. Read the documentation of those packages for details. Moreover, Tests 7, 14 and 15 show how to use these features. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 22. Getting support 101 22 Getting support The first place you should consult for support is the documentation supplied with jCOLIBRI 2 in the doc directory. There you will find the complete API documentation with descriptions of all classes and files in the framework. This folder also contains class and sequence UML diagrams of the most important components of jCOLIBRI 2. The other mayor source of information are the tests included in the framework. These tests serve as “programming recipes” and show how to use the components of jCOLIBRI 2 to implement CBR applications with different characteristics. If you cannot solve your problem/question with the provided documentation, the preferred way of getting support is the developers mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name= jcolibri-cbr-developers You can post by sending your question to: jcolibri-cbr-developers@lists.sourceforge.net Finally, you can subscribe to the list and receive messages from other developers (do not worry, it is a low volume list): https://lists.sourceforge.net/lists/listinfo/ jcolibri-cbr-developers Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 23. Contributing to jCOLIBRI 102 23 Contributing to jCOLIBRI A contribution is a set of methods or classes that extend the functionality of the framework and are not bounded into the main release of jCOLIBRI. In earlier versions of the framework, contributions were called extensions and were distributed into the main release. However, the increasing interest of the community has driven us to create this new way for including third party code, keeping its own authorship and license. In this section you will find some simple steps to submit new contributions. 23.1 Required elements To submit a contribution you must define and include the following elements: • Name of your contribution. • Complete source code of your contribution. Source code should be documented according to the JavaDoc standard. Please, add the author’s name and affiliation in the beginning of each file. • The license file. We recommend to adopt LGPL as it is the jCOLIBRI’s license. In general terms, this license allows the commercial use of the software, although maintains the authorship of the code. Of course, you can define your custom license. • A text or html file with the credits of the contribution: authors, affiliation, etc. • An image file of your affiliation logo. • Contact or technical support e-mail. • Some example applications. We will compile and check the provided source code and generate the binary files. Documentation will be also generated into a separate folder named “doc”. This content will be packaged into a .jar file and published together with the license and credits files. Users will use the contribution by importing this .jar package into their projects. The credits and logo files will be used to add a record in the jCOLIBRI contributions web page at http://gaia.fdi.ucm.es/projects/jcolibri/jcolibri2/contributions.html. This record will include a link to download the contribution from the jCOLIBRI project at SourceForge.net. 23.2 Example applications jCOLIBRI includes an application named “jCOLIBRI2 Tester” (“Examples” in the Windows Menu) that shows how to use certain components or methods of the framework. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 23.3 How to submit a contribution 103 If you develop a new contribution, you should create some simple examples that will be incorporated into this application. An example is defined with a text file containing: • Name. • Short description. Allows html tags to decorate the text. • Class to execute. • Path to the most important documentation files (several lines). This information must be included into a text file where each example is separated by a line containing the <example> tag. Here there is an example (extracted from the examples.config file in jcolibri/test/main): Listing 41: Config file for the Examples application T e s t 1 − D a t a b a s e −kNN T e s t 1 shows how t o u s e a s i m p l e d a t a b a s e ( H i b e r n a t e ) ... j c o l i b r i . t e s t . t e s t 1 . Test1 doc / a p i / s r c −h t m l / j c o l i b r i / t e s t / t e s t 1 / T e s t 1 . h t m l doc / a p i / j c o l i b r i / t e s t / t e s t 1 / T e s t 1 . h t m l doc / a p i / j c o l i b r i / t e s t / t e s t 1 / T r a v e l D e s c r i p t i o n . h t m l doc / a p i / j c o l i b r i / c o n n e c t o r / D a t a B a s e C o n n e c t o r . h t m l doc / a p i / j c o l i b r i / method / r e t r i e v e / N N r e t r i e v a l / NNScoring . . . < example > Test 2 − Enumerated t y p e s T e s t 2 e x t e n d s T e s t 1 t o show t h e u s e o f e n u m e r a t e d . . . j c o l i b r i . t e s t . t e s t 2 . Test2 doc / a p i / s r c −h t m l / j c o l i b r i / t e s t / t e s t 2 / T e s t 2 . h t m l doc / a p i / j c o l i b r i / t e s t / t e s t 2 / T e s t 2 . h t m l doc / a p i / j c o l i b r i / t e s t / t e s t 2 / T r a v e l D e s c r i p t i o n . h t m l doc / a p i / j c o l i b r i / t e s t / t e s t 2 / M y S t r i n g T y p e . h t m l doc / a p i / j c o l i b r i / c o n n e c t o r / D a t a B a s e C o n n e c t o r . h t m l doc / a p i / j c o l i b r i / method / r e t r i e v e / N N r e t r i e v a l / NNScoring . . . < example > T e s t 3 − Compound a t t r i b u t e s ... 23.3 How to submit a contribution Please, pack everything into a zip file and send it to jcolibri [at] fdi.ucm.es. You can use that address to solve any question about the submission. Thank you for your interest and contribution to jCOLIBRI. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 24. Versions ChangeLog 104 24 Versions ChangeLog This section summarizes the main changes among versions. 24.1 Version 2.1 There is an important change in the Nearest Neighbour retrieval implementation. In version 2.0 there is a KNNRetreivalMethod class that receives a KNNConfig object. This configuration object contains the number of cases to be selected (the k value). As jCOLIBRI 2.1 includes several new selection algorithms, the k parameter has been removed from the KNNConfig object. This way, the Nearest Neighbor method always returns all cases (into RetrievalResult objects) and then, a selection algorithm must be executed. This change implies some important modifications of the names: version 2.0 version 2.1 jcolibri.retrieval.KNNretrieval jcolibri.retrieval.NNretrieval KNNConfig NNConfig KNNRetrievalMethod NNScoringMethod Some other classes have been reallocated into other packages. 24.1.1 What’s new in Version 2.1 Version 2.1 includes methods for developing recommendation systems. Some of the implemented methods can be used in general CBR applications and other are specific for recommender systems. General CBR methods: • Filtering Retrieval method. • XML utils to serialize cases and queries. • Methods to obtain the query graphically (using forms). • Methods to display cases. • Cases retrieval using diversity. • Cases selection using diversity. Specific methods for recommendation systems: • Methods to implement the Expert Clerk recommender. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial 24.2 Version 2.0 105 • Methods to implement collaborative recommenders. • Methods to obtain the query from user profiles. • Methods to implement Navigation by Asking recommenders. • Methods to implement Navigation by Proposing recommenders. • Local similarity measures for recommender systems. jCOLIBRI 2.1 includes 14 new examples that implement different recommenders. 24.2 Version 2.0 Main release of the framework and starting point of this changelog. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial References 106 References [1] A. Aamodt. Knowledge intensive case-based reasoning and sustained learning. In Proceedings of the ninth European Conference on Artificial Intelligence – (ECAI90), pages 1–6, August 1990. [2] A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(i), 1994. [3] R. Bergmann. Experience Management: Foundations, Development Methodology, and Internet-Based Applications. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2002. [4] R. Bergmann, S. Breen, E. Fayol, M. Göker, M. Manago, S. Schmitt, J. Schumacher, A. Stahl, S. Wess, and W. Wilke. Collecting experience on the systematic development of CBR applications using the INRECA methodology. Lecture Notes in Computer Science, 1488:460–470, 1998. [5] R. Bergmann, S. Breen, M. Goker, M. Manago, J. Schumacher, A. Stahl, E. Tartarin, S. Wess, and W. Wilke. The inreca-ii methodology for building and maintaining cbr applications, 1998. [6] K. Bradley and B. Smyth. Personalized information ordering: a case study in online recruitment. Knowledge-Based Systems, 16(5-6):269–275, 2003. [7] D. Bridge and A. Ferguson. An expressive query language for product recommender systems. Artif. Intell. Rev., 18(3-4):269–307, 2002. [8] D. Bridge, M. H. Göker, L. McGinty, and B. Smyth. Case-based recommender systems. Knowledge Engineering Review, 20(3):315–320, 2006. [9] M. Brown, C. Förtsch, and D. Wissmann. Feature extraction - the bridge from case-based reasoning to information retrieval. In Proceedings of 6th German Workshop on Case-Based Reasoning ’98 (GWCBR’98), 1998. [10] S. Brüninghaus and K. D. Ashley. The role of information extraction for textual CBR. In Proceedings of the 4th International Conference on Case-Based Reasoning, ICCBR ’01, pages 74–89. Springer-Verlag, 2001. [11] R. Burke. Interactive critiquing forcatalog navigation in e-commerce. Knowledge Engineering Review, 18(3-4):245–267, 2002. [12] E. by David Leake. Case Based Reasoning. Experiences, Lessons and Future Directions. AAAI Press. MIT Press, USA, 1997. [13] R. L. de Mantaras and E. Plaza. Case-based reasoning: An overview. AI Communications, 10(1), 1997. [14] S. J. Delany and D. Bridge. Feature-based and feature-free textual CBR: A comparison in spam filtering. In Procs. of the 17th Irish Conference on Artificial Intelligence and Cognitive Science, pages 244–253, Belfast, Northern Ireland, 2006. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial References 107 [15] S. J. Delany and D. Bridge. Catching the drift: Using feature-free case-based reasoning for spam filtering. In Procs. of the 7th International Conference on Case Based Reasoning, Belfast, Northern Ireland, 2007. [16] B. Díaz-Agudo and P. A. González-Calero. An architecture for knowledge intensive CBR systems. In E. Blanzieri and L. Portinale, editors, Advances in CaseBased Reasoning – (EWCBR’00). Springer-Verlag, Berlin Heidelberg New York, 2000. [17] B. Díaz-Agudo and P. A. González-Calero. Knowledge intensive CBR through ontologies. In Procs of the UK CBR Workshop. 2001. [18] B. Díaz-Agudo and P. A. González-Calero. CBROnto: a task/method ontology for CBR. In S. Haller and G. Simmons, editors, Procs. of the 15th International FLAIRS’02 Conference. AAAI Press, 2002. [19] B. Díaz-Agudo and P. A. González-Calero. Ontologies in the Context of Information Systems, chapter An ontological approach to develop Knowledge Intensive CBR systems, page 45. Springer-Verlag, 2006. [20] P. Funk and P. A. González-Calero, editors. Advances in Case-Based Reasoning, 7th European Conference, ECCBR 2004, Madrid, Spain, August 30 - September 2, 2004, Proceedings, volume 3155 of Lecture Notes in Computer Science. Springer, 2004. [21] H. Gomez-Gauchia, B. Díaz-Agudo, P. P. Gomez-Martin, and P. A. GonzálezCalero. Supporting conversation variability in cobber using causal loops. In CaseBased Reasoning Research and Development- Proc. of the ICCBR’05. Springer Verlag LNCS/LNAI, 2005. [22] P. A. González-Calero, M. Gómez-Albarrán, and B. Díaz-Agudo. Applying dls for retrieval in case-based reasoning. In Procs. of the 1999 Description Logics Workshop (Dl ’99). Linkopings universitet, Sweden, 1999. [23] P. A. González-Calero, M. Gómez-Albarrán, and B. Díaz-Agudo. A substitutionbased adaptation model. In Challenges for Case-Based Reasoning - Proc. of the ICCBR’99 Workshops. University of Kaiserslautern, 1999. [24] K. M. Gupta and D. W. Aha. Towards acquiring case indexing taxonomies from text. In Proceedings of the 17th Int. FLAIRS Conference, pages 307–315, Miami Beach, FL, 2004. AAAI Press. [25] E. Hatcher and O. Gospodnetic. Lucene in Action (In Action series). Manning Publications Co., Greenwich, CT, USA, 2004. [26] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In SIGIR ’99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 230–237, New York, NY, USA, 1999. ACM. [27] M. Jaczynski and B. Trousse. An object-oriented framework for the design and Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial References 108 the implementation of case-based reasoners. In Proceedings of the 6th German Workshop on Case-Based Reasoning, 1998. [28] R. Johnson and B. Foote. Designing reusable classes. Journal of Object-Oriented Programming Volume 1, Number 2, 1988. [29] J. Kelleher and D. Bridge. An accurate and scalable collaborative recommender. Artificial Intelligence Review, 21(3-4):193–213, 2004. [30] E. Keogh, S. Lonardi, and C. Ratanamahatana. Towards parameter-free data mining. In Procs. of the 10th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining, pages 206–215, New York, NY, USA, 2004. [31] A. Kohlmaier, S. Schmitt, and R. Bergmann. A similarity-based approach to attribute selection in user-adaptive sales dialogs. In D. W. Aha and I. Watson, editors, Proceedings of the 4th International Conference on Case-Based Reasoning, pages 306–320, Seattle, Washington, 2001. Springer-Verlag. [32] J. Kolodner. Case-Based Reasoning. Morgan Kaufmann Publ., San Mateo, 1993. [33] M. Lenz. Defining knowledge layers for textual case-based reasoning. In Proceedings of the 4th European Workshop on Advances in Case-Based Reasoning, EWCBR-98, pages 298–309. Springer-Verlag, 1998. [34] M. Li, X. Chen, X. Li, B. Ma, and P. Vitanyi. The similarity metric. In Procs. of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 863–872, Baltimore, Maryland, 2003. [35] L. McGinty and B. Smyth. Comparison-based recommendation. In ECCBR ’02: Proceedings of the 6th European Conference on Advances in Case-Based Reasoning, pages 575–589, London, UK, 2002. Springer-Verlag. [36] D. McSherry. Diversity-conscious retrieval. In ECCBR ’02: Proceedings of the 6th European Conference on Advances in Case-Based Reasoning, pages 219–233, London, UK, 2002. Springer-Verlag. [37] D. McSherry. Similarity and compromise. In Case-Based Reasoning Research and Development, 5th International Conference on Case-Based Reasoning, ICCBR, pages 291–305, 2003. [38] A. Napoli, J. Lieber, and R. Courien. Classification-based problem solving in cbr. In I. Smith and B. Faltings, editors, Advances in Case-Based Reasoning – (EWCBR’96). Springer-Verlag, Berlin Heidelberg New York, 1996. [39] S. Osinski, J. Stefanowski, and D. Weiss. Lingo: Search results clustering algorithm based on singular value decomposition. In Intelligent Information Systems, pages 359–368, 2004. [40] J. A. Recio-García, B. Díaz-Agudo, D. Bridge, and P. A. González-Calero. Semantic templates for designing recommender systems. In M. Petridis, editor, Proceedings of the 12th UK Workshop on Case-Based Reasoning, pages 64–75. CMS Press, University of Greenwich, December 2007. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial References 109 [41] J. A. Recio-García, B. Díaz-Agudo, M. A. Gómez-Martín, and N. Wiratunga. Extending jCOLIBRI for textual CBR. In H. Muñoz-Avila and F. Ricci, editors, Proceedings of Case-Based Reasoning Research and Development, 6th International Conference on Case-Based Reasoning, ICCBR 2005, volume 3620 of Lecture Notes in Artificial Intelligence, subseries of LNCS, pages 421–435, Chicago, IL, US, August 2005. Springer. [42] J. A. Recio-García, B. Díaz-Agudo, and P. A. González-Calero. Textual CBR in jCOLIBRI: From Retrieval to Reuse. In D. C. Wilson and D. Khemani, editors, Workshop Proceedings of the 7th International Conference on Case-Based Reasoning (ICCBR’07), pages 217–226, Belfast, Northen Ireland, August 13-16 2007. [43] J. A. Recio-García, B. Díaz-Agudo, P. A. González-Calero, and A. Sánchez-RuizGranados. Ontology based cbr with jcolibri. In R. Ellis, T. Allen, and A. Tuson, editors, Applications and Innovations in Intelligent Systems XIV. Proceedings of AI-2006, the Twenty-sixth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, pages 149–162, Cambridge, United Kingdom, December 2006. Springer. [44] J. A. Recio-García, B. Díaz-Agudo, A. Sánchez, and P. A. González-Calero. Lessons learnt in the development of a cbr framework. In M. Petridis, editor, Proccedings of the 11th UK Workshop on Case Based Reasoning, pages 60–71. CMS Press,University of Greenwich, 2006. [45] J. A. Recio-García, A. Sánchez, B. Díaz-Agudo, and P. A. González-Calero. jcolibri 1.0 in a nutshell. a software tool for designing cbr systems. In M. Petridis, editor, Proccedings of the 10th UK Workshop on Case Based Reasoning, pages 20–28. CMS Press,University of Greenwich, 2005. [46] S. Salotti and V. Ventos. Study and formalization of a case-based reasoning system using a description logic. In B. Smyth and P. Cunningham, editors, Advances in Case-Based Reasoning – (EWCBR’98). Springer-Verlag, 1998. [47] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In WWW ’01: Proceedings of the 10th international conference on World Wide Web, pages 285–295, New York, NY, USA, 2001. ACM. [48] R. C. Schank. Dynamic Memory. Cambridge Univ. Press, 1983. [49] S. Schmitt, P. Dopichaj, and P. Domínguez-Marín. Entropy-based vs. similarityinfluenced: Attribute selection methods for dialogs tested on different electronic commerce domains. In S. Craw and A. Preece, editors, Proceedings of the 6th European Conference on Case-Based Reasoning, pages 380–394, Aberdeen, Scotland, 2002. Springer-Verlag. [50] S. Schulz. CBR-works: A state-of-the-art shell for case-based application building. In E. Melis, editor, Proceedings of the 7th German Workshop on CaseBased Reasoning, GWCBR’99, Würzburg, Germany, pages 166–175. University of Würzburg, 1999. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial References 110 [51] H. Shimazu. ExpertClerk: A Conversational Case-Based Reasoning Tool for Developing Salesclerk Agents in E-Commerce Webshops. Artif. Intell. Rev., 18(34):223–244, 2002. [52] B. Smyth. Case-based recommendation. In P. Brusilovsky, A. Kobsa, and W. Nejdl, editors, The Adaptive Web, pages 342–376. Springer, 2007. [53] B. Smyth and P. McClave. Similarity vs. diversity. In ICCBR ’01: Proceedings of the 4th International Conference on Case-Based Reasoning, pages 347–361, London, UK, 2001. Springer-Verlag. [54] I. Watson. Applying case-based reasoning: techniques for enterprise systems. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998. [55] R. Weber, D. W. Aha, N. Sandhu, and H. Munoz-Avila. A textual case-based reasoning framework for knowledge management applications. In Proceedings of the 9th German Workshop on Case-Based Reasoning. Shaker Verlag., 2001. [56] W. Wilke, M. Lenz, and S. Wess. Intelligent sales support with cbr. In Case-Based Reasoning Technology, From Foundations to Applications, pages 91–114, London, UK, 1998. Springer-Verlag. [57] I. H. Witten and E. Frank. Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, USA, 2000. Group for Artificial Intelligence Applications jCOLIBRI2 Tutorial