Chapter 4. OSEM - XML

4.1. Introduction

Compass::Core provides the ability to map Java Objects to the underlying Search Engine through simple XML mapping files. We call this technology OSEM (Object Search Engine Mapping). OSEM provides a rich syntax for describing Object attributes and relationships. Compass uses the OSEM files at run-time to extract the required properties from the Object model and insert the required meta-data into the Search Engine index.

4.2. Searchable Classes

Searchable classes are normally classes representing the state of the application, implementing the entities within the business model. Compass works best if the classes follow the simple Plain Old Java Object (POJO) programming model. The following class is an example of a searchable class:

import java.util.Date;
import java.util.Set;

public class Author {
   private Long id; // identifier
   private String name;
   private Date birthday;
   private Set books;

   private void setId(Long id) {
      this.id = id;
   }

   public Long getId() {
      return this.id;
   }

   public void setName(String name) {
      this.name = name;
   }

   public String getName() {
      return this.name;
   }

   public void setBirthday(Date birthday) {
      this.birthday = birthday;
   }

   public Date getBirthday() {
      return this.birthday;
   }

   public void setBooks(Set books) {
      this.books = books;
   }

   public Set getBooks() {
      return this.books;
   }

   // addBook not needed by Compass::Core
   public void addBook(Book book) {
      this.books.add(book);
   }
} 

Compass works non-intrusively with application Objects, but these Objects must follow several rules:

4.2.1. Implement a Default Constructor

Author has an implicit default (no-argument) constructor. All persistent classes must have a default constructor (which may be non-public) so Compass::Core can instantiate using Constructor.newInstance().
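The rule above can be illustrated with plain JDK reflection. This is a minimal sketch, not Compass code: the class names DefaultConstructorSketch and the nested Author are hypothetical, and the only assumption is that instantiation goes through the no-argument constructor as described.

```java
import java.lang.reflect.Constructor;

final class DefaultConstructorSketch {

    // Hypothetical searchable class whose no-argument constructor is not public.
    static class Author {
        private final String name = "unknown";

        private Author() {
            // reachable only through reflection or the enclosing class
        }

        String getName() {
            return name;
        }
    }

    // Creates an instance through the no-argument constructor, whatever its visibility.
    static <T> T instantiate(Class<T> type) {
        try {
            Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true); // needed when the constructor is non-public
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable default constructor on " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Author author = instantiate(Author.class);
        System.out.println(author.getName()); // prints "unknown"
    }
}
```

A class that only declares constructors with arguments would fail in getDeclaredConstructor(), which is why the default constructor is required.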

4.2.2. Provide Property Identifier(s)

OSEM requires that any mapped Object define one or more properties (JavaBean properties) that identify the class. The id properties can be called anything, and their type can be any primitive type, primitive "wrapper" type, java.lang.String or java.util.Date.

4.2.3. Declare Accessors and Mutators (Optional)

Even though Compass can directly persist instance variables, it is usually better to decouple this implementation detail from the Search Engine mechanism. Compass::Core recognizes JavaBean style properties (getFoo, isFoo, and setFoo). This mechanism works with any level of visibility.
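The difference between going through an accessor and reading the instance variable directly can be shown with JDK reflection. The sketch below is an illustration only and does not reproduce Compass internals; AccessorSketch, its nested Author class and both helper methods are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class AccessorSketch {

    // Hypothetical class whose getter adds behaviour on top of the raw field.
    static class Author {
        private final String name;

        Author(String name) {
            this.name = name;
        }

        public String getName() {
            return name == null ? "anonymous" : name;
        }
    }

    // Accessor-based read: calls getFoo() for a property named foo.
    static Object readViaProperty(Object target, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = target.getClass().getDeclaredMethod(getter);
            method.setAccessible(true); // works with any level of visibility
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + getter, e);
        }
    }

    // Field-based read: bypasses the getter and reads the instance variable.
    static Object readViaField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Author author = new Author(null);
        System.out.println(readViaProperty(author, "name")); // prints "anonymous"
        System.out.println(readViaField(author, "name"));    // prints "null"
    }
}
```

The two reads differ as soon as the getter does anything beyond returning the field, which is the implementation detail the accessor approach keeps out of the index.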

4.2.4. Implementing equals() and hashCode()

You have to override the equals() and hashCode() methods if you intend to mix objects of persistent classes (e.g. in a Set). You can implement them using the identifiers of both objects, but note that Compass::Core works best with surrogate identifiers (and will provide a way to automatically generate them), so it is best to implement the methods using business keys.
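A business-key implementation might look like the following sketch. Treating name and birthday as the business key is an assumption made purely for illustration, and the class name BusinessKeyAuthor is hypothetical; the surrogate id is deliberately left out of both methods.

```java
import java.util.Date;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class BusinessKeyAuthor {
    private final Long id;        // surrogate identifier, ignored by equals/hashCode
    private final String name;    // assumed part of the business key
    private final Date birthday;  // assumed part of the business key

    BusinessKeyAuthor(Long id, String name, Date birthday) {
        this.id = id;
        this.name = name;
        this.birthday = birthday;
    }

    Long getId() {
        return id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof BusinessKeyAuthor)) {
            return false;
        }
        BusinessKeyAuthor that = (BusinessKeyAuthor) other;
        return Objects.equals(name, that.name) && Objects.equals(birthday, that.birthday);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, birthday);
    }

    public static void main(String[] args) {
        Set<BusinessKeyAuthor> authors = new HashSet<>();
        authors.add(new BusinessKeyAuthor(null, "Jane Doe", new Date(0L))); // id not assigned yet
        authors.add(new BusinessKeyAuthor(7L, "Jane Doe", new Date(0L)));   // same author, id assigned
        System.out.println(authors.size()); // prints 1
    }
}
```

Because the id does not take part in the comparison, an object keeps the same hash code before and after an identifier is assigned to it.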

4.3. Mapping

Object/Search Engine mappings are defined in an XML document. The mapping language is Java centric, meaning that mappings are constructed around the classes themselves and not internal Resources. A possible OSEM file for the previous Author class example follows:

<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
    "-//Compass/Compass Core Mapping DTD 1.0//EN"
    "http://www.opensymphony.com/compass/dtd/compass-core-mapping.dtd">

<compass-core-mapping package="eg">

  <class name="Author" alias="author">

    <id name="id" />

    <constant>
      <meta-data>type</meta-data>
      <meta-data-value>person</meta-data-value>
      <meta-data-value>author</meta-data-value>
    </constant>

    <property name="name">
      <meta-data>name</meta-data>
      <meta-data>authorName</meta-data>
    </property>

    <property name="birthday">
      <meta-data>birthday</meta-data>
    </property>

    <component name="books" ref-alias="book" />

    <!-- can be a reference instead of component
    <reference name="books" ref-alias="book" />
    -->

  </class>

  <class name="Book" alias="book">

    ...

  </class>

</compass-core-mapping>

The above example defines the mapping for Author and Book classes. It introduces some key Compass mapping concepts and syntax. Before explaining the concepts, it is essential that the terminology used is clearly understood.

The first issue to address is the usage of the term Property. Because of its common usage as a concept in Java and Compass (to express Search Engine and Semantic terminology), special care has been taken to clearly prefix the meaning. A class Property refers to a Java class attribute. A Resource Property refers in Compass to Search Engine meta-data, which contains the value of the mapped class Property. In the previous OSEM example, the value of class Property "name" is mapped to two Resource Property instances called "name" and "authorName", each containing the value of the class Property "name".

The OSEM example above shows:

  • The unique class identifier, which maps to the "id" class property.

  • Constant meta data, a feature that allows Compass to insert extra meta data and values (not expressed in the Object). Compass::Core will save the Resource Property "type" with the specified values "person" and "author".

  • The mapping for the class Property "name", saved as two Resource Properties called "name" and "authorName".

  • A dependency between Author and Book managed using a component mapping.

Each of these concepts is explained in detail in the following sections.

All XML mappings should declare the doctype shown. The actual DTD may be found at the URL above, or in the compass-core-x.x.x.jar. Compass will always look for the DTD in the classpath first.

4.3.1. compass-core-mapping

The root element, which holds all the other mapping definitions.

<compass-core-mapping package="packageName"/>
        

Table 4.1. 

Attribute | Description
package (optional) | Specifies a package prefix for unqualified class names in the mapping document.

4.3.2. class

Declaring a searchable class using the class element.

<class
        name="className"
        alias="alias"
        sub-index="sub index name"
        analyzer="name of the analyzer"
        root="true|false"
        poly="false|true"
        poly-class="the class name that will be used to instantiate poly mapping (optional)"
        extends="a comma separated list of aliases to extend"
        boost="boost value for the class"
        all="true|false"
        all-term-vector="no|yes|positions|offsets|positions_offsets"
        all-metadata="all meta-data"
        all-analyzer="name of the analyzer used for the all property"
        converter="converter lookup name"
>
    (id)*,
    parent?,
    (analyzer?),
    (property|component|reference|constant)*
</class>

Table 4.2. 

Attribute | Description
name | The fully qualified class name (or relative if the package is declared in compass-core-mapping).
alias | The alias of the Resource that will be mapped to the class.
sub-index (optional, defaults to the alias value) | The name of the sub-index that the alias will map to. When several searchable classes are joined into the same index, searches will be much faster, but updates lock at the sub-index level, so updates might slow down.
analyzer (optional, defaults to the default analyzer) | The name of the analyzer that will be used to analyze TOKENIZED properties. Defaults to the default analyzer, which is one of the internal analyzers that come with Compass. Note that when the analyzer mapping is used (a child mapping of the class mapping, for a property value that controls the analyzer), the analyzer attribute has no effect.
root (optional, defaults to true) | Specifies if the class is a "root" class or not. Set it to false if the searchable class only acts as the mapping definition for a component mapping.
poly (optional, defaults to false) | Specifies if the class will be enabled to support polymorphism. This is the less preferable way to map an inheritance tree, since the extends attribute can be used to statically extend base classes or contracts.
poly-class (optional) | If poly is set to true, the actual class name of the indexed object will be saved to the index as well (it will be used later to instantiate the Object). If poly-class is set, the class name will not be saved to the index, and the value of poly-class will be used to instantiate all the classes in the inheritance tree.
extends (optional) | A comma separated list of aliases to extend. Can extend a class mapping or a contract mapping. Note that a mapping can extend more than one class/contract.
boost (optional, defaults to 1.0) | Specifies the boost level for the class.
all (optional, defaults to true) | Controls if the searchable class will create its own internal "all" meta-data. The "all" meta-data holds searchable information of all the class searchable content.
all-term-vector (optional, defaults to configuration setting compass.property.all.termVector) | The term vector value of the all property.
all-metadata (optional, defaults to configuration setting compass.property.all) | The name of the all property.
all-analyzer (optional, defaults to configuration setting compass.engine.all.analyzer, which in turn defaults to the default analyzer) | The name of the analyzer that will be used to analyze the all property.
converter (optional) | The global converter lookup name registered with the configuration. Responsible for converting the ClassMapping definition. Defaults to the Compass internal ClassMappingConverter.

Root classes have their own index within the search engine index directory. Classes with a dependency to Root class, that don't require an index (i.e. component) should set root to false. You can control the sub-index that the root classes will map to using the sub-index attribute, otherwise it will create a sub-index based on the alias name.

If the class can be mapped to several classes (i.e. it is an interface or an abstract class), then set poly to true. This means Compass will persist the fully qualified class name in the index.
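The idea behind poly and poly-class can be sketched conceptually in plain Java. This is not how Compass stores or loads Resources: the Map stands in for an index entry, the "$class" key is a made-up name, and Person, Author and Editor are hypothetical types.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

final class PolySketch {
    static final String CLASS_KEY = "$class"; // hypothetical key for the stored class name

    interface Person { }
    static class Author implements Person { }
    static class Editor implements Person { }

    // Stores the concrete class name only when no fixed poly-class is given.
    static Map<String, String> marshal(Person person, Class<? extends Person> polyClass) {
        Map<String, String> resource = new HashMap<>();
        if (polyClass == null) {
            resource.put(CLASS_KEY, person.getClass().getName());
        }
        return resource;
    }

    // Instantiates either the fixed poly-class or the class recorded in the entry.
    static Person unmarshal(Map<String, String> resource, Class<? extends Person> polyClass) {
        try {
            Class<? extends Person> type;
            if (polyClass != null) {
                type = polyClass;
            } else {
                type = Class.forName(resource.get(CLASS_KEY)).asSubclass(Person.class);
            }
            Constructor<? extends Person> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate mapped class", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> stored = marshal(new Author(), null);
        System.out.println(unmarshal(stored, null) instanceof Author);         // prints true
        System.out.println(unmarshal(new HashMap<>(), Editor.class) instanceof Editor); // prints true
    }
}
```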

You can set the boost level at the class level, which is applied to all class meta-data (it can be overridden at the meta-data level).

The class mapping can extend other class mappings (more than one), as well as contract mappings. All the mappings that are defined within the class mapping or the contract mapping will be inherited from the extended mappings. You can add any defined mappings by defining the same mappings in the class mappings, except for id mappings, which will be overridden. Note that any xml attributes (like root, sub-index, ...) that are defined within the extended mappings are not inherited.

By default, a searchable class supports the "all" feature, which means that Compass will create an "all" meta-data that represents all the other meta-data (with several exceptions, like a Reader class property). The name of the "all" meta-data defaults to the Compass setting, but you can also set it using the all-metadata attribute.

4.3.3. contract

Declaring a searchable contract using the contract element.

<contract
        alias="alias"
>
    (id)*,
    (analyzer?),
    (property|component|reference|constant)*
</contract>

Table 4.3. 

Attribute | Description
alias | The alias of the contract. Will be used as the alias name in the extends attribute of the class mapping.

A contract acts as an interface in the Java language. You can define the same mappings within it that you can define in the class mapping, without defining the class that it will map to.

If you have several classes that have similar properties, you can define a contract that joins the property definitions, and then extend the contract within the mapped classes (even if you don't have a concrete interface or class in your Java definition).

4.3.4. id

Declaring a searchable id class property (a.k.a JavaBean property) of a class using the id element.

<id
      name="property name"
      accessor="property|field"
      boost="boost value for the class property"
      class="explicit declaration of the property class"
      managed-id="auto|true|false"
      exclude-from-all="false|true"
      converter="converter lookup name"
  >
 (meta-data)*
</id>

Table 4.4. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
boost (optional, defaults to 1.0f) | The boost level that will be propagated to all the meta-data defined within the id.
class (optional) | An explicit definition of the class of the property, helps for certain converters.
managed-id (optional, defaults to auto) | The strategy for creating or using a class property meta-data id (which maps to a ResourceProperty).
exclude-from-all (optional, defaults to false) | Excludes the class property from participating in the "all" meta-data, unless specified at the meta-data level.
converter (optional) | The global converter lookup name registered with the configuration.

The id mapping is used to map the class property that identifies the class. You can define several id properties, even though we recommend using one. You can use the id mapping for all the Java primitive types (e.g. int), Java primitive wrapper types (e.g. Integer) and the String type.

Compass::Core requires that id and property mappings be identifiable on the root class (Resource) level. Compass does that either by using one of the meta-data names (one that is unique within ALL of the meta-data in the class mapping), or by creating an internal one. Compass will create an internal one if no meta-data is defined in the id or property mapping. You can control this with the managed-id attribute. The value auto leaves the id assignment / creation as Compass's responsibility: Compass will analyze all the different meta-data defined in the mappings and will decide if it needs to create an internal id for an id or a property mapping. The true option will always create an internal id for the id or property, and the false option will always take the first meta-data and use it as the id or property id.
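The three managed-id options can be read as a small decision rule. The sketch below is only one interpretation of the paragraph above, not the algorithm Compass actually implements; ManagedIdSketch, resolveId and the "<internal>" marker are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class ManagedIdSketch {
    enum ManagedId { AUTO, TRUE, FALSE }

    static final String INTERNAL_ID = "<internal>"; // stands for an id created internally

    // Returns the meta-data name used to identify the property, or INTERNAL_ID.
    static String resolveId(String property, ManagedId mode,
                            Map<String, List<String>> metaDataByProperty) {
        List<String> own = metaDataByProperty.getOrDefault(property, List.of());
        if (mode == ManagedId.TRUE || own.isEmpty()) {
            return INTERNAL_ID; // always internal, or nothing to reuse
        }
        if (mode == ManagedId.FALSE) {
            return own.get(0); // always the first meta-data
        }
        // AUTO: reuse the first meta-data name that no other mapping uses
        for (String candidate : own) {
            long uses = metaDataByProperty.values().stream()
                    .flatMap(List::stream)
                    .filter(candidate::equals)
                    .count();
            if (uses == 1) {
                return candidate;
            }
        }
        return INTERNAL_ID;
    }

    public static void main(String[] args) {
        Map<String, List<String>> mappings = Map.of(
                "name", List.of("name", "authorName"),
                "nickname", List.of("name"),
                "birthday", List.of("birthday"));
        System.out.println(resolveId("name", ManagedId.AUTO, mappings));      // prints "authorName"
        System.out.println(resolveId("nickname", ManagedId.AUTO, mappings));  // prints "<internal>"
        System.out.println(resolveId("nickname", ManagedId.FALSE, mappings)); // prints "name"
    }
}
```

In this example "name" is shared by two properties, so under auto it cannot identify either of them on its own.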

4.3.5. property

Declaring a searchable class property (a.k.a JavaBean property) of a class using the property element.

<property
      name="property name"
      accessor="property|field"
      boost="boost value for the property"
      class="explicit declaration of the property class"
      analyzer="name of the analyzer"
      managed-id="auto|true|false"
      managed-id-index="[compass.managedId.index setting]|no|un_tokenized"
      exclude-from-all="false|true"
      converter="converter lookup name"
>
   (meta-data)*
</property>

Table 4.5. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
boost (optional, defaults to 1.0f) | The boost level that will be propagated to all the meta-data defined within the class property.
class (optional) | An explicit definition of the class of the property, helps for certain converters (especially for java.util.Collection type properties, since it applies to the collection elements).
analyzer (optional, defaults to the class mapping analyzer decision scheme) | The name of the analyzer that will be used to analyze TOKENIZED meta-data mappings defined for the given property. Defaults to the class mapping analyzer decision scheme based on the analyzer set, or the analyzer mapping property.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
managed-id (optional, defaults to auto) | The strategy for creating or using a class property meta-data id (which maps to a ResourceProperty).
managed-id-index (optional, defaults to compass.managedId.index setting, which defaults to no) | Can be either un_tokenized or no. It is the index setting that will be used when creating an internal managed id for a class property mapping (if it is a property id, it will always be un_tokenized).
exclude-from-all (optional, defaults to false) | Excludes the class property from participating in the "all" meta-data, unless specified at the meta-data level.
converter (optional) | The global converter lookup name registered with the configuration.

Compass::Core maps a class property to a set of meta-data (Resource Property).

You can map all internal Java primitive data types, primitive wrappers and most of the common Java classes (e.g. Date and Calendar). You can also map Arrays and Collections of these data types. When mapping a Collection, you must specify the element class (like java.lang.String) in the class attribute of the property mapping.

The same rules for managed-id that apply to the id mapping also apply to property mappings.

Note, that you can define a property with no meta-data mapping within it. It means that it will not be searchable, but the property value will be stored when persisting the object to the search engine, and it will be loaded from it as well (unless it is of type java.io.Reader).

4.3.6. analyzer

Declaring an analyzer controller property (a.k.a JavaBean property) of a class using the analyzer element.

<analyzer
      name="property name"
      null-analyzer="analyzer name if value is null"
      accessor="property|field"
      converter="converter lookup name"
>
</analyzer>

Table 4.6. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
null-analyzer (optional, defaults to error in case of a null value) | The name of the analyzer that will be used if the property has the null value.
converter (optional) | The global converter lookup name registered with the configuration.

The analyzer class property mapping controls the analyzer that will be used when indexing the class data (the underlying Resource). If the mapping is defined, it will override the class mapping analyzer attribute setting.

Suppose, for example, that Compass is configured with two additional analyzers, one called an1 (with settings in the form of compass.engine.analyzer.an1.*) and another called an2. The values that the class property can hold are then: default (which is an internal Compass analyzer that can be configured as well), an1 and an2. If the class property can have a null value, and that is acceptable for the application, then a null-analyzer can be configured that will be used in that case. If the class property has a value, but there is no matching analyzer, an exception will be thrown.
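The lookup rules in the paragraph above can be summarized in a few lines of Java. This is an illustrative sketch under the stated rules only; AnalyzerLookupSketch and resolveAnalyzer are hypothetical names, and the exception types are chosen for the example rather than taken from Compass.

```java
import java.util.Set;

final class AnalyzerLookupSketch {

    // Picks the analyzer name for a Resource from the value of the analyzer class property.
    static String resolveAnalyzer(String propertyValue, String nullAnalyzer, Set<String> configured) {
        String name = propertyValue;
        if (name == null) {
            if (nullAnalyzer == null) {
                // no null-analyzer configured: a null value is an error
                throw new IllegalStateException("Analyzer property is null and no null-analyzer is set");
            }
            name = nullAnalyzer;
        }
        if (!configured.contains(name)) {
            throw new IllegalArgumentException("No analyzer configured under the name '" + name + "'");
        }
        return name;
    }

    public static void main(String[] args) {
        Set<String> configured = Set.of("default", "an1", "an2");
        System.out.println(resolveAnalyzer("an1", null, configured)); // prints "an1"
        System.out.println(resolveAnalyzer(null, "an2", configured)); // prints "an2"
    }
}
```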

4.3.7. meta-data

Declaring and using the meta-data element.

<meta-data
      store="yes|no|compress"
      index="tokenized|un_tokenized|no"
      boost="boost value for the meta-data"
      analyzer="name of the analyzer"
      reverse="no|reader|string"
      exclude-from-all="[parent's exclude-from-all]|false|true"
      converter="converter lookup name"
      format="the format string (only applies to formatted elements)"
>
</meta-data>

Table 4.7. 

Attribute | Description
store (optional, defaults to yes) | Whether the value of the class property that the meta-data maps to is going to be stored in the index.
index (optional, defaults to tokenized) | Whether the value of the class property that the meta-data maps to is going to be indexed (searchable). If it is, this also controls whether the value is broken down and analyzed (tokenized) or used as is (un_tokenized).
boost (optional, defaults to 1.0f) | Controls the boost level for the meta-data.
analyzer (optional, defaults to the parent analyzer) | The name of the analyzer that will be used to analyze TOKENIZED meta-data. Defaults to the parent property mapping, which in turn defaults to the class mapping analyzer decision scheme based on the analyzer set, or the analyzer mapping property.
reverse (optional, defaults to no) | The meta-data will have its value reversed. Can have the values no (no reverse will happen), string (the value is reversed and stored as a reversed string), and reader (a special reader wraps the string and reverses it). The reader option is more performant, but the store and index settings will be discarded.
exclude-from-all (optional, defaults to the parent's exclude-from-all value) | Excludes the meta-data from participating in the "all" meta-data.
converter (optional) | The global converter lookup name registered with the configuration. Note that in the case of a Collection property, the converter will be applied to the collection elements (Compass has its own converter for Collections).
format (optional) | Allows for quickly setting a format for format-able types (dates and numbers), without creating/registering a specialized converter under a lookup name.

The element meta-data is a Property within a Resource.

You can control the format of the marshalled values when mapping a java.lang.Number (or the equivalent primitive value) using the format provided by the java.text.DecimalFormat. You can also format a java.util.Date using the format provided by java.text.SimpleDateFormat. You set the format string in the format attribute.
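To see what a given format string produces, the JDK classes named above can be tried directly. The sketch below only exercises java.text.DecimalFormat and java.text.SimpleDateFormat; it does not show how Compass invokes them. The patterns, the fixed ROOT locale and the UTC time zone are assumptions chosen to keep the output predictable.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class FormatSketch {

    // Formats a number with a DecimalFormat pattern, using locale-neutral symbols.
    static String formatNumber(Number value, String pattern) {
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    // Formats a date with a SimpleDateFormat pattern, in UTC.
    static String formatDate(Date value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(42, "000000.00"));          // prints "000042.00"
        System.out.println(formatDate(new Date(0L), "yyyy-MM-dd")); // prints "1970-01-01"
    }
}
```

The same pattern strings are what would go into the format attribute of a meta-data element.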

4.3.8. component

Declaring and using the component element.

<component
      name="the class property name"
      ref-alias="name of the alias"
      max-depth="the depth of cyclic component mappings allowed"
      accessor="property|field"
      converter="converter lookup name"
>
</component>

Table 4.8. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
ref-alias (optional) | The class mapping alias that defines the component. This is an optional attribute since under some conditions, Compass can infer the correct reference alias.
max-depth (optional, defaults to 5) | The depth of cyclic component mappings allowed.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) | The global converter lookup name registered with the configuration.

The component element defines a class dependency within the root class. The dependency name is identified by the ref-alias, which can be non-rootable or have no id mappings.

An embedded class means that all the mappings (meta-data values) defined in the referenced class are stored within the alias of the root class. It means that a search that hits one of the component mapped meta-data will return its owning class.

The type of the JavaBean property can be the class mapping class itself, an Array or Collection.

Support for cyclic mapping (from one component to its parent class) is implemented using the parent mapping.

4.3.9. reference

Declaring and using the reference element.

<reference
        name="the class property name"
        ref-alias="name of the alias"
        ref-comp-alias="name of an optional alias mapped as component"
        accessor="property|field"
        converter="converter lookup name"
  >
</reference>

Table 4.9. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
ref-alias (optional) | The class mapping alias that defines the reference. This is an optional attribute since under some conditions, Compass can infer the correct reference alias.
ref-comp-alias (optional) | The class mapping alias that defines a "shadow component". Will marshal a component-like mapping based on the alias into the current class. Note that it is best to create a dedicated class mapping (with root="false") that only holds the required information. Based on that information, if you search for it, you will be able to get the encompassing class as part of your hits. Note as well that when the referenced class changes, you will have to save all the relevant encompassing classes for the change to be reflected as part of the ref-comp-alias.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) | The global converter lookup name registered with the configuration.

The reference element defines a "pointer" to a class dependency identified in ref-alias.

The type of the JavaBean property can be the class mapping class itself, an Array of it, or a Collection.

Currently there is no support for lazy behavior or cascading. This means that saving an object will not persist the objects it references, and loading an object will load all its references. Future versions will support lazy and cascading features.

Compass::Core supports cyclic references, which means that two classes can have a cyclic reference defined between them.

4.3.10. parent

Declaring and using the parent element.

<parent
        name="the class property name"
        accessor="property|field"
        converter="converter lookup name"
  >
</parent>

Table 4.10. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with initial lowercase letter.
accessor (optional, defaults to property) | The strategy to access the class property value. property means accessing using the Java Bean accessor methods, while field directly accesses the class fields.
converter (optional) | The global converter lookup name registered with the configuration.

The parent mapping provides support for cyclic mappings for components. If the component class mapping wishes to map the enclosing class, the parent mapping can be used to map to it. The parent mapping will not marshal (persist the data to the search engine) the parent object, it will only initialize it when loading the parent object from the search engine.

4.3.11. constant

Declaring a constant set of meta-data using the constant element.

<constant
          exclude-from-all="false|true"
          converter="converter lookup name"
    >
   meta-data,
   meta-data-value+
</constant>

Table 4.11. 

Attribute | Description
exclude-from-all (optional, defaults to false) | Excludes the constant meta-data and all its values from participating in the "all" feature.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or added as an additional mapping. Mainly used to override definitions made in extended mappings.
converter (optional) | The global converter lookup name registered with the configuration.

If you wish to define a set of constant meta-data that will be embedded within the searchable class (Resource), you can use the constant element. You define the usual meta-data element followed by one or more meta-data-value elements with the values that map to the meta-data within it.