Hibernate Search
Reference Guide
3.1.1.GA
Preface
1. Getting started
1.1. System Requirements
1.2. Using Maven
1.3. Configuration
1.4. Indexing
1.5. Searching
1.6. Analyzer
1.7. What's next
2. Architecture
2.1. Overview
2.2. Back end
2.2.1. Back end types
2.2.1.1. Lucene
2.2.1.2. JMS
2.2.2. Work execution
2.2.2.1. Synchronous
2.2.2.2. Asynchronous
2.3. Reader strategy
2.3.1. Shared
2.3.2. Not-shared
2.3.3. Custom
3. Configuration
3.1. Directory configuration
3.2. Sharding indexes
3.3. Sharing indexes (two entities into the same directory)
3.4. Worker configuration
3.5. JMS Master/Slave configuration
3.5.1. Slave nodes
3.5.2. Master node
3.6. Reader strategy configuration
3.7. Enabling Hibernate Search and automatic indexing
3.7.1. Enabling Hibernate Search
3.7.2. Automatic indexing
3.8. Tuning Lucene indexing performance
4. Mapping entities to the index structure
4.1. Mapping an entity
4.1.1. Basic mapping
4.1.2. Mapping properties multiple times
4.1.3. Embedded and associated objects
4.1.4. Boost factor
4.1.5. Analyzer
4.1.5.1. Analyzer definitions
Preface
Full text search engines like Apache Lucene are very powerful technologies
for adding efficient free text search capabilities to applications. However,
Lucene suffers from several mismatches when dealing with an object domain
model. Amongst other things, indexes have to be kept up to date, and
mismatches between index structure and domain model as well as query
mismatches have to be avoided.
Chapter 1. Getting started
Welcome to Hibernate Search! The following chapter will guide you through
the initial steps required to integrate Hibernate Search into an existing
Hibernate enabled application. If you are new to Hibernate, we
recommend you start here [http://hibernate.org/152.html].
1.2. Using Maven
<repository>
<id>repository.jboss.org</id>
<name>JBoss Maven Repository</name>
<url>http://repository.jboss.org/maven2</url>
<layout>default</layout>
</repository>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-search</artifactId>
<version>3.1.1.GA</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-annotations</artifactId>
<version>3.4.0.GA</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>3.4.0.GA</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-common</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-core</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-snowball</artifactId>
<version>2.4.1</version>
</dependency>
1.3. Configuration
Once you have downloaded and added all required dependencies to
your application you have to add a couple of properties to your hibernate
configuration file. If you are using Hibernate directly this can be done in
hibernate.properties or hibernate.cfg.xml. If you are using Hibernate via
JPA you can also add the properties to persistence.xml. The good news is
that for standard use most properties offer a sensible default. An example
persistence.xml configuration could look like this:
...
<property name="hibernate.search.default.directory_provider"
value="org.hibernate.search.store.FSDirectoryProvider"/>
<property name="hibernate.search.default.indexBase"
value="/var/lucene/indexes"/>
...
Let's assume that your application contains the Hibernate managed classes
example.Book and example.Author and you want to add free text search
capabilities to your application in order to search the books contained in your
database.
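package example;
...
@Entity
public class Book {
    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    private String subtitle;

    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();

    public Book() {
    }
    // standard getters/setters follow here
}

package example;
...
@Entity
public class Author {
    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    public Author() {
    }
    // standard getters/setters follow here
}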
To achieve this you have to add a few annotations to the Book and Author
classes. The first annotation, @Indexed, marks Book as indexable. By design
Next you have to mark the fields you want to make searchable. Let's start
with title and subtitle and annotate both with @Field. The parameter
index=Index.TOKENIZED will ensure that the text will be tokenized using the
default Lucene analyzer. Usually, tokenizing means chunking a sentence into
individual words and potentially excluding common words like 'a' or 'the'.
We will talk more about analyzers a little later on. The second parameter
we specify within @Field, store=Store.NO, ensures that the actual data will
not be stored in the index. Whether this data is stored in the index or not
has nothing to do with the ability to search for it. From Lucene's perspective
it is not necessary to keep the data once the index is created. The benefit
of storing it is the ability to retrieve it via projections (Section 5.1.2.5,
“Projection”).
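To make the idea of tokenizing more concrete, here is a minimal sketch in plain Java. It is a toy illustration only and not the default Lucene analyzer: the class name TokenizeSketch, the splitting rule and the stop word list are assumptions chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: this is NOT Lucene's default analyzer.
final class TokenizeSketch {

    // Hypothetical stop word list, chosen for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "of"));

    // Splits the text on anything that is not a letter or digit,
    // lower-cases each word and drops the stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [refactoring, improving, design, existing, code]
        System.out.println(tokenize("Refactoring: Improving the Design of Existing Code"));
    }
}
```

A query for "design" can match this title because the same processing is applied to the query text before it is compared with the indexed tokens.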
After this short look under the hood let's go back to annotating the Book
class. Another annotation we have not yet discussed is @DateBridge. This
annotation is one of the built-in field bridges in Hibernate Search. The Lucene
index is purely string based. For this reason Hibernate Search must convert
the data types of the indexed fields to strings and vice versa. A range of
predefined bridges are provided, including the DateBridge which will convert
a java.util.Date into a String with the specified resolution. For more details
see Section 4.2, “Property/Field Bridge”.
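The string conversion performed by a bridge can be pictured with a small plain Java sketch. It does not use the Hibernate Search bridge API; the class name DateToStringSketch, the "yyyyMMdd" pattern and the GMT time zone are assumptions made for this illustration of a day resolution.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch of converting a java.util.Date into a string with day resolution.
final class DateToStringSketch {

    // Assumption: "yyyyMMdd" in GMT stands in for a day resolution.
    static String toIndexString(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    public static void main(String[] args) {
        // The epoch (1 January 1970, 00:00 GMT) prints 19700101
        System.out.println(toIndexString(new Date(0L)));
    }
}
```

A fixed-width, most-significant-first format has the useful property that the resulting strings sort in chronological order, which matters in a purely string based index.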
These settings should be sufficient for now. For more details on entity
mapping refer to Section 4.1, “Mapping an entity”.
package example;
...
@Entity
@Indexed
public class Book {
    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    private String title;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    private String subtitle;

    @IndexedEmbedded
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();

    public Book() {
    }
    // standard getters/setters follow here
}

package example;
...
@Entity
public class Author {
    @Id
    @GeneratedValue
    private Integer id;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    private String name;

    public Author() {
    }
    // standard getters/setters follow here
}
1.4. Indexing
Hibernate Search will transparently index every entity persisted, updated
or removed through Hibernate Core. However, you have to trigger an initial
indexing to populate the Lucene index with the data already present in your
database. Once you have added the above properties and annotations it
is time to trigger an initial batch index of your books. You can achieve this
by using one of the following code snippets (see also Chapter 6, Manual
indexing):
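// Using the Hibernate Session API
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
List<Book> books = session.createQuery("from Book as book").list();
for (Book book : books) {
    fullTextSession.index(book);
}
tx.commit(); // the index is written at commit time

// Using JPA
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
em.getTransaction().begin();
List<Book> jpaBooks = em.createQuery("select book from Book as book").getResultList();
for (Book book : jpaBooks) {
    fullTextEntityManager.index(book);
}
em.getTransaction().commit();
em.close();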
After executing the above code, you should be able to see a Lucene index
under /var/lucene/indexes/example.Book. Go ahead and inspect this index
with Luke [http://www.getopt.org/luke/]. It will help you to understand how
Hibernate Search works.
1.5. Searching
Now it is time to execute a first search. The general approach is to create a
native Lucene query and then wrap this query into an org.hibernate.Query in
order to get all the functionality one is used to from the Hibernate API. The
following code will prepare a query against the indexed fields, execute it and
return a list of Books.
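FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
// create native Lucene query ("Java rocks!" is just an example search string)
String[] fields = new String[]{"title", "subtitle", "authors.name"};
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
org.apache.lucene.search.Query query = parser.parse("Java rocks!");
// wrap Lucene query in an org.hibernate.Query
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(query, Book.class);
// execute search
List result = hibQuery.list();
tx.commit();
session.close();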
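EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
    org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
em.getTransaction().begin();
// create native Lucene query ("Java rocks!" is just an example search string)
String[] fields = new String[]{"title", "subtitle", "authors.name"};
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new StandardAnalyzer());
org.apache.lucene.search.Query query = parser.parse("Java rocks!");
// wrap Lucene query in a javax.persistence.Query
javax.persistence.Query persistenceQuery =
    fullTextEntityManager.createFullTextQuery(query, Book.class);
// execute search
List result = persistenceQuery.getResultList();
em.getTransaction().commit();
em.close();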
1.6. Analyzer
Let's make things a little more interesting now. Assume that one of your
indexed book entities has the title "Refactoring: Improving the Design of
Existing Code" and you want to get hits for all of the following queries:
"refactor", "refactors", "refactored" and "refactoring". In Lucene this can be
achieved by choosing an analyzer class which applies word stemming during
the indexing as well as search process. Hibernate Search offers several
ways to configure the analyzer to use (see Section 4.1.5, “Analyzer”):
When using the @Analyzer annotation one can either specify the fully
qualified classname of the analyzer to use or one can refer to an analyzer
definition defined by the @AnalyzerDef annotation. In the latter case the
Generally, when using the Solr framework you have to start with a tokenizer
followed by an arbitrary number of filters.
package example;
...
@Entity
@Indexed
@AnalyzerDef(name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = {
                @Parameter(name = "language", value = "English")
            })
    })
public class Book {
    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    @Analyzer(definition = "customanalyzer")
    private String title;

    @Field(index=Index.TOKENIZED, store=Store.NO)
    @Analyzer(definition = "customanalyzer")
    private String subtitle;

    @IndexedEmbedded
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();

    public Book() {
    }
    // standard getters/setters follow here
}
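The tokenizer-then-filters structure of the analyzer definition above can be sketched in plain Java. This is a toy model only: it does not use the Solr or Lucene classes, and the suffix-stripping stemmer is a deliberately naive stand-in for the Snowball stemmer. All names in it (AnalyzerChainSketch, toyStem and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of an analyzer: one tokenizer followed by a chain of filters.
final class AnalyzerChainSketch {

    // Step 1 (tokenizer): split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Step 2 (filter): lower-case the token.
    static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Step 3 (filter): naive suffix stripping, NOT the Snowball algorithm.
    static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Runs the tokenizer, then applies the filters in order to every token.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<String>();
        for (String token : tokenize(text)) {
            result.add(toyStem(lowerCase(token)));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [refactor, refactor, refactor, refactor]
        System.out.println(analyze("refactor refactors refactored Refactoring"));
    }
}
```

Because the same chain runs at indexing time and at query time, all four query words reduce to the same indexed term, which is what makes them hit the same document.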
1.7. What's next
mvn archetype:create \
-DarchetypeGroupId=org.hibernate \
-DarchetypeArtifactId=hibernate-search-quickstart \
-DarchetypeVersion=3.1.1.GA \
-DgroupId=my.company -DartifactId=quickstart
Using the maven project you can execute the examples, inspect the file
system based index and search and retrieve a list of managed objects. Just
run mvn package to compile the sources and run the unit tests.
The next step after this tutorial is to get more familiar with the overall
architecture of Hibernate Search (Chapter 2, Architecture) and explore the
basic features in more detail. Two topics which were only briefly touched
in this tutorial were analyzer configuration (Section 4.1.5, “Analyzer”) and
field bridges (Section 4.2, “Property/Field Bridge”), both important features
required for more fine-grained indexing. More advanced topics cover
clustering (Section 3.5, “JMS Master/Slave configuration”) and large index
handling (Section 3.2, “Sharding indexes”).
Chapter 2. Architecture
2.1. Overview
Hibernate Search consists of an indexing component and an index search
component. Both are backed by Apache Lucene.
To interact with Apache Lucene indexes, Hibernate Search has the notion
of DirectoryProviders. A directory provider will manage a given Lucene
Directory type. You can configure directory providers to adjust the directory
target (see Section 3.1, “Directory configuration”).
Hibernate Search uses the Lucene index to search an entity and return a
list of managed entities saving you the tedious object to Lucene document
mapping. The same persistence context is shared between Hibernate and
Hibernate Search. As a matter of fact, the FullTextSession is built on top
of the Hibernate Session, so that the application code can use the unified
org.hibernate.Query or javax.persistence.Query APIs exactly the way an
HQL, JPA-QL or native query would.
• ACIDity: The work executed has the same scoping as the one executed by
the database transaction and is executed if and only if the transaction is
committed. This is not ACID in the strict sense of it, but ACID behavior is
rarely useful for full text search indexes since they can be rebuilt from the
source at any time.
You can think of those two scopes (no scope vs transactional) as the
equivalent of the (infamous) autocommit vs transactional behavior. From
a performance perspective, the in-transaction mode is recommended.
The scoping choice is made transparently. Hibernate Search detects the
presence of a transaction and adjusts the scoping.
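The transactional scoping can be pictured as a queue of index work that is applied only when the transaction commits. The following is a minimal sketch in plain Java, not Hibernate Search's actual implementation; the class TransactionalWorkQueueSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: index work is collected during the transaction and
// reaches the index if and only if the transaction is committed.
final class TransactionalWorkQueueSketch {

    private final List<String> pending = new ArrayList<String>();
    private final List<String> index = new ArrayList<String>();

    // Called whenever an entity change is detected inside the transaction.
    void add(String work) {
        pending.add(work);
    }

    // On commit the queued work is applied to the index.
    void commit() {
        index.addAll(pending);
        pending.clear();
    }

    // On rollback the queued work is simply discarded.
    void rollback() {
        pending.clear();
    }

    // Returns a copy of what has reached the index so far.
    List<String> indexedWork() {
        return new ArrayList<String>(index);
    }

    public static void main(String[] args) {
        TransactionalWorkQueueSketch queue = new TransactionalWorkQueueSketch();
        queue.add("index Book#1");
        queue.rollback();
        queue.add("index Book#2");
        queue.commit();
        // prints [index Book#2]
        System.out.println(queue.indexedWork());
    }
}
```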
Note
Hibernate Search works perfectly fine in the Hibernate /
EntityManager long conversation pattern, also known as atomic conversation.
Note
Depending on user demand, additional scoping will be considered,
the pluggability mechanism being already in place.
2.2.1.1. Lucene
In this mode, all index update operations performed on a given node (JVM) will
be applied to the Lucene directories (through the directory providers) by the
same node. This mode is typically used in non-clustered environments or in
clustered environments where the directory store is shared.
2.2.1.2. JMS
All index update operations applied on a given node are sent to a JMS
queue. A unique reader will then process the queue and update the master
index. The master index is then replicated on a regular basis to the slave
copies. This is known as the master/slaves pattern. The master is solely
responsible for updating the Lucene index. The slaves can accept read as
well as write operations. However, they only process the read operation on
their local index copy and delegate the update operations to the master.
Note
Hibernate Search is an extensible architecture. Feel free to drop ideas
for other third party back ends to [email protected].
2.2.2.1. Synchronous
This is the safe mode where the back end work is executed in concert with
the transaction commit. In highly concurrent environments, this can lead
to throughput limitations (due to the Apache Lucene lock mechanism) and it
can increase the system response time if the back end is significantly slower
than the transactional process and if a lot of IO operations are involved.
2.2.2.2. Asynchronous
This mode delegates the work done by the back end to a different
thread. That way, throughput and response time are (to a certain extent)
decorrelated from the back end performance. The drawback is that a small
delay appears between the transaction commit and the index update and a
small overhead is introduced to deal with thread management.
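The trade-off of asynchronous execution can be illustrated with a plain Java executor. This is a sketch of the general technique only, not the Hibernate Search worker; the class AsyncBackendSketch and its methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of asynchronous execution: the committing thread hands the
// index work to a worker thread and returns immediately.
final class AsyncBackendSketch {

    // A pool of one thread keeps the work in submission order.
    private final ExecutorService worker = Executors.newFixedThreadPool(1);
    private final List<String> applied = new CopyOnWriteArrayList<String>();

    // Returns right away; the index update happens later on the worker thread.
    void afterCommit(final String work) {
        worker.execute(new Runnable() {
            public void run() {
                applied.add(work);
            }
        });
    }

    // Stops accepting work and waits for the queued work to finish.
    // Returns false if the timeout elapsed or the wait was interrupted.
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> applied() {
        return applied;
    }

    public static void main(String[] args) {
        AsyncBackendSketch backend = new AsyncBackendSketch();
        backend.afterCommit("index Book#1");
        backend.afterCommit("index Book#2");
        backend.shutdownAndWait(5);
        System.out.println(backend.applied());
    }
}
```

Between the call to afterCommit and the moment the worker thread runs, the index does not yet reflect the change; this is the small delay mentioned above.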
2.3.1. Shared
With this strategy, Hibernate Search will share the same IndexReader, for
a given Lucene index, across multiple queries and threads provided that
the IndexReader is still up-to-date. If the IndexReader is not up-to-date, a
new one is opened and provided. Each IndexReader is made of several
2.3.2. Not-shared
Every time a query is executed, a Lucene IndexReader is opened. This
strategy is not the most efficient since opening and warming up an
IndexReader can be a relatively expensive operation.
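The difference between the shared and not-shared strategies can be sketched with stand-in types. The code below is a toy model in plain Java and does not use Lucene's IndexReader or the ReaderProvider interface; Index, Reader and ReaderStrategySketch are hypothetical names, and "up to date" is modelled as a simple version number.

```java
// Toy model of the shared and not-shared reader strategies.
final class ReaderStrategySketch {

    // Stand-in for an index whose version changes on every update.
    static final class Index {
        private volatile int version = 0;
        synchronized void update() { version++; }
        int version() { return version; }
    }

    // Stand-in for an opened reader: a snapshot of one index version.
    static final class Reader {
        final int version;
        Reader(int version) { this.version = version; }
    }

    private final Index index;
    private Reader shared;
    private int opened = 0; // how many readers have been opened so far

    ReaderStrategySketch(Index index) {
        this.index = index;
    }

    // not-shared: a new reader is opened for every query.
    synchronized Reader openNotShared() {
        opened++;
        return new Reader(index.version());
    }

    // shared: the current reader is reused while it is still up to date.
    synchronized Reader openShared() {
        if (shared == null || shared.version != index.version()) {
            opened++;
            shared = new Reader(index.version());
        }
        return shared;
    }

    synchronized int openedCount() {
        return opened;
    }
}
```

With the shared strategy, two queries against an unchanged index cost one open; with not-shared they cost two.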
2.3.3. Custom
You can write your own reader strategy that suits your application needs
by implementing org.hibernate.search.reader.ReaderProvider. The
implementation must be thread safe.
Chapter 3. Configuration
3.1. Directory configuration
Apache Lucene has a notion of Directory to store the index files. The
Directory implementation can be customized, but Lucene comes
bundled with a file system (FSDirectoryProvider) and an in memory
(RAMDirectoryProvider) implementation. DirectoryProviders are the
Hibernate Search abstraction around a Lucene Directory and handle the
configuration and the initialization of the underlying Lucene resources.
Table 3.1, “List of built-in Directory Providers” shows the list of the directory
providers bundled with Hibernate Search.
Table 3.1. List of built-in Directory Providers

Class: org.hibernate.search.store.FSSlaveDirectoryProvider
Description: File system based directory. Like FSDirectoryProvider, but
retrieves a master version (source) on a regular basis. To avoid locking and
inconsistent search results, 2 local copies are kept.
The recommended value for the refresh period is (at least) 50% higher than the
time to copy the information (default 3600 seconds - 60 minutes).
Note that the copy is based on an incremental copy mechanism reducing the
average copy time.
DirectoryProvider typically used on slave nodes using a JMS back end.
The buffer_size_on_copy optimum depends on your operating system and available
RAM; most people reported good results using values between 16 and 64MB.
Properties:
indexBase: Base directory
indexName: override @Indexed.index (useful for sharded indexes)
sourceBase: Source (copy) base directory.
source: Source directory suffix (default to @Indexed.index). The actual source
directory name being <sourceBase>/<source>
refresh: refresh period in seconds (the copy will take place every refresh
seconds).
buffer_size_on_copy: The amount of MegaBytes to move in a single low level
copy instruction; defaults to 16MB.

Class: org.hibernate.search.store.RAMDirectoryProvider
Description: Memory based directory, the directory will be uniquely identified
(in the same deployment unit) by the @Indexed.index element
Properties: none
hibernate.search.default.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.default.indexBase=/usr/lucene/indexes
hibernate.search.Rules.directory_provider org.hibernate.search.store.RAMDirectoryProvider
applied on
@Indexed(index="Status")
public class Status { ... }
@Indexed(index="Rules")
public class Rule { ... }
You can easily define common rules like the directory provider and base
directory, and override those defaults later on a per-index basis.
3.2. Sharding indexes
sizes and index update times are slowing the application down. The main
drawback of index sharding is that searches will end up being slower since
more files have to be opened for a single search. In other words don't do it
until you have problems :)
Despite this strong warning, Hibernate Search allows you to index a given
entity type into several sub indexes. Data is sharded into the different sub
indexes thanks to an IndexShardingStrategy. By default, no sharding strategy
is enabled, unless the number of shards is configured. To configure the
number of shards use the following property
hibernate.search.<indexName>.sharding_strategy.nbr_of_shards 5
The default sharding strategy, when shards are set up, splits the data
according to the hash value of the id string representation (generated by the
Field Bridge). This ensures a fairly balanced sharding. You can replace the
strategy by implementing IndexShardingStrategy and by setting the following
property
hibernate.search.<indexName>.sharding_strategy my.shardingstrategy.Implementation
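The idea behind hashing the id string to pick a shard can be sketched in a few lines of plain Java. This is not the IndexShardingStrategy interface and not necessarily the hash function Hibernate Search uses: the class IdHashShardingSketch is hypothetical and String.hashCode() is an assumption made for the illustration.

```java
// Toy model of hash based sharding on the string form of the id.
final class IdHashShardingSketch {

    // Maps an id string to a shard number in the range [0, numberOfShards).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String idAsString, int numberOfShards) {
        return Math.floorMod(idAsString.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        // "42".hashCode() is 1662, and 1662 mod 5 is 2, so this prints 2
        System.out.println(shardFor("42", 5));
    }
}
```

The same id always maps to the same shard, which is what allows a later update or delete of an entity to be routed to the sub index that holds its document.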
hibernate.search.default.indexBase /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards 5
hibernate.search.Animal.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.Animal.0.indexName Animal00
hibernate.search.Animal.3.indexBase /usr/lucene/sharded
hibernate.search.Animal.3.indexName Animal03
This configuration uses the default id string hashing strategy and shards the
Animal index into 5 subindexes. All subindexes are FSDirectoryProvider
instances and the directory where each subindex is stored is as follows:
3.3. Sharing indexes (two entities into the same directory)
This is only presented here so that you know the option is available.
There is really not much benefit in sharing indexes.
It is technically possible to store the information of more than one entity into a
single Lucene index. There are two ways to accomplish this:
hibernate.search.org.hibernate.search.test.shards.Furniture.indexName = Animal
hibernate.search.org.hibernate.search.test.shards.Animal.indexName = Animal
• Setting the @Indexed annotation's index attribute of the entities you want
to merge to the same value. If we again wanted all Furniture instances to
be indexed in the Animal index along with all instances of Animal we would
specify @Indexed(index="Animal") on both Animal and Furniture classes.
You can define the worker configuration using the following properties
Property Description

hibernate.search.worker.backend
Out of the box support for the Apache Lucene back end and the JMS back
end. Defaults to lucene. Also supports jms.

hibernate.search.worker.thread_pool.size
Defines the number of threads in the pool. Useful only for asynchronous
execution. Defaults to 1.

hibernate.search.worker.buffer_queue.max
Defines the maximum number of queued work items if the thread pool is
starved. Useful only for asynchronous execution. Defaults to infinite. If the
limit is reached, the work is done by the main thread.

hibernate.search.worker.jndi.*
Defines the JNDI properties to initiate the InitialContext (if needed). JNDI is
only used by the JMS back end.

hibernate.search.worker.jms.connection_factory
Mandatory for the JMS back end. Defines the JNDI name to lookup the JMS
connection factory from (/ConnectionFactory by default in JBoss AS).

hibernate.search.worker.jms.queue
Mandatory for the JMS back end. Defines the JNDI name to lookup the
JMS queue from. The queue will be used to post work messages.
3.5. JMS Master/Slave configuration
3.5.1. Slave nodes
## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
## Backend configuration
hibernate.search.worker.backend = jms
hibernate.search.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.worker.jms.queue = queue/hibernatesearch
#optional jndi configuration (check your JMS provider for more information)
The refresh period should be higher than the expected copy time.
3.5.2. Master node
## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
## Backend configuration
#Backend is the default lucene one
The refresh period should be higher than the expected copy time.
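@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName="destinationType",
        propertyValue="javax.jms.Queue"),
    @ActivationConfigProperty(propertyName="destination",
        propertyValue="queue/hibernatesearch"),
    @ActivationConfigProperty(propertyName="DLQMaxResent",
        propertyValue="1")
} )
public class MDBSearchController extends AbstractJMSHibernateSearchController
        implements MessageListener {

    @PersistenceContext EntityManager em;

    // method retrieving the appropriate session
    protected Session getSession() {
        return (Session) em.getDelegate();
    }

    // potentially close the session opened in #getSession(), not needed here
    protected void cleanSessionIfNeeded(Session session) {
    }
}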
• shared: share index readers across several queries. This strategy is the
most efficient.
hibernate.search.reader.strategy = not-shared
Or if you have a custom reader strategy:
hibernate.search.reader.strategy = my.corp.myapp.CustomReaderProvider
3.7. Enabling Hibernate Search and automatic indexing
3.7.1. Enabling Hibernate Search
To enable Hibernate Search in Hibernate Core (i.e. if you don't use Hibernate
Annotations), add the FullTextIndexEventListener for the following six
Hibernate events and also add it after the default DefaultFlushEventListener,
as in the following example.
<hibernate-configuration>
<session-factory>
...
<event type="post-update">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="post-insert">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="post-delete">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="post-collection-recreate">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="post-collection-remove">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="post-collection-update">
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
<event type="flush">
<listener
class="org.hibernate.event.def.DefaultFlushEventListener"/>
<listener
class="org.hibernate.search.event.FullTextIndexEventListener"/>
</event>
</session-factory>
</hibernate-configuration>
hibernate.search.indexing_strategy manual
Note
In most cases, the JMS backend provides the best of both worlds: a
lightweight event based system keeps track of all changes in the
system, and the heavyweight indexing process is done by a separate
process or machine.
3.8. Tuning Lucene indexing performance
There are two sets of parameters allowing for different performance settings
depending on the use case. During indexing operations triggered by
database modifications, the parameters are grouped by the transaction
keyword:
hibernate.search.[default|<indexname>].indexwriter.transaction.<parameter_name>
hibernate.search.[default|<indexname>].indexwriter.batch.<parameter_name>
Unless the corresponding .batch property is explicitly set, the value will
default to the .transaction property. If no value is set for a .batch value in a
specific shard configuration, Hibernate Search will look at the index section,
then at the default section and after that it will look for a .transaction in the
same order:
hibernate.search.Animals.2.indexwriter.transaction.max_merge_docs 10
hibernate.search.Animals.2.indexwriter.transaction.merge_factor 20
hibernate.search.default.indexwriter.batch.max_merge_docs 100
This configuration will result in these settings applied to the second shard of
the Animals index:
• transaction.max_merge_docs = 10
• batch.max_merge_docs = 100
• transaction.merge_factor = 20
• batch.merge_factor = 20
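The lookup order described above can be reproduced with a small helper over java.util.Properties. This is a sketch of the described fallback rules only, not Hibernate Search's own configuration code; the class IndexWriterSettingLookupSketch and its resolve method are hypothetical.

```java
import java.util.Properties;

// Toy model of the fallback order for indexwriter settings.
final class IndexWriterSettingLookupSketch {

    // Resolves a parameter for one shard. For the "batch" scope the shard,
    // index and default sections are searched for a .batch value first, then
    // the same sections are searched for a .transaction value.
    // Returns null when nothing is configured (Lucene's own default applies).
    static String resolve(Properties cfg, String index, int shard,
                          String scope, String parameter) {
        String[] sections = { index + "." + shard, index, "default" };
        String[] scopes = "batch".equals(scope)
                ? new String[] { "batch", "transaction" }
                : new String[] { scope };
        for (String s : scopes) {
            for (String section : sections) {
                String value = cfg.getProperty("hibernate.search." + section
                        + ".indexwriter." + s + "." + parameter);
                if (value != null) {
                    return value;
                }
            }
        }
        return null;
    }

    // The three properties from the example configuration above.
    static Properties exampleConfiguration() {
        Properties cfg = new Properties();
        cfg.setProperty("hibernate.search.Animals.2.indexwriter.transaction.max_merge_docs", "10");
        cfg.setProperty("hibernate.search.Animals.2.indexwriter.transaction.merge_factor", "20");
        cfg.setProperty("hibernate.search.default.indexwriter.batch.max_merge_docs", "100");
        return cfg;
    }

    public static void main(String[] args) {
        Properties cfg = exampleConfiguration();
        // prints 100: the default .batch value wins over the shard's .transaction value
        System.out.println(resolve(cfg, "Animals", 2, "batch", "max_merge_docs"));
        // prints 20: no .batch value exists anywhere, so .transaction is used
        System.out.println(resolve(cfg, "Animals", 2, "batch", "merge_factor"));
    }
}
```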
The default for all values is to leave them at Lucene's own default, so
the listed values in the following table actually depend on the version of
Lucene you are using; values shown are relative to version 2.4. For more
information about Lucene indexing performance, please refer to the Lucene
documentation.
Table 3.3. List of indexing performance and behavior properties

to document buffers. When used together max_buffered_docs a flush occurs for
whichever event happens first.
Generally for faster indexing performance it's best to flush by RAM usage
instead of document count and use as large a RAM buffer as you can.

hibernate.search.[default|<indexname>].indexwriter.[transaction|batch].term_index_interv
Expert: Set the interval between indexed terms.
Default: 128
Chapter 4. Mapping entities to the index structure
All the metadata information needed to index entities is described through
annotations. There is no need for xml mapping files. In fact there is
currently no xml configuration option available (see HSEARCH-210
[http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-210]).
You can still use hibernate mapping files for the basic Hibernate
configuration, but the Search specific configuration has to be expressed via
annotations.
@Entity
@Indexed(index="indexes/essays")
public class Essay {
...
}
The index attribute tells Hibernate what the Lucene directory name
is (usually a directory on your file system). It is recommended
to define a base directory for all Lucene indexes using the
hibernate.search.default.indexBase property in your configuration file.
Alternatively you can specify a base directory per indexed entity by specifying
hibernate.search.<index>.indexBase, where <index> is the fully qualified
classname of the indexed entity. Each entity instance will be represented by
a Lucene Document inside the given index (aka Directory).
For each property (or attribute) of your entity, you have the ability to describe
how it will be indexed. The default (no annotation present) means that the
property is completely ignored by the indexing process. @Field declares
a property as indexed. When indexing an element to a Lucene document you
can specify how it is indexed:
• name: describes under which name the property should be stored in the
Lucene Document. The default value is the property name (following the
JavaBeans convention).
• store: describes whether or not the property is stored in the Lucene index.
You can store the value with Store.YES (consuming more space in the index but
allowing projection, see Section 5.1.2.5, “Projection” for more information),
store it in a compressed way with Store.COMPRESS (this consumes more
CPU), or avoid any storage with Store.NO (this is the default value). When
a property is stored, you can retrieve its original value from the Lucene
Document. This is not related to whether the element is indexed or not.
• index: describes how the element is indexed and the type of information
stored. The different values are Index.NO (no indexing, i.e. cannot be found
by a query), Index.TOKENIZED (use an analyzer to process the property),
Index.UN_TOKENIZED (no analyzer pre-processing), Index.NO_NORMS (do not
store the normalization data). The default value is TOKENIZED.
Value Definition
TermVector.YES
Store the term vectors of each document. This produces two synchronized
arrays, one contains document terms and the other contains the term's
frequency.
TermVector.NO
Do not store term vectors.
TermVector.WITH_OFFSETS
Store the term vector and token offset information. This is the same as
TermVector.YES plus it contains the starting and ending offset position
information for the terms.
TermVector.WITH_POSITIONS
Store the term vector and token position information. This is the same as
TermVector.YES plus it contains the ordinal positions of each occurrence of
a term in a document.
TermVector.WITH_POSITION_OFFSETS
Store the term vector, token position and offset information. This is a
combination of the YES, WITH_OFFSETS and WITH_POSITIONS.
Whether or not you want to store the original data in the index depends on
how you wish to use the index query result. For a regular Hibernate Search
usage storing is not necessary. However you might want to store some fields
to subsequently project them (see Section 5.1.2.5, “Projection” for more
information).
@Entity
@Indexed(index="indexes/essays")
public class Essay {
...
@Id
@DocumentId
public Long getId() { return id; }
@Lob
@Field(index=Index.TOKENIZED)
public String getText() { return text; }
}
The above annotations define an index with three fields: id, Abstract and
text. Note that by default the field name is decapitalized, following the
JavaBean specification.
@Entity
@Indexed(index = "Book" )
public class Book {
@Fields( {
@Field(index = Index.TOKENIZED),
@Field(name = "summary_forSort", index =
Index.UN_TOKENIZED, store = Store.YES)
} )
public String getSummary() {
return summary;
}
...
}
The field summary is indexed twice, once as summary in a tokenized way, and
once as summary_forSort in an untokenized way. @Field supports 2 attributes
useful when @Fields is used:
• analyzer: defines a @Analyzer annotation per field rather than per property
• bridge: defines a @FieldBridge annotation per field rather than per property
See below for more information about analyzers and field bridges.
Embedded and associated objects
@Entity
@Indexed
public class Place {
@Id
@GeneratedValue
@DocumentId
private Long id;
@Field(index=Index.TOKENIZED)
private String name;
@IndexedEmbedded
private Address address;
...
}
@Entity
public class Address {
@Id
@GeneratedValue
private Long id;
@Field(index=Index.TOKENIZED)
private String street;
@Field(index=Index.TOKENIZED)
private String city;
@ContainedIn
@OneToMany(mappedBy="address")
private Set<Place> places;
...
}
In this example, the place fields will be indexed in the Place index. The Place
index documents will also contain the fields address.id, address.street,
and address.city which you will be able to query. This is enabled by the
@IndexedEmbedded annotation.
@Entity
@Indexed
public class Place {
@Id
@GeneratedValue
@DocumentId
private Long id;
@Field(index=Index.TOKENIZED)
private String name;
@IndexedEmbedded
private Address address;
...
}
@Entity
public class Address {
@Id
@GeneratedValue
private Long id;
@Field(index=Index.TOKENIZED)
private String street;
@Field(index=Index.TOKENIZED)
private String city;
@IndexedEmbedded(depth = 1, prefix = "ownedBy_")
private Owner ownedBy;
@ContainedIn
@OneToMany(mappedBy="address")
private Set<Place> places;
...
}
@Embeddable
public class Owner {
@Field(index = Index.TOKENIZED)
private String name;
...
}
to the main entity index. In the previous example, the index will contain the
following fields
• id
• name
• address.street
• address.city
• address.ownedBy_name
Note
The prefix cannot be set to the empty string.
The depth property is necessary when the object graph contains a cyclic
dependency of classes (not instances), for example if Owner points to Place.
Hibernate Search will stop including indexed embedded attributes after
reaching the expected depth (or once the object graph boundaries are reached).
A class having a self reference is an example of cyclic dependency. In our
example, because depth is set to 1, any @IndexedEmbedded attribute in Owner
(if any) will be ignored.
• Return places where name contains JBoss and where address city is
Atlanta. In Lucene query this would be
+name:jboss +address.city:atlanta
• Return places where name contains JBoss and where owner's name
contains Joe. In Lucene query this would be
+name:jboss +address.ownedBy_name:joe
In a way it mimics the relational join operation in a more efficient way (at the
cost of data duplication). Remember that, out of the box, Lucene indexes
have no notion of association, the join operation is simply non-existent. It
might help to keep the relational model normalized while benefiting from the
full text index speed and feature richness.
Note
An associated object can itself (but does not have to) be @Indexed
@Entity
@Indexed
public class Address {
@Id
@GeneratedValue
@DocumentId
private Long id;
@Field(index= Index.TOKENIZED)
private String street;
...
}
@Embeddable
public class Owner implements Person { ... }
@Entity
@Indexed(index="indexes/essays")
@Boost(1.7f)
public class Essay {
...
@Id
@DocumentId
public Long getId() { return id; }
@Field(boost=@Boost(2f))
@Boost(1.5f)
public String getSummary() { return summary; }
@Lob
@Field(index=Index.TOKENIZED, boost=@Boost(1.2f))
public String getText() { return text; }
@Field
public String getISBN() { return isbn; }
}
In our example, Essay's probability to reach the top of the search list will be
multiplied by 1.7. The summary field will be 3.0 times (2 * 1.5, because
@Field.boost and @Boost on a property are cumulative) more important than the
isbn field. The text field will be 1.2 times more important than the isbn field.
Note that this explanation in strictest terms is actually wrong, but it is simple
and close enough to reality for all practical purposes. For the exact details,
please check the Lucene documentation or the excellent Lucene In Action by
Otis Gospodnetic and Erik Hatcher.
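The cumulative rule can be checked with a few lines of plain Java. This is only an arithmetic sketch: BoostSketch and cumulativeBoost are hypothetical names that are not part of Hibernate Search, and the sketch says nothing about how Lucene combines boosts with the other scoring factors.

```java
/** Arithmetic sketch only; not a Hibernate Search or Lucene class. */
final class BoostSketch {
    private BoostSketch() {}

    /** Multiplies all boost factors declared for one property. */
    static float cumulativeBoost(float... boosts) {
        float result = 1.0f; // no boost declared means a neutral factor
        for (float boost : boosts) {
            result *= boost;
        }
        return result;
    }
}
```

Calling BoostSketch.cumulativeBoost(2f, 1.5f) yields the 3.0 quoted above for the summary property.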
4.1.5. Analyzer
The default analyzer class used to index tokenized fields is configurable
through the hibernate.search.analyzer property. The default value for this
property is org.apache.lucene.analysis.standard.StandardAnalyzer.
You can also define the analyzer class per entity, property and even per
@Field (useful when multiple fields are indexed from a single property).
@Entity
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {
@Id
@GeneratedValue
@DocumentId
private Integer id;
@Field(index = Index.TOKENIZED)
private String name;
@Field(index = Index.TOKENIZED)
@Analyzer(impl = PropertyAnalyzer.class)
private String summary;
...
}
Caution
Mixing different analyzers in the same entity is most of the time a
bad practice. It makes query building more complex and results less
predictable (for the novice), especially if you are using a QueryParser
(which uses the same analyzer for the whole query). As a rule of
thumb, for any given field the same analyzer should be used for
indexing and querying.
@AnalyzerDef(name="customanalyzer",
tokenizer = @TokenizerDef(factory =
StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory =
ISOLatin1AccentFilterFactory.class),
@TokenFilterDef(factory =
LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class,
params = {
@Parameter(name="words", value=
"org/hibernate/search/test/analyzer/solr/stoplist.properties" ),
@Parameter(name="ignoreCase", value="true")
})
})
public class Team {
...
}
Warning
Filters are applied in the order they are defined in the @AnalyzerDef
annotation. Make sure to think twice about this order.
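Why the order matters can be seen without any Lucene or Solr class. The sketch below models a filter as a function from a token list to a token list; FilterOrderSketch, LOWER_CASE and STOP_WORDS are hypothetical names, and the stop word filter is deliberately case-sensitive (unlike a StopFilterFactory configured with ignoreCase=true) to make the effect visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Plain Java model of a filter chain; not the Solr/Lucene API. */
final class FilterOrderSketch {
    private FilterOrderSketch() {}

    /** Lower-cases every token. */
    static final UnaryOperator<List<String>> LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    /** Drops the exact token "the"; case-sensitive on purpose. */
    static final UnaryOperator<List<String>> STOP_WORDS = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!"the".equals(token)) {
                out.add(token);
            }
        }
        return out;
    };

    /** Applies the filters one after the other, in declaration order. */
    static List<String> analyze(List<String> tokens,
                                List<UnaryOperator<List<String>>> filters) {
        List<String> current = tokens;
        for (UnaryOperator<List<String>> filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }
}
```

With the input tokens "The" and "Team", lower-casing first and removing stop words second leaves only "team". Swapping the two filters leaves "the" and "team", because the stop word filter sees "The" before it has been lower-cased.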
@Entity
@Indexed
@AnalyzerDef(name="customanalyzer", ... )
public class Team {
@Id
@DocumentId
@GeneratedValue
private Integer id;
@Field
private String name;
@Field
private String location;
...
}
Analyzer analyzer =
fullTextSession.getSearchFactory().getAnalyzer("customanalyzer");
Solr and Lucene come with a lot of useful default tokenizers and filters.
You can find a complete list of tokenizer factories and filter factories at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. Let's look at a few
of them.
SnowballPorterFilterFactory
Reduces a word to its root in a given language (e.g. protect, protects,
protection share the same root). Using such a filter allows searches
matching related words.
Parameters: language: Danish, Dutch, English, Finnish, French, German,
Italian, Norwegian, Portuguese, Russian, Spanish, Swedish and a few more
ISOLatin1AccentFilterFactory
Removes accents for languages like French.
Parameters: none
@Entity
@Indexed
@AnalyzerDefs({
@AnalyzerDef(name = "en",
tokenizer = @TokenizerDef(factory =
StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = EnglishPorterFilterFactory.class
)
}),
@AnalyzerDef(name = "de",
tokenizer = @TokenizerDef(factory =
StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = GermanStemFilterFactory.class)
})
})
public class BlogEntry {
@Id
@GeneratedValue
@DocumentId
private Integer id;
@Field
@AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
private String language;
@Field
private String text;
// standard getter/setter
...
}
During indexing time, Hibernate Search is using analyzers under the hood
for you. In some situations, retrieving analyzers can be handy. If your domain
model makes use of multiple analyzers (maybe to benefit from stemming, use
phonetic approximation and so on), you need to make sure to use the same
analyzers when you build your query.
Note
This rule can be broken but you need a good reason for it. If you are
unsure, use the same analyzers.
You can retrieve the scoped analyzer for a given entity used at indexing time
by Hibernate Search. A scoped analyzer is an analyzer which applies the
right analyzers depending on the field indexed: multiple analyzers can be
defined on a given entity, each one working on an individual field, and a
scoped analyzer unifies all these analyzers into a context-aware analyzer.
While the theory seems a bit complex, using the right analyzer in a query is
very easy.
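org.apache.lucene.queryParser.QueryParser parser = new QueryParser(
"title",
fullTextSession.getSearchFactory().getAnalyzer( Song.class )
);
org.apache.lucene.search.Query luceneQuery =
parser.parse( "title:sky OR title_stemmed:diamond" );
org.hibernate.Query fullTextQuery =
fullTextSession.createFullTextQuery( luceneQuery, Song.class );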
In the example above, the song title is indexed in two fields: the standard
analyzer is used in the field title and a stemming analyzer is used in the
field title_stemmed. By using the analyzer provided by the search factory, the
query uses the appropriate analyzer depending on the field targeted.
If your query targets more than one field and you wish to use
your standard analyzer, make sure to describe it using an analyzer
definition. You can retrieve analyzers by their definition name using
searchFactory.getAnalyzer(String).
Built-in bridges
null
null elements are not indexed. Lucene does not support null elements
and this does not make much sense either.
java.lang.String
Strings are indexed as is.
short, Short, integer, Integer, long, Long, float, Float, double, Double,
BigInteger, BigDecimal
Numbers are converted to their String representation. Note that numbers
cannot be compared by Lucene (i.e. used in range queries) out of the
box: they have to be padded.
java.util.Date
Dates are stored as yyyyMMddHHmmssSSS in GMT time
(20061107210300012 for Nov 7th of 2006 4:03PM and 12ms EST). You
shouldn't really bother with the internal format. What is important is that
when using a DateRange Query, you should know that the dates have to
be expressed in GMT time.
@Entity
@Indexed
public class Meeting {
@Field(index=Index.UN_TOKENIZED)
@DateBridge(resolution=Resolution.MINUTE)
private Date date;
...
}
java.net.URI, java.net.URL
URI and URL are converted to their string representation.
java.lang.Class
Classes are converted to their fully qualified class name. The thread context
classloader is used when the class is rehydrated.
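To see what the yyyyMMddHHmmssSSS representation in GMT looks like for a concrete date, the following sketch formats a java.util.Date with the JDK only. It is an illustration of the pattern, not the code of the built-in date bridge; GmtDateFormatSketch and its methods are hypothetical names.

```java
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** JDK-only illustration of the yyyyMMddHHmmssSSS pattern in GMT. */
final class GmtDateFormatSketch {
    private GmtDateFormatSketch() {}

    /** Formats the date with millisecond precision, expressed in GMT. */
    static String toIndexString(Date date) {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyyMMddHHmmssSSS", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    /** Nov 7th 2006, 4:03:00.012 PM at UTC-5 (EST). */
    static Date sampleDate() {
        return Date.from(OffsetDateTime
                .of(2006, 11, 7, 16, 3, 0, 12_000_000, ZoneOffset.ofHours(-5))
                .toInstant());
    }
}
```

For the sample date, 4:03 PM EST is 21:03 GMT, so toIndexString returns 20061107210300012. A range query over such a field therefore has to use GMT bounds written in the same pattern.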
4.2.2.1. StringBridge
/**
* Padding Integer bridge.
* All numbers will be padded with 0 to match 5 digits
*
* @author Emmanuel Bernard
*/
public class PaddedIntegerBridge implements StringBridge {
...
}
Then any property or field can use this bridge thanks to the @FieldBridge
annotation:
@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;
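The padding itself needs nothing from Hibernate Search. The sketch below shows the idea in plain Java and why it matters for range queries, which compare strings character by character. PaddingSketch and pad are hypothetical names, and rejecting negative or too long numbers is an assumption of this sketch.

```java
/** Plain Java illustration of zero padding; not the bridge implementation. */
final class PaddingSketch {
    private PaddingSketch() {}

    /** Left-pads a non-negative number with zeros up to the given width. */
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative numbers are not handled in this sketch");
        }
        String raw = Integer.toString(value);
        if (raw.length() > width) {
            throw new IllegalArgumentException("number has more than " + width + " digits");
        }
        StringBuilder padded = new StringBuilder(width);
        for (int i = raw.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }
}
```

Without padding, "9" sorts after "10" because '9' is greater than '1'. With a width of 5, pad(9, 5) gives 00009 and pad(10, 5) gives 00010, which compare in numeric order.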
//property
@FieldBridge(impl = PaddedIntegerBridge.class,
params = @Parameter(name="padding", value="10")
)
private Integer length;
//id property
@DocumentId
@FieldBridge(impl = PaddedIntegerBridge.class,
params = @Parameter(name="padding", value="10")
)
private Integer id;
4.2.2.2. FieldBridge
Some use cases require more than a simple object to string translation when
mapping a property to a Lucene index. To give you the greatest possible
flexibility you can also implement a bridge as a FieldBridge. This interface
gives you a property value and lets you map it the way you want in your
Lucene Document. The interface is very similar in its concept to the Hibernate
UserTypes.
You can for example store a given property in two different document fields:
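/**
 * Store the date in 3 different fields - year, month, day - to ease
 * Range Query per year, month or day (eg get all the elements of
 * December for the last 5 years).
 *
 * @author Emmanuel Bernard
 */
public class DateSplitBridge implements FieldBridge {
    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");

    public void set(String name, Object value, Document document,
            LuceneOptions luceneOptions) {
        // read the year in GMT
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime((Date) value);
        int year = cal.get(Calendar.YEAR);

        // set year
        Field field = new Field(name + ".year",
                String.valueOf(year),
                luceneOptions.getStore(), luceneOptions.getIndex(),
                luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        document.add(field);
        // month and day are stored the same way
        ...
    }
}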
//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;
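The splitting step can be tried out without Lucene. The sketch below computes the three field names and values a bridge of this kind would add to the Document; DateSplitSketch is a hypothetical name, and padding month and day to two digits is an assumption made here so that the values sort correctly as strings.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

/** JDK-only sketch of splitting a date into year, month and day values. */
final class DateSplitSketch {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    private DateSplitSketch() {}

    /** Returns field name to field value, in insertion order. */
    static Map<String, String> split(String name, Date date) {
        Calendar cal = new GregorianCalendar(GMT);
        cal.setTime(date);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(name + ".year", String.valueOf(cal.get(Calendar.YEAR)));
        // Calendar.MONTH is zero-based, so add 1 before storing it
        fields.put(name + ".month", twoDigits(cal.get(Calendar.MONTH) + 1));
        fields.put(name + ".day", twoDigits(cal.get(Calendar.DAY_OF_MONTH)));
        return fields;
    }

    private static String twoDigits(int value) {
        return value < 10 ? "0" + value : String.valueOf(value);
    }
}
```

For a property named date, the resulting keys are date.year, date.month and date.day, so a query such as "all elements of December" only needs to target date.month.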
4.2.2.3. ClassBridge
@Entity
@Indexed
@ClassBridge(name="branchnetwork",
index=Index.TOKENIZED,
store=Store.YES,
impl = CatFieldsClassBridge.class,
params = @Parameter( name="sepChar", value=" " ) )
public class Department {
private int id;
private String network;
private String branchHead;
private String branch;
private Integer maxEmployees;
...
}
Providing your own id
You can provide your own id for Hibernate Search if you are extending the
internals. You will have to generate a unique value so it can be given to
Lucene to be indexed. This will have to be given to Hibernate Search when
you create an org.hibernate.search.Work object - the document id is required
in the constructor.
Chapter 5. Querying
The second most important capability of Hibernate Search is the ability to
execute a Lucene query and retrieve entities managed by a Hibernate
session, providing the power of Lucene without leaving the Hibernate
paradigm, and giving another dimension to the Hibernate classic search
mechanisms (HQL, Criteria query, native SQL query). Preparing and
executing a query consists of four simple steps:
• Creating a FullTextSession
• Creating a Lucene query
• Wrapping the Lucene query using a org.hibernate.Query
• Executing the search by calling for example list() or scroll()
The actual search facility is built on native Lucene queries which the following
example illustrates.
org.apache.lucene.queryParser.QueryParser parser =
new QueryParser("title", new StopAnalyzer() );
In case you are using the Java Persistence APIs of Hibernate (aka EJB 3.0
Persistence), the same extensions exist:
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
...
org.apache.lucene.queryParser.QueryParser parser =
new QueryParser("title", new StopAnalyzer() );
The following examples use the Hibernate APIs, but the same
examples can easily be rewritten with the Java Persistence API by just
adjusting the way the FullTextQuery is retrieved.
5.1.2.1. Generality
If not specified otherwise, the query will be executed against all indexed
entities, potentially returning all types of indexed classes. It is advised, from a
performance point of view, to restrict the returned types:
org.hibernate.Query fullTextQuery =
fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
// or
fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery,
Item.class, Actor.class );
The first example returns only matching Customers, the second returns
matching Actors and Items. The type restriction is fully polymorphic which
means that if there are two indexed subclasses Salesman and Customer of the
baseclass Person, it is possible to just specify Person.class in order to filter
on result types.
5.1.2.2. Pagination
org.hibernate.Query fullTextQuery =
fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
fullTextQuery.setFirstResult(15); //skip the first 15 results (the offset is zero-based)
fullTextQuery.setMaxResults(10); //return at most 10 elements
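When results are shown page by page, the value passed to setFirstResult is usually derived from a page number. A minimal sketch of that arithmetic, assuming one-based page numbers; PaginationSketch and firstResult are hypothetical helper names, not part of the Hibernate API.

```java
/** Offset arithmetic for paging; hypothetical helper, not a Hibernate class. */
final class PaginationSketch {
    private PaginationSketch() {}

    /**
     * Returns the zero-based offset of the first element of a page.
     * Page numbers start at 1 in this sketch.
     */
    static int firstResult(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page number and page size must be positive");
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact(pageNumber - 1, pageSize);
    }
}
```

With a page size of 10, page 1 starts at offset 0 and page 3 at offset 20. The result would be passed to setFirstResult, with the page size passed to setMaxResults.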
5.1.2.3. Sorting
Apache Lucene provides a very flexible and powerful way to sort results.
While the default sorting (by relevance) is appropriate most of the time, it can
be interesting to sort by one or several other properties. In order to do so set
the Lucene Sort object to apply a Lucene sorting strategy.
It is often useful, however, to refine the fetching strategy for a specific use
case.
In this example, the query will return all Books matching the luceneQuery.
The authors collection will be loaded from the same query using an SQL
outer join.
When defining a criteria query, there is no need to restrict the entity types
returned while creating the Hibernate Search query from the full text session:
the type is guessed from the criteria query itself. Only the fetch mode can be
adjusted; refrain from applying any other restriction.
One cannot use setCriteriaQuery if more than one entity type is expected to
be returned.
5.1.2.5. Projection
For some use cases, returning the domain object (graph) is overkill. Only a
small subset of the properties is necessary. Hibernate Search allows you to
return a subset of properties:
Hibernate Search extracts the properties from the Lucene index and
converts them back to their object representation, returning a list of Object[].
Projections avoid a potential database round trip (useful if the query
response time is critical), but have some constraints:
• you can only project simple properties of the indexed entity or its
embedded associations. This means you cannot project a whole
embedded entity.
• projection does not work on collections or maps which are indexed via
@IndexedEmbedded
Projection is also useful for another kind of use case. Lucene provides some
metadata information to the user about the results. By using some special
placeholders, the projection mechanism can retrieve them:
You can mix and match regular fields and special placeholders. Here is the
list of available placeholders:
5.2.3. ResultTransformer
Especially when using projection, the data structure returned by a query (an
object array in this case), is not always matching the application needs. It
is possible to apply a ResultTransformer operation post query to match the
targeted data structure:
query.setResultTransformer(
new StaticAliasToBeanResultTransformer( BookView.class, "title",
"author" )
);
List<BookView> results = (List<BookView>) query.list();
for(BookView view : results) {
log.info( "Book: " + view.getTitle() + ", " + view.getAuthor()
);
}
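What such a transformation amounts to can be sketched in plain Java: each projected row is an Object[] whose positions follow the projection order, and each row is copied into a view object. ProjectionRowSketch and its nested BookView are hypothetical stand-ins, assuming a projection on title and author in that order; this is not the code of StaticAliasToBeanResultTransformer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written equivalent of mapping projected rows to view objects. */
final class ProjectionRowSketch {
    private ProjectionRowSketch() {}

    /** Hypothetical view class with the two projected properties. */
    static final class BookView {
        private final String title;
        private final String author;

        BookView(String title, String author) {
            this.title = title;
            this.author = author;
        }

        String getTitle() { return title; }
        String getAuthor() { return author; }
    }

    /** Assumes row[0] is the title and row[1] the author. */
    static List<BookView> toViews(List<Object[]> rows) {
        List<BookView> views = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            views.add(new BookView((String) row[0], (String) row[1]));
        }
        return views;
    }
}
```

The positional casts are the fragile part of doing this by hand, which is the reason a transformer keyed by alias names is more convenient.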
• Use projection
Warning
The Document id has nothing to do with the entity id. Do not confuse
these two notions.
The second approach lets you project the Explanation object using the
FullTextQuery.EXPLANATION constant.
5.3. Filters
Apache Lucene has a powerful feature that allows you to filter query results
according to a custom filtering process. This is a very powerful way to apply
additional data restrictions, especially since filters can be cached and reused.
Some interesting use cases are:
• security
In this example we enabled two filters on top of the query. You can enable (or
disable) as many filters as you like.
@Entity
@Indexed
@FullTextFilterDefs( {
@FullTextFilterDef(name = "bestDriver", impl =
BestDriversFilter.class),
@FullTextFilterDef(name = "security", impl =
SecurityFilterFactory.class)
})
public class Driver { ... }
If your Filter creation requires additional steps or if the filter you want to use
does not have a no-arg constructor, you can use the factory pattern:
@Entity
@Indexed
@FullTextFilterDef(name = "bestDriver", impl =
BestDriversFilterFactory.class)
public class Driver { ... }
public class BestDriversFilterFactory {
@Factory
public Filter getFilter() {
//some additional steps to cache the filter results per IndexReader
Filter bestDriversFilter = new BestDriversFilter();
return new CachingWrapperFilter(bestDriversFilter);
}
}
Hibernate Search will look for a @Factory annotated method and use it to
build the filter instance. The factory must have a no-arg constructor. For
people familiar with JBoss Seam, this is similar to the component factory
pattern, but the annotation is different!
Each parameter name should have an associated setter on either the filter or
filter factory of the targeted named filter definition.
public class SecurityFilterFactory {
private Integer level;
/**
* injected parameter
*/
public void setLevel(Integer level) {
this.level = level;
}
@Key
public FilterKey getKey() {
StandardFilterKey key = new StandardFilterKey();
key.addParameter( level );
return key;
}
@Factory
public Filter getFilter() {
Query query = new TermQuery( new Term("level",
level.toString() ) );
return new CachingWrapperFilter( new
QueryWrapperFilter(query) );
}
}
Note the method annotated @Key returning a FilterKey object. The returned
object has a special contract: the key object must implement equals() /
hashCode() so that 2 keys are equal if and only if the given Filter types are
the same and the set of parameters are the same. In other words, 2 filter
keys are equal if and only if the filters from which the keys are generated can
be interchanged. The key object is used as a key in the cache mechanism.
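The equals() / hashCode() contract described above can be illustrated with a small stand-alone class. FilterKeySketch is a hypothetical name and not the StandardFilterKey implementation; it assumes that parameters are compared in the order they were added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Stand-alone illustration of a cache key built from a type and parameters. */
final class FilterKeySketch {
    private final Class<?> filterType;
    private final List<Object> parameters = new ArrayList<>();

    FilterKeySketch(Class<?> filterType) {
        this.filterType = Objects.requireNonNull(filterType);
    }

    /** Adds a parameter value; returns this to allow chaining. */
    FilterKeySketch addParameter(Object value) {
        parameters.add(value);
        return this;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof FilterKeySketch)) {
            return false;
        }
        FilterKeySketch that = (FilterKeySketch) other;
        // equal only if both the filter type and all parameters match
        return filterType.equals(that.filterType)
                && parameters.equals(that.parameters);
    }

    @Override
    public int hashCode() {
        return 31 * filterType.hashCode() + parameters.hashCode();
    }
}
```

Two keys built for the same filter type with level 5 are equal and share a hash code, so the cache can hand back the same filter. A key built with level 6, or for another filter type, is different and leads to a separate cache entry.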
As mentioned before the defined filters are per default cached and the
cache uses a combination of hard and soft references to allow disposal of
memory when needed. The hard reference cache keeps track of the most
recently used filters and transforms the ones least used to SoftReferences
when needed. Once the limit of the hard reference cache is reached
additional filters are cached as SoftReferences. To adjust the size of the
hard reference cache, use hibernate.search.filter.cache_strategy.size
(defaults to 128). For advanced use of filter caching, you can implement
your own FilterCachingStrategy. The classname is defined by
hibernate.search.filter.cache_strategy.
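The combination of hard and soft references can be pictured with the following JDK-only sketch. It is not the Hibernate Search cache implementation: HardSoftCacheSketch is a hypothetical class that keeps the most recently used entries in an access-ordered LinkedHashMap and demotes the least recently used entry to a SoftReference once the hard limit is exceeded.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a cache with a bounded hard part and a soft overflow part. */
final class HardSoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> soft = new HashMap<>();
    private final LinkedHashMap<K, V> hard;

    HardSoftCacheSketch(final int hardSize) {
        // access order: the eldest entry is the least recently used one
        this.hard = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > hardSize) {
                    // demote instead of dropping: the GC may reclaim it later
                    soft.put(eldest.getKey(), new SoftReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(K key, V value) {
        soft.remove(key);
        hard.put(key, value);
    }

    synchronized V get(K key) {
        V value = hard.get(key);
        if (value == null) {
            SoftReference<V> reference = soft.remove(key);
            value = (reference == null) ? null : reference.get();
            if (value != null) {
                hard.put(key, value); // promote back to the hard cache
            }
        }
        return value;
    }

    synchronized int hardSize() {
        return hard.size();
    }
}
```

With a hard size of 2, putting three entries leaves two strongly referenced entries; the oldest one stays reachable through its SoftReference only until the garbage collector needs the memory.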
This filter caching mechanism should not be confused with caching the
actual filter results. In Lucene it is common practice to wrap filters that use
the IndexReader in a CachingWrapperFilter. The wrapper will cache the
DocIdSet returned from the getDocIdSet(IndexReader reader) method to
avoid expensive recomputation. It is important to mention that the computed
DocIdSet is only cachable for the same IndexReader instance, because the
reader effectively represents the state of the index at the moment it was
opened. The document list cannot change within an opened IndexReader. A
different/new IndexReader instance, however, works potentially on a different
set of Documents (either from a different index or simply because the index
has changed), hence the cached DocIdSet has to be recomputed.
Value Definition
FilterCacheModeType.NONE
No filter instance and no result is cached by Hibernate Search. For every
filter call, a new filter instance is created. This setting might be useful
for rapidly changing data sets or heavily memory constrained environments.
FilterCacheModeType.INSTANCE_ONLY
The filter instance is cached and reused across concurrent
Filter.getDocIdSet() calls. DocIdSet results are not cached. This setting
is useful when a filter uses its own specific caching mechanism or the
filter results change dynamically due to application specific events
making DocIdSet caching in both cases unnecessary.
FilterCacheModeType.INSTANCE_AND_DOCIDSETRESULTS
Both the filter instance and the DocIdSet results are cached. This is
the default value.
Last but not least - why should filters be cached? There are two areas where
filter caching shines:
• the system does not update the targeted entity index often (in other words,
the IndexReader is reused a lot)
• the way Hibernate Search interacts with the Lucene readers: defines the
appropriate Reader strategy.
Chapter 6. Manual indexing
6.1. Indexing
It is sometimes useful to index an entity even if this entity is not inserted
into or updated in the database. This is for example the case when you want to
build your index for the first time. FullTextSession.index() allows you to do so.
FullTextSession fullTextSession =
Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
fullTextSession.index(customer);
}
tx.commit(); //the index is written at commit time
Note
Other parameters which also can affect indexing time and memory
consumption are:
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_do
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_lengt
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_inte
These parameters are Lucene specific and Hibernate Search is just passing
these parameters through - see Section 3.8, “Tuning Lucene indexing
performance” for more details.
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria(
Email.class )
.setFetchSize(BATCH_SIZE)
.scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
index++;
fullTextSession.index( results.get(0) ); //index each element
if (index % BATCH_SIZE == 0) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //clear since the queue is processed
}
}
transaction.commit();
Try to use a batch size that guarantees that your application will not run out
of memory.
6.2. Purging
It is equally possible to remove an entity or all entities of a given type
from a Lucene index without the need to physically remove them from the
database. This operation is named purging and is also done through the
FullTextSession.
FullTextSession fullTextSession =
Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
fullTextSession.purge( Customer.class, customer.getId() );
}
tx.commit(); //the index is written at commit time
Purging will remove the entity with the given id from the Lucene index but will
not touch the database.
If you need to remove all entities of a given type, you can use the purgeAll
method. This operation removes all entities of the type passed as a parameter
as well as all its subtypes.
FullTextSession fullTextSession =
Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
fullTextSession.purgeAll( Customer.class );
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //the index is written at commit time
Note
Methods index, purge and purgeAll are available on
FullTextEntityManager as well.
Chapter 7. Index Optimization
From time to time, the Lucene index needs to be optimized. The process is
essentially a defragmentation. Until an optimization is triggered Lucene only
marks deleted documents as such, no physical deletions are applied. During
the optimization process the deletions will be applied, which also affects the
number of files in the Lucene Directory.
Optimizing the Lucene index speeds up searches but has no effect on the
indexation (update) performance. During an optimization, searches can be
performed, but will most likely be slowed down. All index updates will be
stopped. It is recommended to schedule optimization:
hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max = 100
hibernate.search.Animal.optimizer.transaction_limit.max = 50
FullTextSession fullTextSession =
Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
searchFactory.optimize(Order.class);
// or
searchFactory.optimize();
The first example optimizes the Lucene index holding Orders; the second,
optimizes all indexes.
Note
searchFactory.optimize() has no effect on a JMS backend. You must
apply the optimize operation on the Master node.
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_do
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_lengt
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size
• hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_inte
See Section 3.8, “Tuning Lucene indexing performance” for more details.
Chapter 8. Advanced features
8.1. SearchFactory
The SearchFactory object keeps track of the underlying Lucene resources for
Hibernate Search. It is also a convenient way to access Lucene natively. The
SearchFactory can be accessed from a FullTextSession:
FullTextSession fullTextSession =
    Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
DirectoryProvider[] provider =
    searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory =
    provider[0].getDirectory();
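DirectoryProvider orderProvider =
    searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider =
    searchFactory.getDirectoryProviders(Client.class)[0];
// obtain a shared reader spanning both indexes
ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);
try {
    //do read-only operations on the reader
}
finally {
    readerProvider.closeReader(reader);
}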
• Don't use this IndexReader for modification operations (you would get an
exception). If you want to use a read/write index reader, open one from the
Lucene Directory object.
Aside from those rules, you can use the IndexReader freely, especially to do
native queries. Using the shared IndexReaders will make most queries more
efficient.
score(q,d) = coord(q,d) · queryNorm(q) · Σ_{t in q} ( tf(t in d) · idf(t)² ·
t.getBoost() · norm(t,d) )
Factor         Description
tf(t in d)     Term frequency factor for the term (t) in the document (d).
idf(t)         Inverse document frequency of the term.
coord(q,d)     Score factor based on how many of the query terms are found
               in the specified document.
queryNorm(q)   Normalizing factor used to make scores between queries
               comparable.
t.getBoost()   Field boost.
norm(t,d)      Encapsulates a few (indexing time) boost and length factors.
It is beyond the scope of this manual to explain this formula in more detail.
Please refer to Similarity's Javadocs for more information.
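As a rough illustration of how the factors combine, the following sketch evaluates the formula for factor values that have already been computed. It follows the form given in Lucene's Similarity Javadocs, where the idf factor is squared. The ScoreSketch class and all numeric inputs are hypothetical, and no Lucene classes are involved.

```java
// Sketch: combines precomputed factor values the way the formula does.
// Class name and inputs are illustrative assumptions, not Lucene API.
class ScoreSketch {
    // One query term's contribution: tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)
    static double termContribution(double tf, double idf, double boost, double norm) {
        return tf * idf * idf * boost * norm;
    }

    // score(q,d) = coord(q,d) * queryNorm(q) * sum of the term contributions.
    // Each row of terms holds {tf, idf, boost, norm} for one query term.
    static double score(double coord, double queryNorm, double[][] terms) {
        double sum = 0.0;
        for (double[] t : terms) {
            sum += termContribution(t[0], t[1], t[2], t[3]);
        }
        return coord * queryNorm * sum;
    }
}
```

For example, with two terms whose factors are {2.0, 3.0, 1.0, 0.5} and {1.0, 2.0, 2.0, 0.25}, the per-term contributions are 9.0 and 2.0; with coord = 1.0 and queryNorm = 0.5 the resulting score is 5.5.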
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}