Starting with Zend_Search_Lucene

As websites grow, searches like “title LIKE ‘%search term%’” become unreliable. There are very good solutions like Sphinx, Lucene, etc., but not surprisingly, you can’t always have Sphinx installed (shared servers again), so another solution has to be chosen.

MySQL supports full-text indexing, but it doesn’t give much control over the actual index. Luckily, the Zend team has done a wonderful job and implemented Lucene search entirely in PHP. Zend_Search_Lucene is part of Zend Framework, but like all framework modules it runs almost independently (it only relies on a few other classes such as Zend_Exception).

How do you start indexing data? The Zend manual has very good examples of how to start with Lucene, but to create a sample index you can use this code (you need to have auto-loading enabled and a database connection available):

// Create index
$index = Zend_Search_Lucene::create('indexes/products');

$sql = "select product_name, product_url from products";

// Rows are read as objects below, so the adapter must use Zend_Db::FETCH_OBJ
$results = $db->fetchAll($sql);

foreach ($results as $result) {
    $doc = new Zend_Search_Lucene_Document();

    // Store document URL to identify it in the search results
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('url', $result->product_url));

    // Index document title
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $result->product_name));

    // Add document to the index
    $index->addDocument($doc);
}

// Optimize index.
$index->optimize();

This simple code selects product information from the database, loops through the results and adds them to the index as documents. In this example I added the URL as UnIndexed, because I’m only going to search by title, but Lucene offers other field types. In most cases the product description or document text should be added as well (and probably indexed too).
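To make the field types a bit more concrete, here is a small sketch that lists the five factory methods of Zend_Search_Lucene_Field together with whether each one stores, indexes and tokenizes its value. The flags follow the table in the Zend manual as I remember it; the `searchableTypes()` helper is a made-up name used only for illustration.

```php
<?php
// Field factory methods of Zend_Search_Lucene_Field and what each one does.
$fieldTypes = array(
    'Keyword'   => array('stored' => true,  'indexed' => true,  'tokenized' => false),
    'UnIndexed' => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Binary'    => array('stored' => true,  'indexed' => false, 'tokenized' => false),
    'Text'      => array('stored' => true,  'indexed' => true,  'tokenized' => true),
    'UnStored'  => array('stored' => false, 'indexed' => true,  'tokenized' => true),
);

// Hypothetical helper: return the names of the field types that can be searched.
function searchableTypes(array $types)
{
    $names = array();
    foreach ($types as $name => $flags) {
        if ($flags['indexed']) {
            $names[] = $name;
        }
    }
    return $names;
}

echo implode(', ', searchableTypes($fieldTypes)) . PHP_EOL;
```

For a long product description, UnStored is probably the natural choice: inside the indexing loop it could be added with Zend_Search_Lucene_Field::UnStored('description', $result->product_description), where product_description is a hypothetical column. The text becomes searchable, but it is not copied into the index, so the index stays small.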

Searching through the index is even easier. The one thing you need to learn is how to construct search queries in the required query language. Example:

// Open index
$index = Zend_Search_Lucene::open('indexes/products');

$query = 'title:"Apple MacBook"';

// Search by query
$hits = $index->find($query);

foreach ($hits as $hit) {
    echo $hit->score . " ";
    echo $hit->title . " ";
    echo $hit->url . PHP_EOL;
}

I tried creating an index of 6,000 products: the index (0.7 MB) was created in around 3 minutes and each search takes about 0.1 s. I tested it on my laptop, without APC and with a development Apache/PHP configuration. Production servers would run this task much faster, but 0.1 s per search is not that bad.
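If you want to take similar measurements yourself, a rough sketch like the one below is enough. It is not the exact code I used; `timeCall()` is a hypothetical helper, and the workload here is a stand-in so the snippet runs without an index.

```php
<?php
// Hypothetical helper: runs $callback once and returns array(result, seconds).
function timeCall($callback)
{
    $start   = microtime(true);
    $result  = call_user_func($callback);
    $elapsed = microtime(true) - $start;

    return array($result, $elapsed);
}

// Stand-in workload so the sketch runs on its own; in a real test this
// would be something like: return $index->find('title:"Apple MacBook"');
list($result, $seconds) = timeCall(function () {
    return str_repeat('x', 1000);
});

printf("Took %.4f s" . PHP_EOL, $seconds);
```

A single run is noisy, so it is worth repeating the call a few times and looking at the average before drawing conclusions.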

Zend_Search_Lucene will not replace Sphinx or Lucene, but in limited environments (like shared servers) it can be quite useful. It supports many query types: phrase queries, boolean queries, wildcard queries, proximity queries, range queries and many others, which can hardly be achieved using MySQL full-text indexes.
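To give an idea of what these query types look like, here is a sketch with one query string per type, written in the query language as I understand it from the Zend manual. The field name and the search terms are made up for illustration, so adjust them to your own index.

```php
<?php
// One example query string per query type.
// The field name (title) and all values are illustrative assumptions.
$examples = array(
    'term'      => 'macbook',
    'field'     => 'title:macbook',
    'phrase'    => 'title:"Apple MacBook"',
    'boolean'   => 'title:apple AND NOT title:ipod',
    'wildcard'  => 'title:mac*',
    'proximity' => '"apple laptop"~5',
    'range'     => 'title:[apple TO banana]',
    'fuzzy'     => 'title:makbook~',
);

foreach ($examples as $type => $query) {
    // With an opened index, each string could be passed to $index->find($query).
    echo str_pad($type, 10) . $query . PHP_EOL;
}
```

Each of these strings can be passed to find() exactly like the phrase query in the search example. Whether a particular query returns what you expect also depends on the analyzer used at indexing time, so test them against your own data.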


