Wednesday, 20 June 2012

Implementing Spellchecker in Solr

A common need in search applications is suggesting the correct word or phrase for a misspelled one. These suggestions can come from a dictionary built from a field in the index, or from any other dictionary you supply.

Implementing a spell-check suggester in Apache Solr is a piece of cake: just follow the steps below.

Step 1:
Open <SOLR_HOME>/example/solr/conf/solrconfig.xml and search for 
<lst name="spellchecker"> 
Inside this tag, find
<str name="field">name</str>
and replace name with the field you want to use for spell checking.
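As a rough illustration, the edited spellchecker entry might look like the fragment below. It is a sketch modelled on the example solrconfig.xml shipped with Solr 3.x; `title` is a hypothetical field name standing in for your own, and the other entries are shown only for context and may differ in your copy of the file.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- hypothetical field: replace "title" with the field to build suggestions from -->
    <str name="field">title</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
</searchComponent>
```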

Next, update the settings of the spell request handler. Search for
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
and inside this tag find
<str name="df">text</str>
Replace text with the field you want to use for spell checking. Also raise <str name="spellcheck.count">1</str> to the number of suggestions you want returned.
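Similarly, a sketch of the edited /spell handler is shown below, again assuming the hypothetical `title` field and an arbitrary suggestion count of 5. The `last-components` entry is what attaches the spellcheck component to this handler; any other defaults in your copy can stay as they are.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- hypothetical field: use the same field as in the spellchecker block -->
    <str name="df">title</str>
    <!-- return up to 5 suggestions per misspelled term instead of 1 -->
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- run the spellcheck component after the standard search components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```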

Step 2: 
The configuration is now complete, so (re)start your Solr server for the changes to take effect.

Step 3:
Now you'll need to instruct the spellcheck component to build its dictionary. This can be done by issuing an empty query with the parameter set to true, as with the following URL: 

NOTE: is needed only once to build the spellcheck index and should not be specified with each request

Step 4:
Your dictionary is now built and ready to suggest corrections for misspelled words. Consider this misspelled request:

http://<host>:<port>/solr/spell?q=blogr+wrld&spellcheck=true&spellcheck.collate=true

The above request returns the following suggestions in its response:


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">13</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="blgge">
        <int name="numFound">3</int>
        <int name="startOffset">0</int>
        <int name="endOffset">5</int>
        <arr name="suggestion">
          <!-- ... -->
        </arr>
      </lst>
      <lst name="wrld">
        <int name="numFound">3</int>
        <int name="startOffset">6</int>
        <int name="endOffset">10</int>
        <arr name="suggestion">
          <!-- ... -->
        </arr>
      </lst>
      <str name="collation">blogger world</str>
    </lst>
  </lst>
</response>

