Importing / Indexing MySQL data into Solr

Apache Solr is a fast, full featured, open-source Java search server. It enables you to easily create search engines which index, search & analyze data from websites, databases or files.

Before we proceed further I presume that you have already configured Solr server without data or refer to Solr in 5 minutes tutorial to configure your Solr server first.

Configure your Solr server


Step 1:

To index your data firstly you need to define schema by editing file called schema.xml present at <SOLR_HOME>/example/solr/conf/
Schema.xml in solr serves the similar purpose that table does in mysql
In schema.xml file you have to define the
  • Fields with there field type
  • Field which should be used as unique key
  • Which fields are required, indexed
  • Default search field (deprecated in new versions)
  • Default search operator (AND | OR deprecated in new versions)
**indexed fields are fields which undergo an analysis phase, and are added to the index.
Search for <fields> tag and define your fields under this

<fields>
    <field name="blog_id" type="string" indexed="true" stored="true" omitNorms="true"/>
    <field name="tags" type="text_general" indexed="true" stored="true"/>
    <field name="desc" type="text_general" indexed="true" stored="true"/>
    <field name="created" type="tlong" indexed="true" stored="true" omitNorms="true"/>
    <field name="uid" type="int" indexed="true" stored="true" omitNorms="true"/>
</fields>

Search for  <uniqueKey> tag and define your primary key name
<uniqueKey>blog_id</uniqueKey>

Step 2:

Now you need to define how this schema is mapped to your database. For this purpose you have to create a new file data-config.xml under <SOLR_HOME>/example/solr/conf directory.

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/myDatabase"
                  user="dbuser"
                  password="p@ssword"/>
    <document name="doc">
        <entity name="blog"
                query="select b.blog_id, b.tags, concat(b.title,' ' ,b.desc)  as desc, b.created, b.uid, from blogs b order by b.blog_id;">
           
              <field column="blog_id" name="blog_id" />
              <field column="tags" name="tags" />
              <field column="desc" name="desc" />
              <field column="created" name="created"/>
              <field column="uid" name="uid"/>
        </entity>
    </document>
</dataConfig>

Step 3:

Edit file solrconfig.xml which is located at <SOLR_HOME>/example/solr/conf directory. Add the following requestHandler entry if not already existing.

<requestHandler name="/dataimport">
     <lst name="defaults">
          <str name="config">data-config.xml</str>
     </lst>
</requestHandler>

Step 4:


Download MySQL jdbc driver jar and copy it into <SOLR_HOME>/lib directory


Step 5:


(Re)Start your webserver

Performing full or delta indexing

If everything works correctly, you can get solr to fully index the configured tables by accessing the following command via your browser.  
http://<host>:<port>/solr/dataimport?command=full-import


You can check the status of the command by accessing
http://<host>:<port>/solr/dataimport


To do an incremental or delta indexing of data since the last full or delta, issue the command 
http://<host>:<port>/solr/dataimport?command=delta-import


Similarly any other database can be configured all you need is 
  • jdbc jar of that database
  • make changes to data-config.xml
     <dataSource type="JdbcDataSource" driver="xxxx" url="xxxx" user="xxxx"                password="xxxx"/>

Comments

Popular posts from this blog

The Externalizable Interface

Using Solr Spellchecker from Java