Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-3529

Add search to HBase

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Later
    • 0.90.0
    • None
    • None
    • None

    Description

      Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are:

      • HBase is highly scalable and distributed
      • HBase is realtime
      • Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
      • Lucene offers many types of queries not currently available in HBase (eg, AND, OR, NOT, phrase, etc)
      • It's easier to build scalable realtime systems on top of already architecturally sound, scalable realtime data system, eg, HBase.
      • Scaling realtime search will be as simple as scaling HBase.

      Phase 1 - Indexing:

      • Integrate Lucene into HBase such that an index mirrors a given region. This means cascading add, update, and deletes between a Lucene index and an HBase region (and vice versa).
      • Define meta-data to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers.
      • Integrate with the HLog to ensure that index recovery can occur properly (eg, on region server failure)
      • Mirror region splits with indexes (use Lucene's IndexSplitter?)
      • When a region is written to HDFS, also write the corresponding Lucene index to HDFS.
      • A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used because the document/row data is stored in HBase. We will need to solve what the best data structure for efficiently mapping a docid -> row key is. It could be a docstore, field cache, column stride fields, or some other mechanism.
      • Write unit tests for the above

      Phase 2 - Queries:

      • Enable distributed Lucene queries
      • Regions that have Lucene indexes are inherently available and may be searched on, meaning there's no need for a separate search related system in Zookeeper.
      • Integrate search with HBase's RPC mechanis

      Attachments

        1. HBASE-3529.patch
          41 kB
          Jason Rutherglen
        2. HDFS-APPEND-0.20-LOCAL-FILE.patch
          8 kB
          Jason Rutherglen

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jasonrutherglen Jason Rutherglen
              Votes:
              37 Vote for this issue
              Watchers:
              87 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: