Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.90.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Using the Apache Lucene library, we can add free-text search to HBase. The advantages of this are:

      • HBase is highly scalable and distributed
      • HBase is realtime
      • Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
      • Lucene offers many types of queries not currently available in HBase (e.g., AND, OR, NOT, phrase)
      • It's easier to build scalable realtime systems on top of an already architecturally sound, scalable realtime data system such as HBase.
      • Scaling realtime search will be as simple as scaling HBase.

      Phase 1 - Indexing:

      • Integrate Lucene into HBase such that an index mirrors a given region. This means cascading adds, updates, and deletes between a Lucene index and an HBase region (and vice versa).
      • Define metadata to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers.
      • Integrate with the HLog to ensure that index recovery can occur properly (e.g., on region server failure).
      • Mirror region splits with indexes (use Lucene's IndexSplitter?)
      • When a region is written to HDFS, also write the corresponding Lucene index to HDFS.
      • A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used to hold document data, because the document/row data is already stored in HBase. We still need to determine the best data structure for efficiently mapping a docid -> row key. It could be a docstore, field cache, column stride fields, or some other mechanism.
      • Write unit tests for the above
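
      The mirroring and docid -> row key mapping described above can be illustrated with a small sketch. This is not the attached patch and uses neither the HBase nor the Lucene API: the class `IndexedRegionSketch` and all of its members are hypothetical, plain Java collections stand in for both the region and the inverted index, and the mapping is modeled as a simple array-like list (roughly what a field cache would provide).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: a toy "region" whose writes cascade to a toy inverted index. */
class IndexedRegionSketch {
    // Stand-in for the HBase region: row key -> stored text.
    private final TreeMap<String, String> rows = new TreeMap<>();
    // Stand-in for the Lucene index: term -> internal doc ids containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // docid -> row key mapping; a null entry marks a deleted document.
    private final List<String> docIdToRowKey = new ArrayList<>();
    // Reverse lookup so an update or delete can find the live document.
    private final Map<String, Integer> rowKeyToDocId = new HashMap<>();

    /** Add or update a row; the index entry is replaced (delete, then add). */
    void put(String rowKey, String text) {
        delete(rowKey);
        rows.put(rowKey, text);
        int docId = docIdToRowKey.size();
        docIdToRowKey.add(rowKey);
        rowKeyToDocId.put(rowKey, docId);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Delete a row and cascade the delete to the index. */
    void delete(String rowKey) {
        rows.remove(rowKey);
        Integer docId = rowKeyToDocId.remove(rowKey);
        if (docId == null) {
            return; // row was never indexed
        }
        docIdToRowKey.set(docId, null);
        for (Set<Integer> docs : postings.values()) {
            docs.remove(docId);
        }
    }

    /** Return the row keys of rows containing the term, in docid order. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        Set<Integer> docs =
            postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
        for (int docId : docs) {
            hits.add(docIdToRowKey.get(docId));
        }
        return hits;
    }
}
```

      Only row keys come back from the index; the row data itself would still be read from the region. A real implementation would also have to cover HLog-based recovery and region splits, which this sketch leaves out.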

      Phase 2 - Queries:

      • Enable distributed Lucene queries
      • Regions that have Lucene indexes are inherently available and may be searched on, meaning there's no need for a separate search related system in Zookeeper.
      • Integrate search with HBase's RPC mechanism
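
      One plausible shape for a distributed query is scatter-gather: each indexed region runs the query locally and returns its best hits, and the caller merges them into a global top k. The sketch below shows only the merge step, in plain Java. `ScatterGatherSketch`, `Hit`, and `mergeTopK` are hypothetical names, and the RPC transport is assumed to have already delivered the per-region lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of the gather step of a scatter-gather search. */
class ScatterGatherSketch {
    /** One hit from a region-local search: a row key plus a relevance score. */
    static final class Hit {
        final String rowKey;
        final float score;

        Hit(String rowKey, float score) {
            this.rowKey = rowKey;
            this.score = score;
        }
    }

    /** Merge per-region hit lists into the global top k, highest score first. */
    static List<Hit> mergeTopK(List<List<Hit>> perRegionHits, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the weakest retained hit sits at the head and is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> regionHits : perRegionHits) {
            for (Hit hit : regionHits) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(byScore.reversed());
        return merged;
    }
}
```

      The merge assumes scores from different regions are comparable, which is an open question for per-region indexes because term statistics differ from index to index.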

        Attachments

        1. HDFS-APPEND-0.20-LOCAL-FILE.patch
          8 kB
          Jason Rutherglen
        2. HBASE-3529.patch
          41 kB
          Jason Rutherglen

              People

              • Assignee:
                Unassigned
                Reporter:
                jasonrutherglen Jason Rutherglen
              • Votes:
                37
                Watchers:
                90
