Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 0.90.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Using the Apache Lucene library we can add freetext search to HBase. The advantages of this are:

      • HBase is highly scalable and distributed
      • HBase is realtime
      • Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
      • Lucene offers many types of queries not currently available in HBase (e.g., AND, OR, NOT, phrase, etc.)
      • It's easier to build scalable realtime systems on top of an already architecturally sound, scalable realtime data system, e.g., HBase.
      • Scaling realtime search will be as simple as scaling HBase.

      Phase 1 - Indexing:

      • Integrate Lucene into HBase such that an index mirrors a given region. This means cascading adds, updates, and deletes between a Lucene index and an HBase region (and vice versa).
      • Define meta-data to mark a region as indexed, and use a Solr schema to allow the user to define the fields and analyzers.
      • Integrate with the HLog to ensure that index recovery can occur properly (eg, on region server failure)
      • Mirror region splits with indexes (use Lucene's IndexSplitter?)
      • When a region is written to HDFS, also write the corresponding Lucene index to HDFS.
      • A row key will be the ID of a given Lucene document. The Lucene docstore will explicitly not be used because the document/row data is stored in HBase. We still need to determine the best data structure for efficiently mapping a docid to a row key. It could be a docstore, field cache, column stride fields, or some other mechanism.
      • Write unit tests for the above
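
The Phase 1 items above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `RegionIndex` and all of its methods are hypothetical names, not part of the Lucene or HBase APIs, and a plain list stands in for whichever docid-to-row-key structure is eventually chosen. It shows puts and deletes cascading into the index, the row key acting as the document ID, and a region split being mirrored by an index split.

```python
from collections import defaultdict


class RegionIndex:
    """Toy inverted index mirroring one region (hypothetical, not the Lucene or HBase API)."""

    def __init__(self):
        self.row_keys = []                 # docid -> row key (None once deleted)
        self.docid_of = {}                 # row key -> live docid
        self.postings = defaultdict(set)   # term -> set of docids

    def put(self, row_key, text):
        # An update is modeled as delete-then-add, so the row gets a fresh docid.
        self.delete(row_key)
        docid = len(self.row_keys)
        self.row_keys.append(row_key)
        self.docid_of[row_key] = docid
        for term in text.lower().split():
            self.postings[term].add(docid)

    def delete(self, row_key):
        docid = self.docid_of.pop(row_key, None)
        if docid is None:
            return
        self.row_keys[docid] = None
        for docids in self.postings.values():
            docids.discard(docid)

    def search(self, term):
        # Resolve matching docids back to row keys; the row data itself stays in the table.
        return sorted(self.row_keys[d] for d in self.postings.get(term.lower(), ()))

    def split(self, split_key):
        # Mirror a region split: rows below split_key go left, the rest go right.
        left, right = RegionIndex(), RegionIndex()
        for row_key, docid in self.docid_of.items():
            terms = [t for t, ds in self.postings.items() if docid in ds]
            target = left if row_key < split_key else right
            target.put(row_key, " ".join(terms))
        return left, right


index = RegionIndex()
index.put("row-a", "realtime search")
index.put("row-m", "distributed search")
index.put("row-a", "realtime index")     # update replaces the earlier document
left, right = index.split("row-h")       # "row-a" lands left, "row-m" lands right
```

A real implementation would also have to persist the index and replay the HLog on recovery, which this sketch leaves out.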

      Phase 2 - Queries:

      • Enable distributed Lucene queries
      • Regions that have Lucene indexes are inherently available and can be searched, meaning there's no need for a separate search-related system in ZooKeeper.
      • Integrate search with HBase's RPC mechanism
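
As a rough sketch of what a distributed query in Phase 2 could look like, the code below queries each region independently and merges the per-region results into a global top k. The function names are hypothetical, each region is modeled as a plain dict of row key to text, and raw term frequency stands in for Lucene's actual scoring; none of this reflects the real HBase RPC layer.

```python
import heapq


def search_region(region_docs, term, k):
    """Score one region's rows by raw term frequency and return its top k."""
    hits = []
    for row_key, text in region_docs.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, row_key))
    return heapq.nlargest(k, hits)


def distributed_search(regions, term, k):
    """Query every region, then merge the partial results into a global top k."""
    partial = []
    for region_docs in regions:   # a real system would issue these as parallel RPCs
        partial.extend(search_region(region_docs, term, k))
    return heapq.nlargest(k, partial)


regions = [
    {"row-a": "lucene lucene hbase", "row-b": "hbase"},
    {"row-m": "lucene", "row-n": "lucene lucene lucene"},
]
top = distributed_search(regions, "lucene", 2)
```

Because each region returns at most k hits, the merge step only handles k results per region regardless of how many rows each region holds.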
      1. HDFS-APPEND-0.20-LOCAL-FILE.patch
        8 kB
        Jason Rutherglen
      2. HBASE-3529.patch
        41 kB
        Jason Rutherglen


            People

            • Assignee:
              Unassigned
              Reporter:
              Jason Rutherglen
            • Votes:
              37
              Watchers:
              87
