  1. HBase
  2. HBASE-3342

Server-side Row-level Inverted Index Join via Coprocessors


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels:
      None

      Description

      A common schema in HBase is to create an inverted index per row (a la inbox search), where a row is a user/entity, each column is a word, and the versions of a column are the instances of that word in documents. Values can be empty or can carry additional scoring information such as position or count.

      When querying indexes like this, we may want to do something like: give me the N most recent documents that contain the word "foo" (exact word matching) and contain a word that starts with "bar" (prefix matching).

      Currently this join has to be done on the client side, so we may have to read far more than N documents for each word before we find N documents that match both words. This gets worse as the number of words increases.

      We could implement this join on the server-side in a coprocessor.
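
A minimal sketch of the join logic itself, to make the idea concrete. It is plain Java over an in-memory stand-in for a single index row and does not use the HBase or coprocessor APIs. The class name RowIndexJoin, its methods, and the assumption that a version timestamp uniquely identifies a document are all hypothetical and not part of this issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory model of one index row: word (column) -> document timestamps (versions). */
public class RowIndexJoin {
    // Words are kept sorted, like column qualifiers within a row.
    // Each word's timestamps are kept newest first, like cell versions.
    private final NavigableMap<String, NavigableSet<Long>> row = new TreeMap<>();

    /** Records that the document identified by docTimestamp contains the given word. */
    public void add(String word, long docTimestamp) {
        row.computeIfAbsent(word, w -> new TreeSet<Long>(Comparator.reverseOrder()))
           .add(docTimestamp);
    }

    /** Returns up to n most recent documents that contain exactWord and some word starting with prefix. */
    public List<Long> join(String exactWord, String prefix, int n) {
        List<Long> result = new ArrayList<>();
        NavigableSet<Long> exact = row.get(exactWord);
        if (exact == null || n <= 0) {
            return result;
        }
        // Words sharing a prefix form one contiguous range of the sorted map.
        // Simplification: a word whose next character is '\uffff' would be missed.
        NavigableMap<String, NavigableSet<Long>> prefixed =
            row.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        for (long doc : exact) {                  // iterates newest first
            for (NavigableSet<Long> versions : prefixed.values()) {
                if (versions.contains(doc)) {     // document also has a prefix-matching word
                    result.add(doc);
                    break;
                }
            }
            if (result.size() == n) {
                break;                            // stop as soon as n matches are found
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RowIndexJoin index = new RowIndexJoin();
        for (long ts : new long[] {10L, 20L, 30L, 40L}) {
            index.add("foo", ts);
        }
        index.add("bar", 20L);
        index.add("barn", 40L);
        index.add("baz", 30L);
        System.out.println(index.join("foo", "bar", 2)); // prints [40, 20]
    }
}
```

The point of the sketch is the early exit: because the join runs next to the data, it can stop once N matching documents are found, and only those N results would need to be returned to the client.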


              People

              • Assignee:
                Unassigned
                Reporter:
                streamy Jonathan Gray
              • Votes:
                0
                Watchers:
                9

                Dates

                • Created:
                  Updated:
                  Resolved: