Hive / HIVE-21792

Hive Indexes... Again


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels:
      None

      Description

      Hive once had an indexing implementation, but it was made somewhat obsolete by the introduction of columnar file formats (such as ORC and Parquet) that carry their own internal indexes.

      I propose that Hive introduce Indexing again.

      1. Column Index: Stored in HBase
      2. Full-Text Index: Stored in Solr

      The basic idea is that the key in HBase is the record (its indexed column value) and the value is the relative path of the file in the Hive table that holds that record's data.

      Each INSERT statement creates an index entry for every record it writes.
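
To make the key/value layout concrete, here is a minimal sketch of the column index, with an in-memory dictionary standing in for HBase. The class name `ColumnIndex`, its methods, and the file paths are all hypothetical and exist only for illustration; nothing here is an existing Hive or HBase API.

```python
from collections import defaultdict


class ColumnIndex:
    """In-memory stand-in for the proposed HBase-backed column index.

    Key: the indexed column value of a record.
    Value: relative paths of the table files that contain that value.
    """

    def __init__(self):
        self._entries = defaultdict(set)

    def on_insert(self, column_value, relative_path):
        # Called once per record written by an INSERT (assumed hook).
        self._entries[column_value].add(relative_path)

    def lookup(self, column_value):
        # Returns the files that may hold matching records, or [] if none.
        return sorted(self._entries.get(column_value, ()))


index = ColumnIndex()
index.on_insert(27, "user/000000_0")
index.on_insert(42, "user/000001_0")
index.on_insert(27, "user/000002_0")

print(index.lookup(27))  # files containing userid=27
print(index.lookup(99))  # no entry for this value
```

A set per key is used because the same value can appear in more than one file; a real HBase layout would need its own answer to that (for example, one column per file), which this sketch does not attempt to decide.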

      https://dev.mysql.com/doc/refman/8.0/en/create-index.html

      When generating the query plan, the index is consulted so that only the files involved in the query are considered.

      This would avoid scanning large amounts of data for the queries typical BI tools issue, where the relevant set of data is known to be very small.

      -- Quick retrieval of small sets of records
      select * from `user` where userid=27;

      -- Full scans
      select count(1) from `user`;
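
The difference between the two queries above can be sketched as a file-pruning step at planning time. This is only an illustration under stated assumptions: `files_to_scan` is a hypothetical helper, the index is a plain dictionary rather than HBase, and the file names are made up.

```python
def files_to_scan(all_files, index, predicate_value=None):
    """Return the table files a query has to read.

    all_files: every data file in the table (relative paths).
    index: dict mapping an indexed column value to the set of files holding it.
    predicate_value: value from an equality predicate, or None if there is none.
    """
    if predicate_value is None:
        # No usable predicate (e.g. count(1)): fall back to a full scan.
        return list(all_files)
    matching = index.get(predicate_value, set())
    # Keep table order, but only for files the index says are relevant.
    return [f for f in all_files if f in matching]


all_files = ["user/000000_0", "user/000001_0", "user/000002_0"]
index = {27: {"user/000000_0"}, 42: {"user/000001_0"}}

point_lookup = files_to_scan(all_files, index, predicate_value=27)
full_scan = files_to_scan(all_files, index)

print(point_lookup)
print(full_scan)
```

In this sketch a value with no index entry yields an empty file list, which assumes the index is complete; if the index could lag behind the table, falling back to a full scan would be the safer choice.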
      

            People

            • Assignee: Unassigned
            • Reporter: David Mollitor (belugabehr)
            • Votes: 0
            • Watchers: 5
