  1. Hive
  2. HIVE-21936

Snapshot inconsistency plan execution


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 1.2.1, 1.2.2, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2, 2.3.4, 2.3.5, 3.1.0, 3.0.0, 3.1.1
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels: None

    Description

      When an HBase snapshot is queried from Hive, there is no validation that the snapshot exists, nor that the snapshot applies to the Hive table being queried.

      How to reproduce:

      Create two Hive tables backed by HBase:
       

      CREATE TABLE default.employee(rowkey string, name string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:employee" );
      
      CREATE TABLE default.work(rowkey string, company string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:work" );  

       
      Insert a row into each table:
       

      INSERT INTO TABLE default.employee values("1", "Dupont");
      INSERT INTO TABLE default.work values ("c1", "ACME");

       
       
      From the HBase shell, create a snapshot of the employee table:

      snapshot 'employee', 'mysnapshot'

       
      From beeline, run a sanity check:

      SELECT * FROM employee;
      SELECT * FROM work;
      

      Now that the setup is done, the first bug appears when the snapshot name is set in Hive and a different HBase table is queried:
       

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM work;

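      The problem is the condition that triggers the snapshot input format. It only checks whether a snapshot name is set in the job configuration, not whether that snapshot exists or belongs to the table being queried: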

        @Override
        public Class<? extends InputFormat> getInputFormatClass() {
          if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
            LOG.debug("Using TableSnapshotInputFormat");
            return HiveHBaseTableSnapshotInputFormat.class;
          }
          LOG.debug("Using HiveHBaseTableInputFormat");
          return HiveHBaseTableInputFormat.class;
        }
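
      To make the missing validation concrete, here is a minimal, self-contained sketch of a stricter decision. It is not Hive code: the class SnapshotFormatChooser, the Format enum and the snapshotToTable map are hypothetical names made up for illustration, and the map stands in for a lookup that a real fix would presumably do against HBase (for example via Admin.listSnapshots()). Falling back to the live table on a mismatch is only one possible choice; failing the query would be another.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch, not part of Hive: validates a snapshot before using it. */
public class SnapshotFormatChooser {

    /** Stand-ins for HiveHBaseTableSnapshotInputFormat and HiveHBaseTableInputFormat. */
    public enum Format { SNAPSHOT, LIVE_TABLE }

    /**
     * @param snapshotName    value of hive.hbase.snapshot.name, or null if unset
     * @param targetTable     HBase table behind the Hive table being queried
     * @param snapshotToTable known snapshots mapped to their source table
     *                        (assumed to be looked up beforehand)
     */
    public static Format choose(String snapshotName, String targetTable,
                                Map<String, String> snapshotToTable) {
        // No snapshot configured: read the live table, as today.
        if (snapshotName == null || snapshotName.isEmpty()) {
            return Format.LIVE_TABLE;
        }
        // Check 1: the snapshot must exist.
        String sourceTable = snapshotToTable.get(snapshotName);
        if (sourceTable == null) {
            throw new IllegalArgumentException("Snapshot does not exist: " + snapshotName);
        }
        // Check 2: the snapshot must have been taken from the queried table.
        // Names are compared as given; namespace normalization
        // ('employee' vs 'default:employee') is left out of this sketch.
        if (!sourceTable.equals(targetTable)) {
            return Format.LIVE_TABLE; // assumption: ignore a snapshot of another table
        }
        return Format.SNAPSHOT;
    }

    public static void main(String[] args) {
        Map<String, String> snapshots =
            Collections.singletonMap("mysnapshot", "default:employee");
        System.out.println(choose("mysnapshot", "default:employee", snapshots));
        System.out.println(choose("mysnapshot", "default:work", snapshots));
    }
}
```

      With a check of this kind, the query on work from the reproduction steps would no longer be routed to the snapshot of employee.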

      The second problem concerns predicate pushdown when the snapshot is used in a query more complex than a simple select:

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM employee a UNION ALL SELECT * FROM employee b;

      The result is not what we expect: every column other than the row key is null.
       
      As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scan).

          People

            Assignee: Unassigned
            Reporter: Jean-Pierre Hoang (jphoang)
