HIVE-21936

Snapshot inconsistency plan execution


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 1.2.1, 2.0.0, 1.2.2, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 3.0.0, 2.3.1, 2.3.2, 3.1.0, 3.1.1, 2.3.4, 2.3.5
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      When using an HBase snapshot from Hive, there is no validation that the snapshot exists, nor that the snapshot applies to the Hive table being queried.

      How to reproduce:

      Create two Hive tables backed by HBase:
       

      CREATE TABLE default.employee(rowkey string, name string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:employee" );
      
      CREATE TABLE default.work(rowkey string, company string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:work" );  

       
      Insert some rows into the tables:
       

      INSERT INTO TABLE default.employee values("1", "Dupont");
      INSERT INTO TABLE default.work values ("c1", "ACME");

       
       
      From the HBase shell, create a snapshot:

      snapshot 'employee', 'mysnapshot'

       
      From beeline, run some sanity checks:

      SELECT * FROM employee;
      SELECT * FROM work;
      

      Now that the setup is done, the first bug appears when the snapshot name is set in Hive and a different HBase table is queried:
       

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM work;

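      The problem is the condition that triggers the snapshot input format. It only checks whether hive.hbase.snapshot.name is set, not whether the snapshot exists or was taken from the table being queried: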

        @Override
        public Class<? extends InputFormat> getInputFormatClass() {
          if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
            LOG.debug("Using TableSnapshotInputFormat");
            return HiveHBaseTableSnapshotInputFormat.class;
          }
          LOG.debug("Using HiveHBaseTableInputFormat");
          return HiveHBaseTableInputFormat.class;
        }
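
      A minimal sketch of the kind of validation that seems to be missing, written as standalone Java so it runs without Hive or HBase on the classpath. Everything here is hypothetical: the class SnapshotGuard, the enum InputFormatKind, and the snapshotToTable map (a stand-in for whatever snapshot listing the handler could obtain from HBase) are not part of the HBase handler. It also assumes table names have already been normalized to the same form on both sides, and that failing the query is the desired reaction to a missing or mismatched snapshot.

```java
import java.util.Map;

/** Hypothetical sketch only; this is not code from the Hive HBase handler. */
final class SnapshotGuard {

    /** Stand-ins for HiveHBaseTableInputFormat and HiveHBaseTableSnapshotInputFormat. */
    enum InputFormatKind { LIVE_TABLE, SNAPSHOT }

    /**
     * @param snapshotName    value of hive.hbase.snapshot.name, or null/empty when unset
     * @param hbaseTableName  HBase table behind the Hive table being queried
     * @param snapshotToTable assumed listing: snapshot name -> table it was taken from
     */
    static InputFormatKind chooseInputFormat(String snapshotName,
                                             String hbaseTableName,
                                             Map<String, String> snapshotToTable) {
        // No snapshot requested: read the live table, as today.
        if (snapshotName == null || snapshotName.isEmpty()) {
            return InputFormatKind.LIVE_TABLE;
        }
        // Validation 1: the snapshot must exist.
        String sourceTable = snapshotToTable.get(snapshotName);
        if (sourceTable == null) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' does not exist");
        }
        // Validation 2: the snapshot must have been taken from the queried table.
        if (!sourceTable.equals(hbaseTableName)) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' was taken from table '" + sourceTable
                    + "', not from '" + hbaseTableName + "'");
        }
        return InputFormatKind.SNAPSHOT;
    }

    public static void main(String[] args) {
        // Mirrors the reproduction: one snapshot, taken from the employee table.
        Map<String, String> snapshots = Map.of("mysnapshot", "employee");

        System.out.println(chooseInputFormat("mysnapshot", "employee", snapshots));
        try {
            chooseInputFormat("mysnapshot", "work", snapshots);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

      Rejecting the query is only one option. Falling back to the live table when the snapshot belongs to another table would also avoid the inconsistency, at the cost of silently ignoring the session setting.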

       
      The second problem is predicate pushdown when the snapshot is used in a query more complex than a simple select:

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM employee a UNION ALL SELECT * FROM employee b;

      The result is not what we expect: every column other than the rowkey is null.
       
      As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scans).

            People

            • Assignee:
              Unassigned
              Reporter:
              jphoang Jean-Pierre Hoang