Hive / HIVE-21936

Snapshot inconsistency plan execution



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.1, 1.2.1, 1.2.2, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.2.0, 2.3.0, 2.3.1, 2.3.2, 2.3.4, 2.3.5, 3.1.0, 3.0.0, 3.1.1
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:


      When an HBase snapshot is used from Hive, nothing validates that the snapshot exists, nor that the snapshot applies to the Hive target table.

      How to reproduce:

      Create two Hive tables backed by HBase:

      CREATE TABLE default.employee(rowkey string, name string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:employee" );
      CREATE TABLE default.work(rowkey string, company string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string", "hbase.table.name"= "default:work" );  

      Insert some rows into the tables:

      INSERT INTO TABLE default.employee values("1", "Dupont");
      INSERT INTO TABLE default.work values ("c1", "ACME");

      From the HBase shell, create a snapshot of the employee table:

      snapshot 'employee', 'mysnapshot'

      From Beeline, run some sanity checks:

      SELECT * FROM employee;
      SELECT * FROM work;

      Now that the setup is done, the first bug appears when the snapshot name is set in Hive and a different HBase-backed table is then queried:

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM work;

      The problem is the condition that triggers the snapshot input format:

        public Class<? extends InputFormat> getInputFormatClass() {
          // Only checks that the property is set; it does not check that the
          // snapshot exists or that it was taken from the table being queried.
          if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
            LOG.debug("Using TableSnapshotInputFormat");
            return HiveHBaseTableSnapshotInputFormat.class;
          }
          LOG.debug("Using HiveHBaseTableInputFormat");
          return HiveHBaseTableInputFormat.class;
        }
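
      To illustrate the kind of check that is missing, here is a minimal, self-contained sketch. It is not Hive code: the class `SnapshotGuard`, the enum `InputFormatChoice`, and the `snapshotToTable` map are hypothetical names introduced only for this example. The map stands in for a lookup against the cluster (a real fix would presumably ask HBase, for example through `Admin.listSnapshots()`). Falling back to the live table when the snapshot belongs to another table is one possible policy, chosen here as an assumption; rejecting the query would be another.

```java
import java.util.Map;

/** Hypothetical sketch of snapshot validation; not part of Hive or HBase. */
public class SnapshotGuard {

    /** Which input format the storage handler would pick. */
    public enum InputFormatChoice { LIVE_TABLE, SNAPSHOT }

    /**
     * @param snapshotName    value of hive.hbase.snapshot.name, or null/empty when unset
     * @param targetTable     HBase table backing the Hive table being queried
     * @param snapshotToTable snapshot name -> HBase table it was taken from
     *                        (assumed stand-in for a cluster lookup)
     */
    public static InputFormatChoice choose(String snapshotName,
                                           String targetTable,
                                           Map<String, String> snapshotToTable) {
        // No snapshot configured: read the live table as usual.
        if (snapshotName == null || snapshotName.isEmpty()) {
            return InputFormatChoice.LIVE_TABLE;
        }
        // Check 1: the snapshot must exist.
        String sourceTable = snapshotToTable.get(snapshotName);
        if (sourceTable == null) {
            throw new IllegalArgumentException(
                "Snapshot '" + snapshotName + "' does not exist");
        }
        // Check 2: the snapshot must have been taken from the queried table.
        // Assumed policy: otherwise ignore it and read the live table.
        if (!sourceTable.equals(targetTable)) {
            return InputFormatChoice.LIVE_TABLE;
        }
        return InputFormatChoice.SNAPSHOT;
    }
}
```

      With a map containing `mysnapshot -> default:employee`, this sketch selects the snapshot path for `default:employee`, keeps the live-table path for `default:work`, and throws for a snapshot name that is not in the map.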

      The second problem involves predicate pushdown when the snapshot is used in a query more complex than a simple SELECT:

      set hive.hbase.snapshot.name=mysnapshot;
      SELECT * FROM employee a UNION ALL SELECT * FROM employee b;

      The result is not what we expect: every column other than the rowkey is null.
      As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scans).




            • Assignee:
              jphoang Jean-Pierre Hoang


              • Created: