Hive / HIVE-25915

Query-based MINOR compaction fails with NPE if the data is loaded into the ACID table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Hive

    Description

      Steps to reproduce:

      1. Create a source ACID table:
        CREATE TABLE temp_acid(id string, value string) CLUSTERED BY(id) INTO 10 BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');
      2. Populate it, then export it and import it into a new table:
        insert into temp_acid values ('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten'),('11','eleven'),('12','twelve'),('13','thirteen'),('14','fourteen'),('15','fifteen'),('16','sixteen'),('17','seventeen'),('18','eighteen'),('19','nineteen'),('20','twenty');
        export table temp_acid to '/tmp/temp_acid';
        import table imported from '/tmp/temp_acid';
      3. Do some inserts into the imported table:
        insert into imported values ('21', 'value21'),('84', 'value84'),('66', 'value66'),('54', 'value54');
        insert into imported values ('22', 'value22'),('34', 'value34'),('35', 'value35');
        insert into imported values ('75', 'value75'),('99', 'value99');
      4. Run a minor compaction on the imported table.
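
      The last step can be triggered manually. The statements below are a minimal sketch, not part of the original report. They assume that query-based compaction is enabled (hive.compactor.crud.query.based=true) in the configuration of the process that runs the compactor Worker, and that the table is named imported as in the steps above.

      ```sql
      -- Request a minor compaction of the table created by the import.
      ALTER TABLE imported COMPACT 'minor';

      -- List the compaction queue; the request for `imported` should appear
      -- there together with its current state.
      SHOW COMPACTIONS;
      ```

      On an affected build the request would be expected to end in a failed state, with the NPE visible in the log of the process that executed the compaction.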

      If the data is loaded or imported into the table the way it is described above, the rows in the ORC file do not contain the ACID metadata. The query-based MINOR compaction fails on this kind of table: when the FileSinkOperator tries to read the bucket metadata from the rows, it throws an NPE. Deleting from and updating such a table works, however, so the bucketId can evidently be calculated for rows like these.
      The non-query-based MINOR compaction works fine on a table like this.
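
      One way to look at the row identifiers Hive reports for such a table is to select the ROW__ID virtual column. This is only an illustrative sketch, not taken from the original report; it assumes the imported table from the steps above and shows what the read path reports, not what is physically stored in the ORC files.

      ```sql
      -- ROW__ID is a virtual column on transactional tables, returned as a
      -- struct with the fields writeid, bucketid and rowid.
      -- INPUT__FILE__NAME shows which file each row is read from, so rows that
      -- came from the import can be told apart from rows written by later inserts.
      SELECT ROW__ID, INPUT__FILE__NAME, id, value
      FROM imported
      ORDER BY id;
      ```

      If the imported rows also show a bucketid here, that would be consistent with the observation that updates and deletes can work out the bucket for rows lacking stored ACID metadata.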

            People

              Assignee: László Végh (veghlaci05)
              Reporter: László Végh (veghlaci05)
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 10m