Hive / HIVE-21831

Stats should be reset correctly during load of a partitioned ACID table


Details

    Description

      While running something similar to the following example, I noticed that importing a partitioned ACID table stored as ORC does not produce correct table statistics:

      set hive.stats.autogather=true;
      set hive.stats.column.autogather=true;
      set hive.fetch.task.conversion=none;
      
      
      set hive.support.concurrency=true;
      set hive.default.fileformat.managed=ORC;
      set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
      
      
      create transactional table int_src (foo int, bar int);
      insert into int_src select 1,1;
      
      
      create transactional table int_exp(foo int) partitioned by (bar int);
      insert into int_exp select * from int_src;
      select count(*) from int_exp;
      
      
      create transactional table int_imp(foo int) partitioned by (bar int);
      
      
      EXPORT TABLE int_exp to '/tmp/expint';
      IMPORT TABLE int_imp FROM '/tmp/expint';
      
      
      select count(*) FROM int_imp;
      

      The count returned 0 instead of 1 (it was still 0 even with on the order of 100k records), and correct statistics were only available after running compute statistics.
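
      For reference, the compute statistics workaround mentioned above could look roughly like the sketch below. The exact statements used are not recorded in this report; the partition spec (bar) is taken from the example, and the idea that the count is answered from metastore statistics (hive.compute.query.using.stats=true) is an assumption.

      -- Sketch only: recompute basic statistics for all partitions of the imported table
      ANALYZE TABLE int_imp PARTITION (bar) COMPUTE STATISTICS;
      -- Optionally recompute column statistics as well
      ANALYZE TABLE int_imp PARTITION (bar) COMPUTE STATISTICS FOR COLUMNS;
      -- Re-run the count; it should now reflect the imported rows
      select count(*) FROM int_imp;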

       

      This occurred only with the combination of ACID, partitioning, and ORC, and it is not the expected behavior.

      Attachments

        1. HIVE-21831.02.patch
          3 kB
          David Lavati
        2. HIVE-21831.02.patch
          3 kB
          David Lavati
        3. HIVE-21831.01.patch
          5 kB
          David Lavati


          People

            Assignee: David Lavati (dlavati)
            Reporter: David Lavati (dlavati)
            Votes: 0
            Watchers: 2


              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 0.5h
