  1. Hive
  2. HIVE-17138

FileSinkOperator/Compactor doesn't create empty files for acid path

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      For bucketed tables, FileSinkOperator is expected (in some cases) to produce a specific number of files even if they are empty.
      FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

      This doesn't work properly for the Acid path. For Insert, the OrcRecordUpdater(s) are set up in createBucketForFileIdx(), which creates the actual bucketN file (as of HIVE-14007, it does so regardless of whether the RecordUpdater sees any rows). As a result, with multiFileSpray=true, empty (i.e. ORC metadata only) bucket files are created if a particular FileSinkOperator.process() sees at least 1 row. For example,

      create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored as orc TBLPROPERTIES ('transactional'='true');
      insert into fourbuckets values(0,1),(1,1);
      with mapreduce.job.reduces = 1 or 2 
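
To see which buckets receive rows in this example, here is a minimal Java sketch. It assumes the legacy bucketing rule, where an int column hashes to its own value and the bucket is (hash & Integer.MAX_VALUE) % numBuckets. The class and method names are hypothetical and are not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not Hive code: models bucket assignment for the example above.
class BucketAssignmentSketch {
    // Assumption: legacy bucketing, where an int key hashes to itself.
    static int bucketFor(int key, int numBuckets) {
        return (key & Integer.MAX_VALUE) % numBuckets;
    }

    // Returns the bucket ids that receive no rows for the given keys.
    static List<Integer> emptyBuckets(int[] keys, int numBuckets) {
        TreeSet<Integer> used = new TreeSet<>();
        for (int k : keys) {
            used.add(bucketFor(k, numBuckets));
        }
        List<Integer> empty = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            if (!used.contains(b)) {
                empty.add(b);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1}; // values of column a in the insert above
        System.out.println("empty buckets: " + emptyBuckets(keys, 4)); // prints "empty buckets: [2, 3]"
    }
}
```

Under that assumption, buckets 2 and 3 receive no rows, so whether files appear for them depends entirely on the write path.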
      

      For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st row that needs to land there is seen. Thus it never creates empty buckets, no matter what the value of skipFiles in closeOp(boolean) is.
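
The difference between the two paths can be sketched as a toy model in Java. This is not Hive code: the class, the method names, and the idea of passing in the buckets a sink is responsible for are all assumptions made for illustration. The no-rows case is also simplified to "nothing is set up", which the description above does not address.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, not Hive code: which bucket files exist after a sink closes.
class WriterCreationSketch {
    // Insert-like path: once the sink has seen at least one row, a file exists
    // for every bucket it is responsible for, whether or not rows landed there.
    static Set<Integer> eagerFiles(int[] assignedBuckets, int[] rowBuckets) {
        Set<Integer> files = new TreeSet<>();
        if (rowBuckets.length == 0) {
            return files; // simplifying assumption of this model
        }
        for (int b : assignedBuckets) {
            files.add(b);
        }
        return files;
    }

    // Update/Delete-like path: a file exists only for buckets that received a row.
    static Set<Integer> lazyFiles(int[] rowBuckets) {
        Set<Integer> files = new TreeSet<>();
        for (int b : rowBuckets) {
            files.add(b);
        }
        return files;
    }

    public static void main(String[] args) {
        int[] assigned = {0, 1, 2, 3}; // one sink handling all 4 buckets
        int[] rows = {0, 1};           // buckets that actually received rows
        System.out.println("eager: " + eagerFiles(assigned, rows));
        System.out.println("lazy:  " + lazyFiles(rows));
    }
}
```

In this model the eager path yields files for all four buckets, two of them empty, while the lazy path yields only the two buckets that saw rows.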

      Once Split Update does the split early (in the operator pipeline), only the Insert path will matter, since base and delta are the only files that split computation, etc. looks at. delete_delta is only for Acid internals, so there is never any reason to create empty files there.

      Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()
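
One way the abort-time cleanup could look is sketched below in Java. The SketchUpdater interface and the abortAll helper are hypothetical stand-ins, not the actual Hive types or the real body of abortWriters(). The sketch only shows closing every updater that exists, even if one of them fails.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record updater; not the Hive interface.
interface SketchUpdater {
    void close(boolean abort) throws IOException;
}

class AbortCloseSketch {
    // Closes every non-null updater in abort mode. The first failure is kept and
    // rethrown at the end so that one bad updater does not leak the others.
    static void abortAll(List<SketchUpdater> updaters) throws IOException {
        IOException first = null;
        for (SketchUpdater u : updaters) {
            if (u == null) {
                continue; // a lazily created updater may never have been set up
            }
            try {
                u.close(true);
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> closed = new ArrayList<>();
        List<SketchUpdater> updaters = new ArrayList<>();
        updaters.add(abort -> closed.add("bucket0 abort=" + abort));
        updaters.add(null); // bucket with no updater yet
        updaters.add(abort -> closed.add("bucket2 abort=" + abort));
        abortAll(updaters);
        System.out.println(closed); // prints "[bucket0 abort=true, bucket2 abort=true]"
    }
}
```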

            People

              Assignee: Unassigned
              Reporter: Eugene Koifman (ekoifman)

              Dates

                Created:
                Updated: