  1. Hive
  2. HIVE-17138

FileSinkOperator/Compactor doesn't create empty files for acid path


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      For bucketed tables, FileSinkOperator is expected (in some cases) to produce a specific number of files even if they are empty.
      FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

      This doesn't work properly for the Acid path. For Insert, the OrcRecordUpdater(s) are set up in createBucketForFileIdx(), which creates the actual bucketN file (as of HIVE-14007, it does so regardless of whether the RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only) bucket files to be created for multiFileSpray=true if a particular FileSinkOperator.process() sees at least 1 row. For example,

      create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored as orc TBLPROPERTIES ('transactional'='true');
      insert into fourbuckets values(0,1),(1,1);
      with mapreduce.job.reduces = 1 or 2 
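
The eager behaviour described above can be illustrated with a toy model. This is only a sketch and not Hive code: `EagerBucketSink` is a hypothetical class standing in for a single FileSinkOperator that owns all four buckets (roughly the mapreduce.job.reduces = 1 case), and the bucket formula `(hash & Integer.MAX_VALUE) % numBuckets` is assumed here for an int column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not Hive code): a sink that opens a "file" for
// every bucket it owns as soon as it sees its first row.
class EagerBucketSink {
    private final int numBuckets;
    // bucket id -> rows written to that bucket's file
    private final Map<Integer, List<int[]>> files = new TreeMap<>();

    EagerBucketSink(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    void process(int a, int b) {
        if (files.isEmpty()) {
            // First row: create all bucket files up front, whether or not
            // any row will ever land in them.
            for (int i = 0; i < numBuckets; i++) {
                files.put(i, new ArrayList<>());
            }
        }
        // Assumed bucketing rule for an int clustering column.
        int bucket = (Integer.hashCode(a) & Integer.MAX_VALUE) % numBuckets;
        files.get(bucket).add(new int[] {a, b});
    }

    int fileCount() {
        return files.size();
    }

    // Buckets whose file exists but holds no rows.
    List<Integer> emptyBuckets() {
        List<Integer> empty = new ArrayList<>();
        for (Map.Entry<Integer, List<int[]>> e : files.entrySet()) {
            if (e.getValue().isEmpty()) {
                empty.add(e.getKey());
            }
        }
        return empty;
    }
}
```

Feeding this model the two rows from the insert statement, (0,1) and (1,1), leaves four files, with buckets 2 and 3 empty.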
      

      For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row that needs to land there is seen. Thus it never creates empty buckets, no matter what the value of skipFiles in closeOp(boolean) is.
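
For contrast, lazy creation can be sketched the same way. Again this is a hypothetical model, not Hive code: `LazyBucketSink` creates a bucket's file only when the first row for that bucket arrives, and the bucketing rule is the same assumption as before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not Hive code): a sink that opens a bucket's
// "file" only when the first row for that bucket is seen.
class LazyBucketSink {
    private final int numBuckets;
    // bucket id -> rows written to that bucket's file
    private final Map<Integer, List<int[]>> files = new TreeMap<>();

    LazyBucketSink(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    void process(int a, int b) {
        // Assumed bucketing rule for an int clustering column.
        int bucket = (Integer.hashCode(a) & Integer.MAX_VALUE) % numBuckets;
        // The file for this bucket is created on demand.
        files.computeIfAbsent(bucket, k -> new ArrayList<>()).add(new int[] {a, b});
    }

    int fileCount() {
        return files.size();
    }

    boolean hasFile(int bucket) {
        return files.containsKey(bucket);
    }
}
```

In this model a bucket that receives no rows never gets a file, so nothing done at close time can produce an empty one unless the close logic creates it explicitly.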

      Once Split Update does the split early (in the operator pipeline), only the Insert path will matter, since base and delta are the only files that split computation, etc. look at. delete_delta is only for Acid internals, so there is never any reason to create empty files there.

      Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()
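
One possible shape for that cleanup is sketched below. It is not the actual FileSinkOperator code: `UpdaterLike` and `AbortingSink` are hypothetical stand-ins, and the only thing modeled is a `close(boolean abort)` call on each updater, with one failing close not preventing the rest from being closed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record updater; only close(abort) is modeled.
interface UpdaterLike {
    void close(boolean abort) throws IOException;
}

// Hypothetical sink that tracks its open updaters.
class AbortingSink {
    private final List<UpdaterLike> updaters = new ArrayList<>();

    void register(UpdaterLike updater) {
        updaters.add(updater);
    }

    // Close every updater with abort=true. A failure on one updater is
    // counted but does not stop the remaining updaters from being closed.
    int abortWriters() {
        int failures = 0;
        for (UpdaterLike updater : updaters) {
            if (updater == null) {
                continue; // slot that was never opened
            }
            try {
                updater.close(true);
            } catch (IOException e) {
                failures++;
            }
        }
        updaters.clear();
        return failures;
    }
}
```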

          People

            Assignee: Unassigned
            Reporter: Eugene Koifman (ekoifman)