HIVE-23956

Delete delta directory file information should be pushed to execution side


Details

    Description

      Since HIVE-23840, the LLAP cache is used to retrieve the tail of the ORC bucket files in the delete deltas. To use the cache, however, the fileId must be determined, so one more FileSystem call is issued for each bucket.

      This fileId is already available during compilation, in the AcidState calculation. We should serialize it into the OrcSplit and remove the unnecessary FS calls.
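
      The idea can be sketched roughly as follows. This is an illustration only, not the actual OrcSplit code: the class DeleteDeltaFileMeta, its fields, and the roundTrip helper are hypothetical names. The compile side writes the file details it already knows into the split, and the execution side reads them back instead of asking the FileSystem.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical holder for what the compile side already knows about one
// delete delta bucket file. Not a class from the Hive code base.
final class DeleteDeltaFileMeta {
    final int bucketId;
    final long length;
    final long modificationTime;

    DeleteDeltaFileMeta(int bucketId, long length, long modificationTime) {
        this.bucketId = bucketId;
        this.length = length;
        this.modificationTime = modificationTime;
    }

    // Write the fields in a fixed order so the reader can mirror it.
    void write(DataOutputStream out) throws IOException {
        out.writeInt(bucketId);
        out.writeLong(length);
        out.writeLong(modificationTime);
    }

    // Read the fields back in the same order they were written.
    static DeleteDeltaFileMeta read(DataInputStream in) throws IOException {
        int bucketId = in.readInt();
        long length = in.readLong();
        long modificationTime = in.readLong();
        return new DeleteDeltaFileMeta(bucketId, length, modificationTime);
    }

    // Round trip through a byte array, standing in for split serialization
    // between the compile side and the execution side.
    static DeleteDeltaFileMeta roundTrip(DeleteDeltaFileMeta meta) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                meta.write(out);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With the length and modification time carried in the split, the reader has what it needs to identify the file for the cache without a per-bucket status lookup.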

      Furthermore, rather than sending the SyntheticFileId directly, we should pass the attemptId in place of the standard path hash. That way both the path and the SyntheticFileId can be calculated, and this will keep working even if move-free delete operations are introduced.
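
      A rough sketch of why the attemptId is enough, under stated assumptions: the bucket file naming (bucket_<5 digits> with an optional _<attemptId> suffix), the polynomial path hash, and the class SyntheticIdSketch are all illustrative stand-ins, and the synthetic id is modelled simply as a (path hash, modification time, length) triple. The real Hive implementation may differ in each of these details.

```java
import java.util.Locale;

// Illustrative only: none of these helpers are Hive APIs.
final class SyntheticIdSketch {

    // Assumed naming convention: bucket_<5-digit bucket id>[_<attemptId>].
    static String bucketFileName(int bucketId, Integer attemptId) {
        String base = String.format(Locale.ROOT, "bucket_%05d", bucketId);
        return attemptId == null ? base : base + "_" + attemptId;
    }

    // Rebuild the full file path from the delete delta directory,
    // the bucket id, and the attempt id carried in the split.
    static String bucketPath(String deleteDeltaDir, int bucketId, Integer attemptId) {
        return deleteDeltaDir + "/" + bucketFileName(bucketId, attemptId);
    }

    // Stand-in 64-bit path hash (simple polynomial hash over the characters).
    static long pathHash(String path) {
        long h = 0;
        for (int i = 0; i < path.length(); i++) {
            h = 31 * h + path.charAt(i);
        }
        return h;
    }

    // Synthetic id modelled as {path hash, modification time, length}.
    static long[] syntheticId(String path, long modificationTime, long length) {
        return new long[] { pathHash(path), modificationTime, length };
    }
}
```

      Because the path is derived from the attemptId rather than fixed in advance, the same calculation still applies if the file keeps its attempt-specific name instead of being moved.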

            People

              Assignee: pvargacl Peter Varga
              Reporter: pvargacl Peter Varga
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 50m