Hive / HIVE-20738 (Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader) / HIVE-16812

VectorizedOrcAcidRowBatchReader doesn't filter delete events


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 4.0.0
    • Component/s: Transactions
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      n/a

      Description

      The constructor of VectorizedOrcAcidRowBatchReader contains:

          // Clone readerOptions for deleteEvents.
          Reader.Options deleteEventReaderOptions = readerOptions.clone();
          // Set the range on the deleteEventReaderOptions to 0 to Long.MAX_VALUE because
          // we always want to read all the delete delta files.
          deleteEventReaderOptions.range(0, Long.MAX_VALUE);
      

      This is suboptimal because the base and delta files are sorted by ROW__ID. For each split of the base, we can find the min/max ROW__ID and load only those delete events from the delete deltas that fall in the [min, max] range. This reduces the number of delete events loaded into memory (to no more than the number of rows in the split).
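The range-filtering idea above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hive's implementation: `RowId` is a hypothetical stand-in for Hive's ROW__ID (compared by write id, then bucket, then row id), and `filterDeleteEvents` is a hypothetical helper that assumes the delete events arrive already sorted by ROW__ID, so it can stop scanning at the first event past the split's max key. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteEventRangeFilter {

    /** Hypothetical stand-in for Hive's ROW__ID, ordered by (writeId, bucketId, rowId). */
    record RowId(long writeId, int bucketId, long rowId) implements Comparable<RowId> {
        @Override
        public int compareTo(RowId o) {
            int c = Long.compare(writeId, o.writeId);
            if (c != 0) return c;
            c = Integer.compare(bucketId, o.bucketId);
            if (c != 0) return c;
            return Long.compare(rowId, o.rowId);
        }
    }

    /**
     * Hypothetical helper: keeps only delete events whose ROW__ID lies in [min, max],
     * the key range of the base split being read. Assumes the input is sorted by ROW__ID,
     * so scanning stops at the first event greater than max.
     */
    static List<RowId> filterDeleteEvents(List<RowId> sortedDeleteEvents, RowId min, RowId max) {
        List<RowId> kept = new ArrayList<>();
        for (RowId e : sortedDeleteEvents) {
            if (e.compareTo(max) > 0) break;          // sorted input: nothing later can match
            if (e.compareTo(min) >= 0) kept.add(e);   // inside [min, max]
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowId> deletes = List.of(
            new RowId(1, 0, 3), new RowId(1, 0, 10), new RowId(2, 0, 5), new RowId(3, 0, 1));
        // Split covers ROW__IDs from (1,0,5) through (2,0,7).
        List<RowId> kept = filterDeleteEvents(deletes, new RowId(1, 0, 5), new RowId(2, 0, 7));
        System.out.println(kept.size()); // 2
    }
}
```

In Hive the delete events are read from ORC delete delta files; this sketch only shows the in-memory comparison and leaves obtaining the [min, max] bounds to the caller.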

      When we support sorting on PK, the same approach should apply, but we would need to make sure the PKs are stored in the ORC index.

      See OrcRawRecordMerger.discoverKeyBounds()

      The hive.acid.key.index entry in the ORC footer holds an index of ROW__IDs, so the min/max should be easy to determine for any file written by OrcRecordUpdater.
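A sketch of deriving a split's key bounds from the key index, loosely in the spirit of OrcRawRecordMerger.discoverKeyBounds(). It assumes (verify against OrcRecordUpdater before relying on it) that the hive.acid.key.index value is a UTF-8 string of `writeId,bucketId,rowId;` entries, one per stripe, each recording the last ROW__ID written to that stripe. `RowId`, `KeyBounds`, `parseKeyIndex` and `boundsForSplit` are hypothetical names. Because each entry is a stripe's last key, the lower bound taken from the previous stripe is exclusive, so the usable range is (min, max] rather than [min, max].

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AcidKeyIndexBounds {

    /** Hypothetical ROW__ID triple; Hive itself uses its own RecordIdentifier class. */
    record RowId(long writeId, int bucketId, long rowId) {}

    /** Hypothetical split bounds: lower bound exclusive (null = no bound), upper bound inclusive. */
    record KeyBounds(RowId minExclusive, RowId maxInclusive) {}

    /**
     * Parses a key index ASSUMED to be UTF-8 "writeId,bucketId,rowId;" entries,
     * one per stripe, each holding the last ROW__ID written to that stripe.
     */
    static RowId[] parseKeyIndex(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return Arrays.stream(text.split(";"))
            .filter(s -> !s.isEmpty())
            .map(s -> {
                String[] p = s.split(",");
                return new RowId(Long.parseLong(p[0]), Integer.parseInt(p[1]), Long.parseLong(p[2]));
            })
            .toArray(RowId[]::new);
    }

    /** Key bounds for a split covering stripes [firstStripe, firstStripe + stripeCount). */
    static KeyBounds boundsForSplit(RowId[] keyIndex, int firstStripe, int stripeCount) {
        // Last key of the preceding stripe; every row in this split sorts after it.
        RowId min = firstStripe > 0 ? keyIndex[firstStripe - 1] : null;
        // Last key of the final stripe in the split; no row in this split sorts after it.
        RowId max = keyIndex[firstStripe + stripeCount - 1];
        return new KeyBounds(min, max);
    }

    public static void main(String[] args) {
        byte[] raw = "1,0,99;1,0,199;2,0,49;".getBytes(StandardCharsets.UTF_8);
        RowId[] index = parseKeyIndex(raw);
        KeyBounds b = boundsForSplit(index, 1, 2); // split covers stripes 1 and 2
        System.out.println(b.minExclusive().rowId() + " " + b.maxInclusive().rowId()); // 99 49
    }
}
```

A split starting at the file's first stripe gets a null lower bound. Mapping a split's byte offsets to stripe indices (using the file's stripe list) is omitted here.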

        Attachments

        1. HIVE-16812.07.patch
          74 kB
          Eugene Koifman
        2. HIVE-16812.06.patch
          74 kB
          Eugene Koifman
        3. HIVE-16812.05.patch
          70 kB
          Eugene Koifman
        4. HIVE-16812.04.patch
          61 kB
          Eugene Koifman
        5. HIVE-16812.02.patch
          30 kB
          Eugene Koifman


              People

              • Assignee: Eugene Koifman (ekoifman)
              • Reporter: Eugene Koifman (ekoifman)
              • Votes: 0
              • Watchers: 7

                Dates

                • Created:
                  Updated:
                  Resolved: