  1. Hive
  2. HIVE-20738 Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader
  3. HIVE-17231

ColumnizedDeleteEventRegistry.DeleteReaderValue optimization

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Transactions
    • Labels: None
    • n/a

    Description

      For unbucketed tables, DeleteReaderValue currently returns all delete events. Once we can trust that
      the N in bucketN for the "base" split is reliable, all delete events whose bucket does not match N can be skipped.
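
The skip rule described above can be sketched as a simple filter. This is a minimal illustration, not the code from the attached patches: the `DeleteEvent` class, its fields, and `keepMatchingBucket` are hypothetical names, and the bucket id is assumed to have already been decoded from ROW__ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: these types are hypothetical, not Hive classes. */
final class DeleteEventBucketFilter {

    /** Hypothetical delete-event key; bucketId is assumed to be already decoded from ROW__ID. */
    static final class DeleteEvent {
        final long writeId;
        final int bucketId;
        final long rowId;

        DeleteEvent(long writeId, int bucketId, long rowId) {
            this.writeId = writeId;
            this.bucketId = bucketId;
            this.rowId = rowId;
        }
    }

    /** Keeps only the delete events whose bucket equals the bucket N of the split being read. */
    static List<DeleteEvent> keepMatchingBucket(List<DeleteEvent> events, int splitBucket) {
        List<DeleteEvent> kept = new ArrayList<>();
        for (DeleteEvent e : events) {
            if (e.bucketId == splitBucket) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DeleteEvent> events = new ArrayList<>();
        events.add(new DeleteEvent(5L, 0, 10L));
        events.add(new DeleteEvent(5L, 1, 11L));
        events.add(new DeleteEvent(6L, 1, 12L));
        // Reading the split for bucket 1: events for bucket 0 never need to be loaded.
        List<DeleteEvent> kept = keepMatchingBucket(events, 1);
        System.out.println(kept.size() + " of " + events.size() + " delete events kept");
    }
}
```

The filter is only safe if every row in the bucketN split really carries bucket N in its ROW__ID, which is the reliability condition stated above.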

      This protects against extreme cases where someone runs an update/delete on a partition that matches 10 billion rows and thus generates a very large number of delete events.

      Since HIVE-19890, the bucketid/writerid in the file name of every acid data file must match the bucketid/writerid in ROW__ID in the data.

      OrcRawRecordMerger.getDeltaFiles() should only return files representing the right bucket.
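
Selecting only the files for the right bucket could look roughly like the sketch below. It is not the actual getDeltaFiles() implementation: the class and method names are hypothetical, and the file name layout (`bucket_` followed by digits, with an optional `_<digits>` suffix) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not part of Hive. */
final class BucketFileNameFilter {

    // Assumed layout: "bucket_" + digits, optionally followed by "_" + digits.
    private static final Pattern BUCKET_FILE = Pattern.compile("bucket_(\\d{1,9})(?:_\\d+)?");

    /** Parses the bucket id N from a file name, or returns empty if the name does not match. */
    static OptionalInt parseBucketId(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    /** Returns the file names whose parsed bucket id equals the requested bucket. */
    static List<String> filesForBucket(List<String> fileNames, int bucket) {
        List<String> matching = new ArrayList<>();
        for (String name : fileNames) {
            OptionalInt id = parseBucketId(name);
            if (id.isPresent() && id.getAsInt() == bucket) {
                matching.add(name);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("bucket_00000");
        names.add("bucket_00001");
        names.add("_orc_acid_version");
        System.out.println(filesForBucket(names, 1));
    }
}
```

Files whose names do not parse are skipped rather than returned, so a reader for bucket N never opens a delete delta file that belongs to another bucket.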

      Attachments

        1. HIVE-17231.02.patch
          7 kB
          Eugene Koifman
        2. HIVE-17231.01.patch
          3 kB
          Eugene Koifman

            People

              Assignee: Eugene Koifman (ekoifman)
              Reporter: Eugene Koifman (ekoifman)
              Votes: 0
              Watchers: 4