Hive / HIVE-22938

Investigate possibility of removing empty bucket file creation mechanism in Hive-on-MR


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix

    Description

      This ticket, a follow-up to HIVE-22918, investigates whether the empty bucket file creation mechanism can be safely removed when MR is the execution engine.

      For a bucketed table with N buckets, each insert generates N bucket files in the delta directory, regardless of how many buckets actually receive data. For example, inserting a single record into a table with 500 buckets produces one bucket file containing the data plus 499 empty bucket files. In some cases this makes the operation substantially slower. The behaviour appears to occur only when MR is the execution engine.
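      The file counts in the example can be sketched as follows. This is a hypothetical illustration, not Hive's implementation. The class and method names (BucketFiles, filesAlwaysCreated, filesForWrittenBuckets) are invented for illustration. The "bucket_" prefix with a zero-padded five-digit id is assumed here as the delta file naming scheme. The sketch contrasts the MR behaviour described above (one file per bucket on every insert) with a policy that creates files only for buckets that received rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of bucket file counts per insert; not Hive code. */
public class BucketFiles {
    // Assumed naming convention: "bucket_" followed by a zero-padded 5-digit bucket id.
    public static String bucketFileName(int bucketId) {
        return String.format("bucket_%05d", bucketId);
    }

    // Behaviour described for Hive-on-MR: one file per bucket, whether or not it received rows.
    public static List<String> filesAlwaysCreated(int numBuckets) {
        List<String> files = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            files.add(bucketFileName(b));
        }
        return files;
    }

    // Alternative under investigation: create files only for buckets that received rows.
    public static List<String> filesForWrittenBuckets(Set<Integer> writtenBuckets) {
        List<String> files = new ArrayList<>();
        for (int b : new TreeSet<>(writtenBuckets)) { // sorted for deterministic output
            files.add(bucketFileName(b));
        }
        return files;
    }

    public static void main(String[] args) {
        int numBuckets = 500;
        Set<Integer> written = Set.of(42); // a single record landing in one bucket
        int total = filesAlwaysCreated(numBuckets).size();
        int nonEmpty = filesForWrittenBuckets(written).size();
        System.out.println("files created: " + total);
        System.out.println("empty files:   " + (total - nonEmpty));
    }
}
```

      Running main with 500 buckets and one written bucket reports 500 files created, 499 of them empty. This matches the example in the description.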

      However, some parts of the code may depend on this behaviour, so it must be verified that removing this logic does not break anything.

      Attachments

        1. HIVE-22938.1.patch (2 kB, Marton Bod)


            People

              Assignee: Marton Bod
              Reporter: Marton Bod
              Votes: 0
              Watchers: 1
