  1. Hive
  2. HIVE-6455

Scalable dynamic partitioning and bucketing optimization


Details

    Description

      The current implementation of dynamic partitioning keeps at least one record writer open per dynamic partition directory. With bucketing, there can be multi-spray file writers, which further increases the number of open record writers. The record writers of column-oriented file formats (such as ORC and RCFile) keep in-memory buffers (value buffers or compression buffers) open at all times to buffer rows and compress them before flushing them to disk. Since these buffers are maintained per column, the amount of memory required at runtime grows with the number of partitions and the number of columns per partition. This often leads to OutOfMemory (OOM) exceptions in mappers or reducers, depending on the number of open record writers. Users often increase the JVM heap size (runtime memory) to work around such OOM issues.
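
      To make the scaling concrete, the following is a rough back-of-the-envelope sketch, not Hive code. It assumes every open writer holds one fixed-size buffer per column; the function name, the parameter names, and the example numbers (including the 256 KiB buffer size) are hypothetical and chosen only for illustration.

```python
def estimated_buffer_bytes(open_partitions, buckets_per_partition,
                           columns, buffer_bytes_per_column):
    """Rough estimate of buffer memory held by all open record writers.

    Assumes (for illustration only) one writer per partition/bucket pair
    and one fixed-size buffer per column in each writer.
    """
    open_writers = open_partitions * buckets_per_partition
    return open_writers * columns * buffer_bytes_per_column


# Hypothetical workload: 500 partitions, 4 buckets each, 50 columns,
# and an assumed 256 KiB buffer per column.
total = estimated_buffer_bytes(500, 4, 50, 256 * 1024)
print(f"{total / 2**30:.1f} GiB of buffers held by open writers")
```

      Under these assumed numbers the buffers alone would need roughly 24 GiB, which is why the number of simultaneously open writers matters more than the size of any single one.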

      With this optimization, rows are sorted on the dynamic partition columns and bucketing columns (in the case of bucketed tables) before being fed to the reducers. Since the rows arrive sorted by the partitioning and bucketing columns, each reducer needs to keep only one record writer open at any time, which reduces the memory pressure on the reducers. This optimization scales well as the number of partitions and the number of columns per partition increase, at the cost of sorting on those columns.
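
      The idea can be illustrated with a small simulation. This is a minimal sketch and not the patch's implementation: plain Python lists stand in for record writers, `sorted()` stands in for the sort that would happen before rows reach a reducer, and the names `write_unsorted` and `write_sorted` are hypothetical.

```python
from operator import itemgetter

_NO_KEY = object()  # sentinel meaning "no writer opened yet"


def write_unsorted(rows):
    """Keep one 'writer' (a list) open per partition key until input ends."""
    open_writers = {}  # partition key -> buffered rows
    peak = 0
    for key, row in rows:
        open_writers.setdefault(key, []).append(row)
        peak = max(peak, len(open_writers))
    return open_writers, peak


def write_sorted(rows):
    """Sort by partition key first, so only one 'writer' is open at a time."""
    flushed = {}  # partition key -> rows of writers that are already closed
    current_key, buffer = _NO_KEY, []
    peak = 0
    for key, row in sorted(rows, key=itemgetter(0)):
        if key != current_key:
            if current_key is not _NO_KEY:
                flushed[current_key] = buffer  # close the previous writer
            current_key, buffer = key, []      # open a writer for the new key
        buffer.append(row)
        peak = 1  # never more than the single current writer
    if current_key is not _NO_KEY:
        flushed[current_key] = buffer          # close the last writer
    return flushed, peak


rows = [("2014-01-01", "a"), ("2014-01-02", "b"),
        ("2014-01-01", "c"), ("2014-01-03", "d")]
unsorted_out, unsorted_peak = write_unsorted(rows)
sorted_out, sorted_peak = write_sorted(rows)
print(unsorted_peak, sorted_peak)
```

      Both functions produce the same rows per partition; the difference is the peak number of writers held open at once, which grows with the number of distinct keys in the unsorted case and stays at one in the sorted case.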

      Attachments

        1. HIVE-6455.1.patch
          280 kB
          Prasanth Jayachandran
        2. HIVE-6455.1.patch
          280 kB
          Prasanth Jayachandran
        3. HIVE-6455.10.patch
          427 kB
          Prasanth Jayachandran
        4. HIVE-6455.10.patch
          436 kB
          Prasanth Jayachandran
        5. HIVE-6455.11.patch
          433 kB
          Prasanth Jayachandran
        6. HIVE-6455.12.patch
          583 kB
          Prasanth Jayachandran
        7. HIVE-6455.13.patch
          582 kB
          Prasanth Jayachandran
        8. HIVE-6455.13.patch
          582 kB
          Prasanth Jayachandran
        9. HIVE-6455.14.patch
          470 kB
          Prasanth Jayachandran
        10. HIVE-6455.15.patch
          468 kB
          Prasanth Jayachandran
        11. HIVE-6455.16.patch
          693 kB
          Prasanth Jayachandran
        12. HIVE-6455.17.patch
          702 kB
          Prasanth Jayachandran
        13. HIVE-6455.17.patch.txt
          702 kB
          Prasanth Jayachandran
        14. HIVE-6455.18.patch
          701 kB
          Prasanth Jayachandran
        15. HIVE-6455.19.patch
          700 kB
          Prasanth Jayachandran
        16. HIVE-6455.2.patch
          280 kB
          Prasanth Jayachandran
        17. HIVE-6455.20.patch
          624 kB
          Prasanth Jayachandran
        18. HIVE-6455.21.patch
          650 kB
          Prasanth Jayachandran
        19. HIVE-6455.3.patch
          288 kB
          Prasanth Jayachandran
        20. HIVE-6455.4.patch
          288 kB
          Prasanth Jayachandran
        21. HIVE-6455.4.patch
          288 kB
          Prasanth Jayachandran
        22. HIVE-6455.5.patch
          296 kB
          Prasanth Jayachandran
        23. HIVE-6455.6.patch
          458 kB
          Prasanth Jayachandran
        24. HIVE-6455.7.patch
          426 kB
          Prasanth Jayachandran
        25. HIVE-6455.8.patch
          426 kB
          Prasanth Jayachandran
        26. HIVE-6455.9.patch
          418 kB
          Prasanth Jayachandran
        27. HIVE-6455.9.patch
          418 kB
          Prasanth Jayachandran

            People

              prasanth_j Prasanth Jayachandran
              prasanth_j Prasanth Jayachandran
              Votes: 0
              Watchers: 17
