Hive / HIVE-11525

Bucket pruning

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.13.1, 0.14.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Logical Optimizer
    • Tez bucket pruning

    Description

      Logically and functionally, bucketing and partitioning are quite similar: both provide a mechanism to segregate and separate a table's data based on its content. Thanks to that, significant further optimisations such as [partition] PRUNING or [bucket] MAP JOIN are possible.
      The difference seems to be imposed by design: PARTITIONing is open/explicit, while BUCKETing is discrete/implicit.
      Partitioning seems to be very common, if not a standard feature, in all current RDBMSs, while BUCKETING seems to be specific to Hive.
      In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT PARTITIONING".

      Although these two are recognised as separate features in Hive, nothing should prevent leveraging the same existing query/join optimisations across both.

      BUCKET pruning
      Enable an optimisation equivalent to partition PRUNING for queries on BUCKETED tables.

      The simplest example is a query like:
      "SELECT … FROM x WHERE colA=123123"
      which should read only the relevant bucket file rather than all the bucket files that belong to the table.
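To make the idea concrete, here is a minimal sketch of how the bucket for the literal in the example query could be worked out. It is an illustration only, not the code from the attached patches. It assumes an integer bucketing column that hashes to its own value and a bucket number computed as (hash & Integer.MAX_VALUE) % numBuckets. The four-bucket layout, the file names and both function names are hypothetical.

```python
from typing import List

INT_MAX = 0x7FFFFFFF  # value of Java's Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket number for an integer key.

    Assumption: an int column hashes to its own value and the bucket is
    (hash & Integer.MAX_VALUE) % numBuckets.
    """
    return (value & INT_MAX) % num_buckets


def prune_bucket_files(files: List[str], value: int) -> List[str]:
    """Keep only the bucket file that can contain rows with colA = value.

    Assumption: there is one file per bucket and the files sort in
    bucket order, so the i-th file holds bucket i.
    """
    ordered = sorted(files)
    return [ordered[bucket_for_int(value, len(ordered))]]


# Hypothetical table x, bucketed on colA into 4 buckets.
files = ["000000_0", "000001_0", "000002_0", "000003_0"]

# WHERE colA=123123 needs one file instead of four.
print(prune_bucket_files(files, 123123))  # ['000003_0']
```

With an equality predicate on the bucketing column, the scan shrinks from every bucket file to a single one, which is the same effect partition pruning has on partition directories.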

      Attachments

        1. HIVE-11525.WIP.patch
          27 kB
          Takuya Fukudome
        2. HIVE-11525.1.patch
          114 kB
          Gopal Vijayaraghavan
        3. HIVE-11525.2.patch
          195 kB
          Gopal Vijayaraghavan
        4. HIVE-11525.3.patch
          193 kB
          Gopal Vijayaraghavan

            People

              Assignee: gopalv Gopal Vijayaraghavan
              Reporter: mkoc Maciek Kocon
              Votes: 0
              Watchers: 16