Hive / HIVE-3502

design efficient bucketing techniques

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

      Description

      Currently, bucketing is fairly expensive: the bucketing keys have to be the
      same as the reduction keys, and bucketization requires a full map-reduce job.

      It should be possible to perform map-side bucketization. The high-level idea is
      to shard the data based on the number of buckets and create a sub-directory for
      each bucket. The data from all the mappers (in the same sub-directory) can then
      be merged. So, instead of having 1 file per bucket, this would lead to
      1 directory per bucket.
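
      The idea above could be sketched roughly as follows. This is only an
      illustration under stated assumptions, not Hive code: the function names
      (bucket_for, map_side_bucketize, merge_bucket), the bucket_<n> directory
      layout, the tab-separated row format, and the CRC32 hash are all
      hypothetical stand-ins. Hive's actual hash function and file naming are
      not reproduced here.

```python
import os
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    # Assumption: any deterministic hash is enough for this sketch.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def map_side_bucketize(rows, key_index, num_buckets, out_dir, mapper_id):
    """Shard one mapper's rows into out_dir/bucket_<n>/mapper_<id>.txt."""
    handles = {}
    try:
        for row in rows:
            b = bucket_for(row[key_index], num_buckets)
            if b not in handles:
                # One sub-directory per bucket, one file per mapper inside it.
                bucket_dir = os.path.join(out_dir, f"bucket_{b:05d}")
                os.makedirs(bucket_dir, exist_ok=True)
                path = os.path.join(bucket_dir, f"mapper_{mapper_id}.txt")
                handles[b] = open(path, "w", encoding="utf-8")
            handles[b].write("\t".join(row) + "\n")
    finally:
        for handle in handles.values():
            handle.close()


def merge_bucket(out_dir, bucket):
    """Read back every mapper's file for one bucket as a single list of lines."""
    bucket_dir = os.path.join(out_dir, f"bucket_{bucket:05d}")
    if not os.path.isdir(bucket_dir):
        return []  # no mapper produced rows for this bucket
    lines = []
    for name in sorted(os.listdir(bucket_dir)):
        with open(os.path.join(bucket_dir, name), encoding="utf-8") as f:
            lines.extend(f.read().splitlines())
    return lines
```

      In this sketch no reduce phase is involved: each mapper writes
      independently, and a reader treats the whole sub-directory as the bucket
      (or merges its files afterwards).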

            People

            • Assignee: Sambavi Muthukrishnan (sambavi)
            • Reporter: Namit Jain (namit)
            • Votes: 0
            • Watchers: 4