Hive / HIVE-3502

design efficient bucketing techniques

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Currently, bucketing is fairly expensive: the bucketing keys have to be the same
      as the reduction keys, and bucketization requires a full-blown map-reduce job.

      It should be possible to perform map-side bucketization. The high-level idea is
      to shard the data based on the number of buckets and create a sub-directory for each
      bucket. The data from all the mappers (in the same sub-directory) can then be merged.
      So, instead of having 1 file per bucket, it would lead to 1 directory per bucket.
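
      The idea above can be illustrated with a small sketch. This is not Hive code and
      not a proposed implementation. It is plain Python on a local filesystem, and every
      name in it (bucket_of, map_side_bucketize, read_bucket, the bucket_NNNNN and
      mapper_NNNNN naming) is hypothetical. CRC32 stands in for the bucketing hash only
      because it is stable across processes; Hive's actual hash function is different.
      Merging is modelled here as concatenating the mapper files of one sub-directory
      at read time.

```python
import os
import tempfile
import zlib


def bucket_of(key: str, num_buckets: int) -> int:
    # Stable hash so that every mapper agrees on a key's bucket.
    # CRC32 is an illustrative choice, not Hive's hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def map_side_bucketize(table_dir, mapper_id, rows, num_buckets):
    """One mapper: append each (key, value) row to its own file
    inside the sub-directory of the bucket the key hashes to."""
    handles = {}
    try:
        for key, value in rows:
            b = bucket_of(key, num_buckets)
            if b not in handles:
                bucket_dir = os.path.join(table_dir, f"bucket_{b:05d}")
                os.makedirs(bucket_dir, exist_ok=True)
                path = os.path.join(bucket_dir, f"mapper_{mapper_id:05d}")
                handles[b] = open(path, "w", encoding="utf-8")
            handles[b].write(f"{key}\t{value}\n")
    finally:
        for handle in handles.values():
            handle.close()


def read_bucket(table_dir, bucket):
    """Read one bucket by concatenating all mapper files in its sub-directory."""
    bucket_dir = os.path.join(table_dir, f"bucket_{bucket:05d}")
    if not os.path.isdir(bucket_dir):
        return []
    rows = []
    for name in sorted(os.listdir(bucket_dir)):
        with open(os.path.join(bucket_dir, name), encoding="utf-8") as f:
            rows.extend(tuple(line.rstrip("\n").split("\t", 1)) for line in f)
    return rows


def demo(num_buckets=2):
    # Two input splits, each handled by one "mapper"; no reduce phase is used.
    table_dir = tempfile.mkdtemp(prefix="bucketed_table_")
    splits = [
        [("a", "1"), ("b", "2"), ("c", "3")],
        [("a", "4"), ("d", "5")],
    ]
    for mapper_id, rows in enumerate(splits):
        map_side_bucketize(table_dir, mapper_id, rows, num_buckets)
    return table_dir
```

      In this sketch the bucketing key only decides which sub-directory a row lands in,
      so no shuffle on that key is needed. The cost is that a bucket becomes a directory
      holding up to one file per mapper, which is why a later merge step is mentioned.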

          People

            Assignee: Sambavi Muthukrishnan
            Reporter: Namit Jain