  1. Hive
  2. HIVE-3502

design efficient bucketing techniques


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Currently, bucketing is fairly expensive: the bucketing keys have to be the
      same as the reduction keys, and bucketizing the data requires a full-blown
      map-reduce job.

      It should be possible to perform map-side bucketization. The high-level idea is
      to shard the data based on the number of buckets and create a sub-directory for each
      bucket. The data from all the mappers (in the same sub-directory) can then be merged.
      So, instead of having 1 file per bucket, there would be 1 directory per bucket.
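
      A rough sketch of the idea, not Hive code: each simulated mapper hashes the
      bucketing key, writes the row into that bucket's sub-directory under its own
      file name, and a reader treats all files in a sub-directory as one bucket.
      The hash function (CRC32), the bucket_NNNNN / mapper_NNNNN naming, and every
      function name below are assumptions made for illustration only.

```python
import tempfile
import zlib
from pathlib import Path


def bucket_for(key: str, num_buckets: int) -> int:
    # Stable stand-in hash (assumption); Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def map_side_bucketize(mapper_id, rows, table_dir, num_buckets):
    """One mapper: write each (key, value) row into its bucket's sub-directory."""
    handles = {}
    try:
        for key, value in rows:
            b = bucket_for(key, num_buckets)
            if b not in handles:
                bucket_dir = table_dir / f"bucket_{b:05d}"
                bucket_dir.mkdir(parents=True, exist_ok=True)
                # One file per (bucket, mapper), so mappers never share a file.
                path = bucket_dir / f"mapper_{mapper_id:05d}"
                handles[b] = open(path, "w", encoding="utf-8")
            handles[b].write(f"{key}\t{value}\n")
    finally:
        for handle in handles.values():
            handle.close()


def read_bucket(table_dir, b):
    """Merged view of one bucket: rows from every mapper file in its sub-directory."""
    bucket_dir = table_dir / f"bucket_{b:05d}"
    if not bucket_dir.is_dir():
        return []
    lines = []
    for part in sorted(bucket_dir.iterdir()):
        lines.extend(part.read_text(encoding="utf-8").splitlines())
    return lines


def demo():
    num_buckets = 4
    # Two input splits, one per simulated mapper; no reduce phase is involved.
    splits = [
        [("u1", "a"), ("u2", "b"), ("u3", "c")],
        [("u1", "d"), ("u4", "e")],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = Path(tmp)
        for mapper_id, rows in enumerate(splits):
            map_side_bucketize(mapper_id, rows, table_dir, num_buckets)
        return {b: read_bucket(table_dir, b) for b in range(num_buckets)}
```

      Calling demo() returns a mapping from bucket number to the rows found in that
      bucket's sub-directory. All rows for a given key land in the same sub-directory
      regardless of which mapper produced them, which is the property a later merge
      of the per-mapper files would rely on.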


            People

            • Assignee: Sambavi Muthukrishnan (sambavi)
            • Reporter: Namit Jain (namit)
