  1. Hive
  2. HIVE-7956

When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: Spark
    • None

    Description

      I created a bucketed table:

      create table testBucket(x int,y string) clustered by(x) into 10 buckets;
      

      Then I ran a query like:

      set hive.enforce.bucketing = true;
      insert overwrite table testBucket select intCol,stringCol from src;
      

      Here src is a plain text-file table (not bucketed) containing 40,000,000 records. The query launches 10 reduce tasks, one per bucket, but all the data goes to only one of them.
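
      For comparison, here is a minimal sketch of the distribution one would expect from a correctly bucketed insert. It assumes the bucket index is the non-negative hash of the clustering column modulo the bucket count, and that an int column hashes to its own value. The function names and the sample data are hypothetical and only illustrate the expected outcome. The sketch does not reproduce the bug.

```python
from collections import Counter

NUM_BUCKETS = 10  # matches "into 10 buckets" in the table definition


def bucket_for(x: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Hypothetical bucket assignment: non-negative hash modulo bucket count.

    Assumption: an int column hashes to its own value, masked to 31 bits
    so that negative values still map to a valid bucket index.
    """
    return (x & 0x7FFFFFFF) % num_buckets


def bucket_counts(values, num_buckets: int = NUM_BUCKETS) -> Counter:
    """Count how many rows would land in each bucket."""
    return Counter(bucket_for(v, num_buckets) for v in values)


# Illustrative data only: 1000 sequential ints standing in for intCol.
counts = bucket_counts(range(1000))
for bucket in range(NUM_BUCKETS):
    print(bucket, counts[bucket])
```

      Under these assumptions every bucket receives a share of the rows, so each of the 10 reduce tasks should have data to write. A run in which one reducer receives everything suggests the rows are not being partitioned on the bucketing column before they reach the reducers.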

            People

              lirui Rui Li
              lirui Rui Li