  1. Hive
  2. HIVE-4136

Hive should optimize the case where the input and output tables are bucketed/sorted on the same keys


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Consider a common scenario like:

      create table T1 (...) clustered by (key) sorted by (key) into 2 buckets;
      create table T2 (...) clustered by (key) sorted by (key) into 2 buckets;

      SET hive.enforce.sorting=true;
      SET hive.enforce.bucketing=true;

      insert overwrite table T2
      select * from T1;

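      The above query creates a reducer to make sure T2 is bucketed and sorted.
      That reducer is not needed: T1 is already bucketed and sorted on the same key
      into the same number of buckets, so each bucket of T1 could be written directly
      to the corresponding bucket of T2 by a map-only job.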
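
      The reasoning can be illustrated with a small toy model. This is only a sketch,
      not Hive code: tables are modeled as lists of sorted buckets, and `bucket_of` is a
      hypothetical stand-in for the engine's bucketing function (assumed here to be a
      non-negative hash of the key modulo the bucket count). All function names are
      made up for illustration.

```python
NUM_BUCKETS = 2  # mirrors "into 2 buckets" in the DDL above


def bucket_of(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumption: bucket = non-negative hash of the key modulo the bucket count.
    return (hash(key) & 0x7FFFFFFF) % num_buckets


def write_bucketed_sorted(keys, num_buckets: int = NUM_BUCKETS):
    """Reducer-style write: route every key to its bucket, then sort each bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)
    return [sorted(bucket) for bucket in buckets]


def copy_map_only(source_buckets):
    """Map-only write: copy bucket i of the source to bucket i of the target."""
    return [list(bucket) for bucket in source_buckets]


# T1 is populated through the normal bucketing/sorting path.
t1 = write_bucketed_sorted([7, 2, 9, 4, 4, 1, 8])

# Path 1: re-shuffle and re-sort T1's rows (what the extra reducer does).
t2_with_reducer = write_bucketed_sorted(k for bucket in t1 for k in bucket)

# Path 2: copy each bucket unchanged (no reducer).
t2_map_only = copy_map_only(t1)

# Same key, same bucket count, same sort order: both paths agree.
assert t2_map_only == t2_with_reducer
```

      In this model the two write paths produce identical buckets, which is the property
      the requested optimization would rely on. The equivalence holds only while both
      tables share the bucketing key, the sort key, and the bucket count.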

          People

            Assignee: Unassigned
            Reporter: Namit Jain
            Votes: 0
            Watchers: 1
