  1. Hive
  2. HIVE-12963

LIMIT statement with SORT BY creates additional MR job with hardcoded only one reducer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 1.0.0, 1.2.1
    • Fix Version/s: 2.1.0
    • Component/s: Hive
    • Labels: Patch

    Description

      I execute the query:

      hive> select age from test1 sort by age.age limit 10;
      Total jobs = 2
      Launching Job 1 out of 2
      Number of reduce tasks not specified. Estimated from input data size: 1
      Launching Job 2 out of 2
      Number of reduce tasks determined at compile time: 1

      When the table has a large number of rows, the last stage of the job takes a long time. I think we could let the user choose the number of reducers for the last job, or avoid the extra MR job altogether.
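
      The two-job plan in the log above can be pictured with a small Python model. This is only a conceptual sketch, not Hive code: the function names, the round-robin row distribution, and the sample data are assumptions made for illustration. It assumes the first job sorts and limits within each reducer, and that the second job uses exactly one reducer to produce the global LIMIT, which would correspond to the stage the log reports as "determined at compile time: 1".

```python
import heapq
from typing import Iterable, List


def sort_by_stage(rows: Iterable[int], num_reducers: int, limit: int) -> List[List[int]]:
    """Job 1 (modelled): each reducer sorts its share of rows and keeps `limit` of them."""
    partitions: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Round-robin is a stand-in (assumption) for however rows reach reducers.
        partitions[i % num_reducers].append(row)
    # nsmallest returns the `limit` smallest rows in ascending order.
    return [heapq.nsmallest(limit, part) for part in partitions]


def final_limit_stage(partials: List[List[int]], limit: int) -> List[int]:
    """Job 2 (modelled): one reducer merges the sorted partial outputs and applies the limit."""
    merged = heapq.merge(*partials)  # lazy merge of already-sorted lists
    return [row for _, row in zip(range(limit), merged)]


# Hypothetical sample data standing in for the `age` column.
ages = [42, 17, 65, 23, 8, 51, 30, 19, 77, 5, 36, 61]
partials = sort_by_stage(ages, num_reducers=3, limit=4)
top = final_limit_stage(partials, limit=4)
print(top)
```

      In this model the first stage can be spread across as many reducers as desired, but the second stage always runs as a single task, so everything it receives is processed serially.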

      I observed the same behavior with the query:

      hive> create table new_test as select age from test1 group by age.age limit 10;

      Attachments

        1. HIVE-12963.1.patch
          2 kB
          Alina Abramova
        2. HIVE-12963.2.patch
          2 kB
          Alina Abramova
        3. HIVE-12963.3.patch
          2 kB
          Alina Abramova
        4. HIVE-12963.4.patch
          93 kB
          Alina Abramova
        5. HIVE-12963.6.patch
          3 kB
          Alina Abramova

          People

            Assignee: Alina Abramova
            Reporter: Alina Abramova
            Votes: 0
            Watchers: 6