Hive / HIVE-3276

optimize union sub-queries


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

    Description

      It might be a good idea to optimize simple union queries in which at least one of the sub-queries contains a map-reduce job.

      For example, a query like:

      insert overwrite table T1 partition P1
      select * from
      (
      subq1
      union all
      subq2
      ) u;

      currently creates three map-reduce jobs: one for subq1, one for subq2,
      and a final one for the union.

      Instead of creating the union task, it might be simpler to create a move
      task (or something like a move task) that moves the outputs of the two
      sub-queries to the final directory. This extends easily to more than two
      sub-queries in the union.
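
The move-task idea above can be sketched in Python as a plain file move. This is only an illustration, not Hive's implementation: `move_outputs`, `demo`, the `subqN_` file-name prefix, and the `000000_0` output file names are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def move_outputs(subquery_dirs, final_dir):
    """Move every file from each sub-query output directory into final_dir.

    Hypothetical helper: files are prefixed with the sub-query index so that
    identically named outputs from different sub-queries do not collide.
    """
    final = Path(final_dir)
    final.mkdir(parents=True, exist_ok=True)
    moved = []
    for index, sub_dir in enumerate(subquery_dirs, start=1):
        for src in sorted(Path(sub_dir).iterdir()):
            if not src.is_file():
                continue
            dest = final / f"subq{index}_{src.name}"
            shutil.move(str(src), str(dest))
            moved.append(dest.name)
    return moved


def demo():
    """Build two fake sub-query outputs and combine them without a union job."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        dirs = []
        for name, rows in (("subq1", "a\nb\n"), ("subq2", "c\n")):
            d = root / name
            d.mkdir()
            (d / "000000_0").write_text(rows)  # assumed output file name
            dirs.append(d)
        final = root / "T1" / "P1"
        moved = move_outputs(dirs, final)
        rows = []
        for f in sorted(final.iterdir()):
            rows.extend(f.read_text().split())
        return moved, rows
```

Calling `demo()` leaves the rows of both sub-queries in the final directory without reading or rewriting any data, which is the saving the proposal is after. The same loop handles any number of sub-query directories.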

      This is most useful when the union is followed by a select * and then a
      filesink. It is useful on its own, and can also be used to optimize
      skewed joins; see
      https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.

      If there is a select or filter between the union and the filesink, the
      select and the filter can be moved before the union, and the follow-up
      job can still be removed.
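
A minimal Python sketch of why moving the select and filter before the union is safe: applying them to each branch and then concatenating gives the same rows as concatenating first. The function names, sample rows, and predicate are hypothetical and stand in for Hive operators; they are not Hive APIs.

```python
def union_all(*inputs):
    """UNION ALL: concatenate the rows of every input, keeping duplicates."""
    return [row for rows in inputs for row in rows]


def select_filter(rows, columns, predicate):
    """Apply a filter, then project the requested columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def plan_after_union(subq1, subq2, columns, predicate):
    """Original shape: union first, then select/filter in a follow-up step."""
    return select_filter(union_all(subq1, subq2), columns, predicate)


def plan_pushed_down(subq1, subq2, columns, predicate):
    """Rewritten shape: select/filter runs inside each branch of the union."""
    return union_all(
        select_filter(subq1, columns, predicate),
        select_filter(subq2, columns, predicate),
    )


# Made-up sample data standing in for the outputs of subq1 and subq2.
SUBQ1 = [{"key": 1, "value": "a"}, {"key": 5, "value": "b"}]
SUBQ2 = [{"key": 7, "value": "c"}, {"key": 2, "value": "d"}]


def key_above_3(row):
    """Example predicate for the filter."""
    return row["key"] > 3
```

Once the select and filter sit inside each branch, nothing remains between the union and the filesink, so the branch outputs can be moved to the final directory directly.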

      Attachments

        1. hive.3276.10.patch (479 kB, Namit Jain)
        2. hive.3276.11.patch (480 kB, Namit Jain)
        3. hive.3276.12.patch (480 kB, Namit Jain)
        4. hive.3276.13.patch (481 kB, Namit Jain)
        5. hive.3276.14.patch (498 kB, Namit Jain)
        6. hive.3276.2.patch (286 kB, Namit Jain)
        7. hive.3276.3.patch (312 kB, Namit Jain)
        8. hive.3276.4.patch (307 kB, Namit Jain)
        9. hive.3276.5.patch (365 kB, Namit Jain)
        10. hive.3276.6.patch (452 kB, Namit Jain)
        11. hive.3276.7.patch (452 kB, Namit Jain)
        12. hive.3276.8.patch (456 kB, Namit Jain)
        13. hive.3276.9.patch (456 kB, Namit Jain)
        14. HIVE-3276.1.patch (21 kB, Nadeem Moidu)


            People

              Assignee: Namit Jain
              Reporter: Namit Jain
              Votes: 0
              Watchers: 12
