  1. Hive
  2. HIVE-9225

Windowing functions are not executing efficiently when the window is identical

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.1
    • Component/s: PTF-Windowing
    • Labels: None
    • Environment: Linux

    • Streaming mode improves performance in such use cases.

    Description

      The Hive optimizer and runtime do not recognize when several windowing functions use the same window. Even when the window is identical, the windowing is re-executed for each function, which causes the runtime to increase in proportion to the number of windows.

      Example:

      select code, min(emp) over (partition by code order by emp range between current row and 300000000 following) from sample_big limit 10;
      

      Time taken: 1h:36m:12s

      select code,
      min(emp) over (partition by code order by emp  range between current row and 300000000 following),
      max(emp) over (partition by code order by emp  range between current row and 300000000 following),
      min(salary) over (partition by code order by emp  range between current row and 300000000 following),
      max(salary) over (partition by code order by emp  range between current row and 300000000 following)
      from sample_big limit 10;
      

      Time taken: 4h:0m:37s
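
      For illustration, the four-function query above can also be written with a named window using HiveQL's WINDOW clause, which makes it explicit that every function shares one window specification. This is only a sketch of an equivalent formulation: the name w is chosen here for the example, and nothing in this issue indicates whether this form changes how the windowing is executed.

```sql
-- Same query as above, with the shared window declared once.
-- The window name "w" is an illustrative choice, not from the issue.
select code,
       min(emp)    over w,
       max(emp)    over w,
       min(salary) over w,
       max(salary) over w
from sample_big
window w as (partition by code
             order by emp
             range between current row and 300000000 following)
limit 10;
```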

          People

            Assignee: Unassigned
            Reporter: Illya Yalovyy (yalovyyi)
            Votes: 0
            Watchers: 4