Hive / HIVE-7062

Support Streaming mode in Windowing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:

      Description

      1. Have the Windowing Table Function support streaming mode.
      2. Have special handling for Ranking UDAFs.
      3. Have special handling for Sum/Avg for fixed-size windows.
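
To illustrate item 3, here is a minimal sketch of how a sum/average over a fixed-size window can be computed in streaming fashion: only the rows inside the current frame are buffered, and a running total is adjusted as the frame slides. This is not Hive's implementation. The function name `streaming_sum_avg` and its parameters are hypothetical, and the frame is assumed to be `ROWS BETWEEN preceding PRECEDING AND following FOLLOWING` over a single partition.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def streaming_sum_avg(
    rows: Iterable[float], preceding: int, following: int
) -> Iterator[Tuple[float, float]]:
    """Yield (sum, avg) for each input row, in order, for a fixed-size frame.

    Hypothetical helper: buffers at most preceding + following + 1 values
    instead of holding the whole partition in memory.
    """
    window = deque()  # values inside the frame of the next row to emit
    total = 0.0       # running sum of the values in `window`
    seen = 0          # rows read so far
    emitted = 0       # rows for which a result has been produced

    for value in rows:
        window.append(value)
        total += value
        seen += 1
        # Row `emitted` is complete once row `emitted + following` has arrived.
        if seen - 1 >= emitted + following:
            yield total, total / len(window)
            emitted += 1
            # Slide the frame: drop the value that the next row no longer covers.
            if emitted > preceding:
                total -= window.popleft()

    # End of partition: remaining rows have frames truncated on the right.
    while emitted < seen:
        yield total, total / len(window)
        emitted += 1
        if emitted > preceding and window:
            total -= window.popleft()


if __name__ == "__main__":
    for s, a in streaming_sum_avg([1, 2, 3, 4, 5], preceding=1, following=1):
        print(s, a)
```

The design point is that each result costs one addition and at most one subtraction, rather than re-summing the frame for every row.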

      1. HIVE-7062.6.patch
        97 kB
        Harish Butani
      2. HIVE-7062.5.patch
        97 kB
        Harish Butani
      3. HIVE-7062.4.patch
        96 kB
        Harish Butani
      4. HIVE-7062.1.patch
        59 kB
        Harish Butani

          Activity

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.

          Lefty Leverenz added a comment -

          Doc note: HIVE-7143 & HIVE-7344 also need documentation related to this issue – min/max, lead/lag, fval/lval (HIVE-7143) & FirstVal, LastVal (HIVE-7344).

          Lefty Leverenz added a comment -

          Okay, thanks Harish Butani. I've put this with my doc-by-0.14 tasks.

          Harish Butani added a comment -

          I don't see a need for a separate page for Streaming. How about adding a note on the Windowing and Analytics page?
          This is an implementation improvement, not a functional change.

          Yes agreed, windowing documentation can be expanded. The Oracle one is really nice: http://docs.oracle.com/cd/B14117_01/server.101/b10736/analysis.htm
          Don't know when I am going to get around to it though.

          Lefty Leverenz added a comment -

          Harish Butani, will streaming have its own wikidoc for 0.14.0 or should this just be mentioned in a new section for Windowing & Analytics?

          It could also be mentioned in Configuration Properties, but that buries the information. It could go in hive-default.xml.template too.

          By the way, Windowing & Analytics is a very skimpy doc. It links to the spec but that information should be merged into the wiki and updated.

          Quick ref:

          • Windowing and Analytics Functions
          • Configuration Properties: hive.join.cache.size
          • Windowing Specifications in HQL

          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Harish!

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12647746/HIVE-7062.6.patch

          ERROR: -1 due to 8 failed/errored test(s), 5496 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hadoop.hive.metastore.TestMetastoreVersion.testDefaults
          org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/345/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/345/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-345/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 8 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12647746

          Harish Butani added a comment -

          Added a check to disallow streaming when there are Lead/Lag invocations in arguments.

          Ashutosh Chauhan added a comment -

          LGTM +1

          Harish Butani added a comment -

          Lefty Leverenz documentation note:

          One of the factors checked for processing Analytic functions in Streaming mode is the 'Window size'.
          For Streaming mode to kick in, the window size must be less than the config parameter 'hive.join.cache.size'. The default value for this parameter is 25000.
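
The check described in this note can be sketched roughly as follows. This is an illustration under stated assumptions, not Hive's actual code: the names `window_size` and `can_stream` are hypothetical, the window size is assumed to be the number of rows in a `ROWS BETWEEN p PRECEDING AND f FOLLOWING` frame, and window size is only one of the factors involved. The Lead/Lag flag reflects the separate check mentioned elsewhere in this thread.

```python
# Default of hive.join.cache.size, as stated in the note above.
DEFAULT_JOIN_CACHE_SIZE = 25000


def window_size(preceding: int, following: int) -> int:
    """Rows covered by a frame of `preceding` rows before and `following`
    rows after the current row (assumed definition of 'window size')."""
    return preceding + following + 1


def can_stream(
    size: int,
    cache_size: int = DEFAULT_JOIN_CACHE_SIZE,
    lead_lag_in_args: bool = False,
) -> bool:
    """Hypothetical eligibility check for streaming mode.

    Streaming is ruled out when Lead/Lag invocations appear in the function
    arguments, and otherwise requires the window size to be strictly less
    than the configured cache size.
    """
    if lead_lag_in_args:
        return False
    return size < cache_size


if __name__ == "__main__":
    print(can_stream(window_size(2, 2)))           # small fixed window
    print(can_stream(window_size(20000, 20000)))   # larger than the default limit
```

Under this reading, raising `hive.join.cache.size` would allow larger fixed windows to be processed in streaming mode, at the cost of buffering more rows.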

          Harish Butani added a comment -

          Addressed Ashutosh Chauhan's review comments.

          Ashutosh Chauhan added a comment -

          Mostly looks good. Some minor comments on RB.

          Harish Butani added a comment - edited

          Has framework changes, plus streaming for Sum and Avg functions and streaming for Ranking functions.
          Still need to do streaming for Min, Max, Lead, Lag, FirstVal, and LastVal.

          Harish Butani added a comment -

          Preliminary patch attached.


            People

            • Assignee: Harish Butani
            • Reporter: Harish Butani
            • Votes: 0
            • Watchers: 4
