Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: spark-branch
    • Component/s: spark
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It looks like we don't get aggregate warning stats when using Spark as the exec engine:

      ./test_harness.pl::TestDriverPig::compareScript INFO Check failed: regex match of <Encountered Warning DIVIDE_BY_ZERO 2387 time.*> expected in stderr
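For illustration only, the check that fails here can be approximated in Java as below. The pattern string is copied from the message above; the class and method names are hypothetical, the sample stderr lines are made up, and the "match anywhere in stderr" semantics are an assumption about how the Perl harness applies the regex.

```java
import java.util.regex.Pattern;

public class WarningCheck {
    // Pattern copied verbatim from the failing harness message.
    static final Pattern EXPECTED =
        Pattern.compile("Encountered Warning DIVIDE_BY_ZERO 2387 time.*");

    /** True if the pattern occurs anywhere in stderr (assumed semantics). */
    static boolean stderrMatches(String stderr) {
        return EXPECTED.matcher(stderr).find();
    }

    public static void main(String[] args) {
        // Made-up stderr samples, not real Pig output.
        String withSummary = "Encountered Warning DIVIDE_BY_ZERO 2387 time(s).\n";
        String withoutSummary = "job finished\n";
        System.out.println(stderrMatches(withSummary));
        System.out.println(stderrMatches(withoutSummary));
    }
}
```

Without aggregate warnings, the summary line never reaches stderr, so a check of this kind has nothing to match.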
      
      1. PIG-5186.0.patch
        30 kB
        Adam Szita
      2. PIG-5186.1.patch
        31 kB
        Adam Szita
      3. PIG-5186.2.patch
        32 kB
        Adam Szita
      4. PIG-5186.3.patch
        32 kB
        Adam Szita

          Activity

          szita Adam Szita added a comment -

          Aggregate warnings are not yet supported in Spark mode (hence the e2e Warning test case failures). I aim to enable this now.

          In MR/Tez we use counters, and in Spark we rely on Accumulators (a means to support distributed counters).
          Pig has some builtin warning enums in PigWarning, and also supports custom warnings for user defined functions.
          The latter is problematic with Spark because you cannot register new accumulators on the backend and read their values later in the driver.

          A workaround is implemented in my patch, PIG-5186.0.patch, in which we define Map-typed Accumulators (besides the Long type we already use): one for the builtin warnings and one for the custom ones. These are passed from the driver to the backend, where the executors can create entries in the maps or increment preexisting values.

          liyunzhang, Nandor Kollar, please take a look and let me know what you think.
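A minimal sketch of the map-accumulator idea described in the comment above. This is not the code from the patch and it does not use Spark. The class name and the warning keys are hypothetical, and the add/merge split is only loosely modelled on how accumulators combine per-task values on the driver.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a map-typed accumulator of warning counts. */
public class WarningMapAccumulator {
    private final Map<String, Long> counts = new HashMap<>();

    /** Executor side: create the entry on first use, otherwise increment it. */
    public void add(String warningKey, long delta) {
        counts.merge(warningKey, delta, Long::sum);
    }

    /** Driver side: fold one task's partial counts into this accumulator. */
    public void merge(WarningMapAccumulator other) {
        other.counts.forEach((key, value) -> counts.merge(key, value, Long::sum));
    }

    /** Returns a copy of the aggregated counts. */
    public Map<String, Long> value() {
        return new HashMap<>(counts);
    }

    public static void main(String[] args) {
        // Two simulated tasks, each updating its own copy.
        WarningMapAccumulator task1 = new WarningMapAccumulator();
        WarningMapAccumulator task2 = new WarningMapAccumulator();
        task1.add("DIVIDE_BY_ZERO", 2);
        task1.add("MY_UDF_WARNING", 1); // custom key, not known up front
        task2.add("DIVIDE_BY_ZERO", 3);

        // The driver merges the partial maps and reports the totals.
        WarningMapAccumulator driver = new WarningMapAccumulator();
        driver.merge(task1);
        driver.merge(task2);
        driver.value().forEach((key, count) ->
            System.out.println(key + " -> " + count));
    }
}
```

Because the keys live inside the map, a custom warning name first seen on an executor needs no new accumulator to be registered. Only the two map accumulators created on the driver have to exist up front.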

          szita Adam Szita added a comment -

          Updated DummyContextUDF in PIG-5186.1.patch. This will help fix the HiveUDF_7 e2e test case.
          Previously this used org.apache.hadoop.mapred.Reporter; we have to update it to use PigHadoopLogger, which supports Spark too.

          kellyzly liyunzhang added a comment -

          Adam Szita: please create a Review Board request to help with the review, and add Rohini Palaniswamy and Daniel Dai to help review.

          szita Adam Szita added a comment -

          Repatched into PIG-5186.2.patch and added a ReviewBoard request.

          kexianda Xianda Ke added a comment -

          It is nice to make Counter/CounterGroup generic. LGTM, +1 (non-binding).
          liyunzhang and Nandor have also reviewed it in RB. Please help review and commit.

          nkollar Nandor Kollar added a comment -

          +1

          rohini Rohini Palaniswamy added a comment -

          +1. Committed to spark-branch. Thanks Adam Szita for adding the support.

          szita Adam Szita added a comment -

          Thanks for the review and commit


            People

            • Assignee:
              szita Adam Szita
              Reporter:
              szita Adam Szita
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
