Pig
  1. Pig
  2. PIG-3456

Reduce threadlocal conf access in backend for each record

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.1
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Noticed few things while browsing code

      1) DefaultTuple has a protected boolean isNull = false; which is never used. Removing this gives ~3-5% improvement for big jobs
      2) Config checking with ThreadLocal conf is repeatedly done for each record. For eg: createDataBag in POCombinerPackage. But initialized only for first time in other places like POPackage, POJoinPackage, etc.

      1. PIG-3456-1.patch
        40 kB
        Rohini Palaniswamy
      2. PIG-3456-1-no-whitespace.patch
        26 kB
        Cheolsoo Park
      3. PIG-3456-2.patch
        31 kB
        Rohini Palaniswamy
      4. PIG-3456-3.patch
        31 kB
        Rohini Palaniswamy
      5. PIG-3456-fixtest.patch
        0.4 kB
        Daniel Dai

        Issue Links

          Activity

          Hide
          Dmitriy V. Ryaboy added a comment -

          Could you post a patch without the whitespace changes (for ease of review) and some microbenchmark results?

          I had some microbenchmark code in PIG-3325, that might help bootstrap you here.

          Show
          Dmitriy V. Ryaboy added a comment - Could you post a patch without the whitespace changes (for ease of review) and some microbenchmark results? I had some microbenchmark code in PIG-3325 , that might help bootstrap you here.
          Hide
          Cheolsoo Park added a comment -

          Looks like a good clean-up to me. I am uploading the no-whitespace patch incase someone else wants to take a look at it.

          +1 assuming all the tests pass. Thank you Rohini for doing this!

          Show
          Cheolsoo Park added a comment - Looks like a good clean-up to me. I am uploading the no-whitespace patch incase someone else wants to take a look at it. +1 assuming all the tests pass. Thank you Rohini for doing this!
          Hide
          Rohini Palaniswamy added a comment -

          Posted in review board as well - https://reviews.apache.org/r/17876/.

          Dmitriy V. Ryaboy,
          PIG-3730 has some analysis and stack traces on how threadlocal access of Configuration for each record processing affects performance. There are already lots of places in pig code where this was already done. I just fixed places that it was not done. I had this patch done for 0.11 earlier when Pigmix numbers for 0.11 where worse than 0.10 even when I had PIG-2923 reverted. I could not figure out the exact reason, but since I was short of time tried fixing things I came across like this one to make it equal to 0.10. The thread local fix did not actually make much difference with my Pigmix test actually, but removing the extra boolean in DefaultTuple did add around 3-5%. But according to PIG-3730, it makes a difference when there is lot of GC happening.

          Show
          Rohini Palaniswamy added a comment - Posted in review board as well - https://reviews.apache.org/r/17876/ . Dmitriy V. Ryaboy , PIG-3730 has some analysis and stack traces on how threadlocal access of Configuration for each record processing affects performance. There are already lots of places in pig code where this was already done. I just fixed places that it was not done. I had this patch done for 0.11 earlier when Pigmix numbers for 0.11 where worse than 0.10 even when I had PIG-2923 reverted. I could not figure out the exact reason, but since I was short of time tried fixing things I came across like this one to make it equal to 0.10. The thread local fix did not actually make much difference with my Pigmix test actually, but removing the extra boolean in DefaultTuple did add around 3-5%. But according to PIG-3730 , it makes a difference when there is lot of GC happening.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Added a couple minor comments. Good change overall.

          BTW not sure if you saw, but PIG-3325 addressed the bag insertion regression you saw as a side effect of PIG-2923 without sacrificing the memory and gc benefits 2923 provides, so if you still have that reverted in your build, consider un-reverting..

          Show
          Dmitriy V. Ryaboy added a comment - Added a couple minor comments. Good change overall. BTW not sure if you saw, but PIG-3325 addressed the bag insertion regression you saw as a side effect of PIG-2923 without sacrificing the memory and gc benefits 2923 provides, so if you still have that reverted in your build, consider un-reverting..
          Hide
          Rohini Palaniswamy added a comment -

          Cancelling patch as it needs to be rebased after PIG-3591

          Show
          Rohini Palaniswamy added a comment - Cancelling patch as it needs to be rebased after PIG-3591
          Hide
          Rohini Palaniswamy added a comment -

          Checked into branch-0.14 and trunk. Thanks for the review Daniel.

          Show
          Rohini Palaniswamy added a comment - Checked into branch-0.14 and trunk. Thanks for the review Daniel.
          Hide
          Daniel Dai added a comment -

          TestTezAutoParallelism is broken due to tuple size change. Commit PIG-3456-fixtest.patch to fix it.

          Show
          Daniel Dai added a comment - TestTezAutoParallelism is broken due to tuple size change. Commit PIG-3456 -fixtest.patch to fix it.
          Hide
          Rohini Palaniswamy added a comment -

          +1

          Show
          Rohini Palaniswamy added a comment - +1

            People

            • Assignee:
              Rohini Palaniswamy
              Reporter:
              Rohini Palaniswamy
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development