Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Enable deep cast of a tuple/bag to a tuple/bag of different inner schema

      Description

      Pig does not handle deep cast from bag -> bag, tuple -> tuple. Eg, the following script does not produce desired result:

      a = load '1.txt' as (a0:bag{t:tuple(i0:double)});
      b = foreach a generate (bag{tuple(int)})a0;
      dump b;
      

      The result tuple still contain int inside tuple of bag.

      PIG-613 fix the case we cast bytearray > bag/tuple, we take complex type including inner types, but bag>bag, tuple->tuple is still not effective.

      1. PIG-1758-2.patch
        18 kB
        Daniel Dai
      2. PIG-1758-1.patch
        18 kB
        Daniel Dai

        Issue Links

          Activity

          Hide
          Dmitriy V. Ryaboy added a comment -

          In other words, it's impossible to cast a bag of tuples of bytearrays into a bag of tuples of longs without this patch.
          It simply makes the POCast operator recurse into nested structures and keep applying the caster as needed.

          Show
          Dmitriy V. Ryaboy added a comment - In other words, it's impossible to cast a bag of tuples of bytearrays into a bag of tuples of longs without this patch. It simply makes the POCast operator recurse into nested structures and keep applying the caster as needed.
          Hide
          Daniel Dai added a comment -

          Cast bytes into bag/tuple is handled by PIG-613. This Jira handles cast bag to bag, tuple to tuple.

          Show
          Daniel Dai added a comment - Cast bytes into bag/tuple is handled by PIG-613 . This Jira handles cast bag to bag, tuple to tuple.
          Hide
          Alan Gates added a comment -

          LoadCaster interface provides methods to cast complex types – but they are silently ignored by Pig

          So before this patch casting of bytearray to tuple or bag fails in 0.8? That is definitely a fix worth back porting.

          The addition of deep casting I have a hard time thinking of as a bug since we have never it in the past.

          Since this is relatively small and did not work in the past I am ok with back porting it. I do think we should add something in the release notes that says the feature is experimental.

          I have a concern in general about when we back port features and when we do not, but I will start a thread on that on the dev list.

          Show
          Alan Gates added a comment - LoadCaster interface provides methods to cast complex types – but they are silently ignored by Pig So before this patch casting of bytearray to tuple or bag fails in 0.8? That is definitely a fix worth back porting. The addition of deep casting I have a hard time thinking of as a bug since we have never it in the past. Since this is relatively small and did not work in the past I am ok with back porting it. I do think we should add something in the release notes that says the feature is experimental. I have a concern in general about when we back port features and when we do not, but I will start a thread on that on the dev list.
          Hide
          Alan Gates added a comment -

          I have some concerns on this. I want to review it more completely before I vote one way or another. If I promise to get to it later today can you hold any checkins until then? Thanks.

          Show
          Alan Gates added a comment - I have some concerns on this. I want to review it more completely before I vote one way or another. If I promise to get to it later today can you hold any checkins until then? Thanks.
          Hide
          Santhosh Srinivasan added a comment -

          Should be fine if the patch applies cleanly and all the unit tests pass. I am concerned about having patches in JIRAs that are not documented in release notes as known issues (with patches)

          Show
          Santhosh Srinivasan added a comment - Should be fine if the patch applies cleanly and all the unit tests pass. I am concerned about having patches in JIRAs that are not documented in release notes as known issues (with patches)
          Hide
          Dmitriy V. Ryaboy added a comment -

          Actually, as it turns out, this patch applies cleanly to 0.8 branch, and the tests pass.
          I think this is safe to commit.
          I'll wait a day since Olga expressed reservations in case someone wants to -1 this.

          Show
          Dmitriy V. Ryaboy added a comment - Actually, as it turns out, this patch applies cleanly to 0.8 branch, and the tests pass. I think this is safe to commit. I'll wait a day since Olga expressed reservations in case someone wants to -1 this.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Olga, I would contend that this is indeed a bug. The fact that you cannot provide a cast of a complex type is not documented anywhere in 0.8, and in fact LoadCaster interface provides methods to cast complex types – but they are silently ignored by Pig, which caused the Cassandra developers (and me) a good 3 days of debugging.

          I will post a 0.8 version of this patch tomorrow, let's look at how deep it has to go.

          Show
          Dmitriy V. Ryaboy added a comment - Olga, I would contend that this is indeed a bug. The fact that you cannot provide a cast of a complex type is not documented anywhere in 0.8, and in fact LoadCaster interface provides methods to cast complex types – but they are silently ignored by Pig, which caused the Cassandra developers (and me) a good 3 days of debugging. I will post a 0.8 version of this patch tomorrow, let's look at how deep it has to go.
          Hide
          Olga Natkovich added a comment -

          This looks like a pretty involved change and not a bug fix but a new feature. Also, I am not sure if it is completely backward compatible. I would be hesitant to backport this into 0.8 branch.

          How about just making the patch available for 0.8 branch so that they can use it with the branch if they choose so.

          Show
          Olga Natkovich added a comment - This looks like a pretty involved change and not a bug fix but a new feature. Also, I am not sure if it is completely backward compatible. I would be hesitant to backport this into 0.8 branch. How about just making the patch available for 0.8 branch so that they can use it with the branch if they choose so.
          Hide
          Dmitriy V. Ryaboy added a comment -

          The Cassandra fellows ran into needing this for their CassandraStorage implementation. Objections to backporting this into 8.1?

          Show
          Dmitriy V. Ryaboy added a comment - The Cassandra fellows ran into needing this for their CassandraStorage implementation. Objections to backporting this into 8.1?
          Hide
          Daniel Dai added a comment -

          Review notes:
          https://reviews.apache.org/r/152/

          Patch committed to trunk.

          Show
          Daniel Dai added a comment - Review notes: https://reviews.apache.org/r/152/ Patch committed to trunk.
          Hide
          Yan Zhou added a comment -

          +1

          Show
          Yan Zhou added a comment - +1
          Hide
          Daniel Dai added a comment -

          PIG-1758-2.patch address findbugs warnings.

          Show
          Daniel Dai added a comment - PIG-1758 -2.patch address findbugs warnings.

            People

            • Assignee:
              Daniel Dai
              Reporter:
              Daniel Dai
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development