PIG-1627

Flattening of bags with unknown schemas produces wrong schema

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None

      Description

      The following should produce an unknown schema:

      A = load '/Users/gates/test/data/studenttab10';
      B = group A by $0;
      C = foreach B generate flatten(A);
      describe C;
      

      Instead it gives

      C: {bytearray}
      

          Activity

          Daniel Dai added a comment -

          We use bytearray for a field whose type is unknown. When even the number of fields is unknown, we use a null schema (unknown schema). Yes, some clarification in the documentation is needed.
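
          The distinction drawn above can be illustrated with a minimal sketch. The input path 'data' and the field names a and b are placeholders, not taken from this issue, and the exact describe output may vary between Pig versions.

          ```pig
          -- Field count is known, types are not: each field defaults to bytearray.
          A = load 'data' as (a, b);
          describe A;   -- expected to list a and b as bytearray fields

          -- Nothing is known, not even the number of fields: the schema is null (unknown).
          B = load 'data';
          describe B;   -- expected to report that the schema for B is unknown
          ```

          In the first case Pig still knows the relation has two columns; in the second it knows nothing about the shape of the rows, which is the situation this issue is about.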

          Mridul Muralidharan added a comment -

          The use of bytearray versus unknown schema has always been confusing.
          The description in https://issues.apache.org/jira/browse/PIG-1876, for example, indicates that an unknown schema implies bytearray (it starts with: "Currently Pig map type is untyped, which means map value is always of bytearray(ie. unknown) type."), while this JIRA seems to indicate that this is not the case.

          I have seen varying interpretations of what bytearray is supposed to mean in the JIRAs, the Pig docs, and the Pig source code over the last 3+ years, not to mention on the various mailing lists and in user codebases. Some clarity in this regard would be good.

          Daniel Dai added a comment -

          PIG-1786 has been checked in. After retesting, we now get:
          Schema for C unknown.

          Closing the JIRA.

          Daniel Dai added a comment -

          In the new logical plan, explain C gives:
          C: (Name: LOForEach Schema: null)

          This is the right result. Once describe is migrated to the new logical plan, this should be fixed automatically.

          Alan Gates added a comment -

          The problem is in the flatten, not the group. The group has the proper schema (bytearray, bag{}). Loading a bag of unknown schema and flattening it produces the same result.

          Flattening a tuple of unknown content has the same problem as well.
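
          A hedged sketch of the two variants mentioned above, without the group step. The path 'data' and the aliases are placeholders, and the empty bag{} and tuple() declarations are an assumption about what the load schema syntax accepts in a given Pig version.

          ```pig
          -- A bag whose inner schema is unknown, loaded directly.
          A = load 'data' as (b: bag{});
          B = foreach A generate flatten(b);
          describe B;   -- should be an unknown schema, not a single bytearray field

          -- The same situation with a tuple of unknown content.
          C = load 'data' as (t: tuple());
          D = foreach C generate flatten(t);
          describe D;   -- likewise should be an unknown schema
          ```

          Both cases show that the wrong schema comes from flatten itself, since no group operator is involved.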


            People

            • Assignee:
              Daniel Dai
              Reporter:
              Alan Gates
            • Votes:
              0
              Watchers:
              1
