  1. Pig
  2. PIG-2537

Output from flatten with a null tuple input generating data inconsistent with the schema

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels:
      None

      Description

      Consider the following Pig script:

      grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
      grunt> B = foreach A generate flatten( $0 ), b, c;
      grunt> describe B;
      B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}

      Alias B has a clear schema.

      However, on the backend, if $0 is null for a given row, the output tuple becomes something like
      (null, b_value, c_value), which is inconsistent with the schema. This behaviour was confirmed by inspecting the Pig code.

      This inconsistency corrupts data because the positions of the later fields shift. The expected output row is something like
      (null, null, null, b_value, c_value).
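
      The position shift described above can be illustrated with a small Python model. This is only a sketch of the reported and expected semantics, not Pig's actual implementation; the function names flatten_observed and flatten_expected and the arity parameter are hypothetical and exist purely for illustration.

```python
from typing import Any, Optional, Tuple

Row = Tuple[Optional[Tuple[Any, ...]], Any, Any]


def flatten_observed(row: Row) -> Tuple[Any, ...]:
    """Model of the reported behaviour: a null tuple collapses to ONE null."""
    a, b, c = row
    if a is None:
        # The null tuple contributes a single field, so b and c shift left.
        return (None, b, c)
    return (*a, b, c)


def flatten_expected(row: Row, arity: int = 3) -> Tuple[Any, ...]:
    """Model of the expected behaviour: a null tuple expands to `arity` nulls."""
    a, b, c = row
    if a is None:
        # Pad with one null per declared field (x, y, z) to keep positions stable.
        a = (None,) * arity
    return (*a, b, c)


if __name__ == "__main__":
    null_row: Row = (None, "b_value", "c_value")
    print(flatten_observed(null_row))   # 3 fields: b_value lands in position 1
    print(flatten_expected(null_row))   # 5 fields: b_value stays in position 3
```

      In this model the schema declares five fields, so only the padded form keeps b and c at positions 3 and 4 for every row.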

        Attachments

        1. PIG-2537-3.patch
          44 kB
          Daniel Dai
        2. PIG-2537-2.patch
          27 kB
          Daniel Dai
        3. PIG-2537-1.patch
          17 kB
          Daniel Dai

              People

              • Assignee:
                daijy Daniel Dai
                Reporter:
                xuefuz Xuefu Zhang
              • Votes:
                3
                Watchers:
                7