Pig / PIG-2537

Output from flatten with a null tuple input generating data inconsistent with the schema

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.15.0
    • Component/s: impl
    • Labels:
      None

      Description

      For the following pig script,

      grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
      grunt> B = foreach A generate flatten( $0 ), b, c;
      grunt> describe B;
      B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}

      Alias B has a clear schema.

      However, on the backend, if $0 is null for a row, the output tuple becomes something like
      (null, b_value, c_value), which is inconsistent with the schema: the null tuple is flattened to a single null field instead of three. The behaviour is confirmed by inspecting the Pig code.

      This inconsistency corrupts data because the fields after the flattened tuple shift position: b_value lands in the a::y slot and c_value in the a::z slot. The expected output row is something like
      (null, null, null, b_value, c_value).
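
      The position shift can be illustrated with a small model. This is a sketch in Python, not Pig's implementation: the function names (flatten_naive, flatten_padded, project) and the sample values are hypothetical, and the tuple width of 3 is taken from the declared schema a : tuple( x, y, z ).

```python
from typing import Any, List, Optional, Sequence, Tuple


def flatten_naive(value: Optional[Sequence[Any]]) -> List[Any]:
    """Model of the reported behaviour: a null tuple yields one null field."""
    if value is None:
        return [None]
    return list(value)


def flatten_padded(value: Optional[Sequence[Any]], width: int) -> List[Any]:
    """Model of the expected behaviour: a null tuple yields `width` null fields."""
    if value is None:
        return [None] * width
    return list(value)


def project(row: Tuple[Any, Any, Any], width: int, padded: bool) -> Tuple[Any, ...]:
    """Model of `foreach A generate flatten($0), b, c` for one row (a, b, c)."""
    a, b, c = row
    fields = flatten_padded(a, width) if padded else flatten_naive(a)
    return tuple(fields) + (b, c)


# Hypothetical sample rows: one with a populated tuple, one with a null tuple.
rows = [
    (("x1", "y1", "z1"), "b1", "c1"),
    (None, "b2", "c2"),
]

for row in rows:
    print(project(row, width=3, padded=False), project(row, width=3, padded=True))
```

      In this model the naive variant emits three fields for the null row, so b2 sits at index 1, where the schema places a::y. The padded variant emits five fields for every row, which matches the schema reported by describe.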

      1. PIG-2537-3.patch
        44 kB
        Daniel Dai
      2. PIG-2537-2.patch
        27 kB
        Daniel Dai
      3. PIG-2537-1.patch
        17 kB
        Daniel Dai

          People

          • Assignee:
            Daniel Dai
            Reporter:
            Xuefu Zhang
          • Votes:
            3
            Watchers:
            5
