  1. Pig
  2. PIG-2537

Output from flatten with a null tuple input generating data inconsistent with the schema

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels: None

    Description

      Consider the following Pig script:

      grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
      grunt> B = foreach A generate flatten( $0 ), b, c;
      grunt> describe B;
      B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}

      Alias B has a clear schema.

      However, on the backend, if $0 is null for a row, the output tuple becomes something like
      (null, b_value, c_value), which is inconsistent with the schema. This behaviour was confirmed by inspecting the Pig code.

      This inconsistency corrupts data because the remaining fields shift position. The expected output row is something like
      (null, null, null, b_value, c_value).
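
      The position shift can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not Pig's actual implementation; the function names, the explicit arity argument, and the sample values are all made up for illustration.

```python
# Hypothetical model of flattening field $0 of a row (a, b, c), where a is
# either a 3-field tuple or null (None). Not Pig code; names are assumptions.

def flatten_observed(row):
    """Reported behaviour: a null tuple contributes a single null field."""
    a, b, c = row
    head = (None,) if a is None else tuple(a)
    return head + (b, c)

def flatten_expected(row, arity):
    """Expected behaviour: a null tuple expands to `arity` null fields."""
    a, b, c = row
    head = (None,) * arity if a is None else tuple(a)
    return head + (b, c)

rows = [
    (("x1", "y1", "z1"), "b1", "c1"),  # $0 is a normal tuple
    (None, "b2", "c2"),                # $0 is null
]

observed = [flatten_observed(r) for r in rows]
expected = [flatten_expected(r, 3) for r in rows]

for o, e in zip(observed, expected):
    print(len(o), o, "|", len(e), e)
```

      In this model the second observed row has three fields while the schema declares five, so b_value ends up in the position of a::y. Padding to the declared arity keeps every row aligned with the schema.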

      Attachments

        1. PIG-2537-1.patch
          17 kB
          Daniel Dai
        2. PIG-2537-2.patch
          27 kB
          Daniel Dai
        3. PIG-2537-3.patch
          44 kB
          Daniel Dai

            People

              Assignee: Daniel Dai (daijy)
              Reporter: Xuefu Zhang (xuefuz)
              Votes: 3
              Watchers: 7
