Pig / PIG-4227

Streaming Python UDF handles bag outputs incorrectly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I have a UDF that produces different output depending on whether it runs as Jython or as streaming Python.

      Jython
      {([[BBC Worldwide]])}
      
      Streaming Python
      {(BC Worldwid)}
      

      The problem is that streaming Python encodes bag output incorrectly. In this example, it serializes the output string as follows:

      |{_[[BBC Worldwide]]|}_
      

      where '|' and '_' wrap the bag delimiters '{' and '}', i.e. '{' => '|{_' and '}' => '|}_'.

      This is wrong because a bag must contain tuples, not chararrays. The correct encoding is as follows:

      |{_|(_[[BBC Worldwide]]|)_|}_
      

      where '|' and '_' wrap the tuple delimiters '(' and ')' as well as the bag delimiters.
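
To make the difference concrete, here is a minimal Python sketch of the two encodings for a bag holding a single chararray value. The function names and constants are hypothetical, chosen for illustration from the wrapped forms quoted in this report; this is not the code from Pig's streaming Python controller, and it does not cover bags with several tuples or multi-field tuples.

```python
# Hypothetical constants: the wrapped delimiter forms quoted in this report.
BAG_START, BAG_END = "|{_", "|}_"
TUPLE_START, TUPLE_END = "|(_", "|)_"


def encode_single_item_bag_buggy(value):
    """Reproduce the incorrect form: bag delimiters around a bare value."""
    return BAG_START + value + BAG_END


def encode_single_item_bag_fixed(value):
    """Wrap the value in tuple delimiters first, then in bag delimiters."""
    return BAG_START + TUPLE_START + value + TUPLE_END + BAG_END


buggy = encode_single_item_bag_buggy("[[BBC Worldwide]]")
fixed = encode_single_item_bag_fixed("[[BBC Worldwide]]")
print(buggy)  # |{_[[BBC Worldwide]]|}_
print(fixed)  # |{_|(_[[BBC Worldwide]]|)_|}_
```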

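      This truncates the output: in the example above, three characters (the length of a wrapped tuple delimiter) are lost from each end of the string.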
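
One way such a truncation could arise is a reader that assumes every bag element is wrapped in three-character tuple delimiters and drops that many characters from each end without checking them. The sketch below illustrates that assumption only; the function name is hypothetical, and the actual deserializer in Pig may work differently.

```python
# Hypothetical constants: the wrapped delimiter forms quoted in this report.
BAG_START, BAG_END = "|{_", "|}_"
TUPLE_START, TUPLE_END = "|(_", "|)_"


def naive_decode_single_item_bag(encoded):
    """Hypothetical reader: strip the bag delimiters, then blindly strip as
    many characters as the tuple delimiters occupy, without verifying them."""
    inner = encoded[len(BAG_START):-len(BAG_END)]
    return inner[len(TUPLE_START):-len(TUPLE_END)]


# Input without tuple delimiters loses three real characters from each end.
truncated = naive_decode_single_item_bag("|{_[[BBC Worldwide]]|}_")
# Input with tuple delimiters round-trips intact.
intact = naive_decode_single_item_bag("|{_|(_[[BBC Worldwide]]|)_|}_")
print(truncated)  # BC Worldwid
print(intact)     # [[BBC Worldwide]]
```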

      Attachments

        1. PIG-4227-1.patch
          0.7 kB
          Cheolsoo Park
        2. PIG-4227-2.patch
          1 kB
          Daniel Dai

          People

            Assignee: Cheolsoo Park (cheolsoo)
            Reporter: Cheolsoo Park (cheolsoo)
            Votes: 0
            Watchers: 3