  1. Pig
  2. PIG-4491

Streaming Python Bytearray Bugs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.1, 0.13.1, 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      While using a streaming Python UDF that returned a bytearray, we hit two bugs.

      The first was:

      org.apache.pig.impl.streaming.StreamingUDFException: LINE : UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

      and the second (after fixing the first) was a null pointer exception.

      I traced the problem to two issues:

      1. In the Python controller, the output from the UDF was being logged as a unicode string, which can fail for bytearrays containing non-ASCII bytes.

      2. Newlines in the data at the start of a response weren't being handled properly on the Java side.
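
      The first cause can be illustrated outside Pig. The sketch below is not the controller code from the patch. It is written for Python 3 (the reported traceback suggests the controller ran under Python 2, where the ASCII decode happens implicitly), and the names describe_output, naive_describe, and the logger name are hypothetical. It shows the same error message being produced when binary output is pushed through the ASCII codec, and one way to log such output safely by escaping it with repr() instead of decoding it.

```python
import logging

# Hypothetical logger name; Pig's controller uses its own logging setup.
logger = logging.getLogger("udf_controller_sketch")


def describe_output(value):
    """Return a text form of a UDF result that is always safe to log."""
    if isinstance(value, (bytes, bytearray)):
        # repr() escapes non-ASCII bytes instead of trying to decode them.
        return repr(bytes(value))
    return str(value)


def naive_describe(value):
    """Mimic the failing path: force binary output through the ASCII codec."""
    return bytes(value).decode("ascii")


# A UDF result whose first byte is 0xe6, as in the reported error.
result = bytearray(b"\xe6\x97\xa5")

try:
    naive_describe(result)
except UnicodeDecodeError as err:
    # Same message as in the report: 'ascii' codec can't decode byte 0xe6 ...
    error_message = str(err)

# The safe variant never decodes the bytes, so logging cannot raise here.
logger.debug("UDF output: %s", describe_output(result))
```

      Escaping rather than decoding keeps the log line readable for ASCII data and lossless for arbitrary binary data, at the cost of showing escape sequences for non-ASCII bytes.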

      I'm attaching a patch with tests that fixes these two issues.
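
      For the second cause, the following is a minimal sketch of the general idea only, written in Python rather than Java and not taken from the patch. It assumes a hypothetical framing in which each response ends with the marker END_OF_RECORD; that marker and the function read_record are illustrative, not Pig's actual protocol. The point is that a reader which frames responses by an explicit terminator can keep a newline at the very start of the payload, whereas a reader that treats an empty first line as "no data" could plausibly hand back a null value to its caller.

```python
import io

# Hypothetical end-of-record marker; Pig's real streaming delimiters may differ.
END_OF_RECORD = b"|END|\n"


def read_record(stream):
    """Read one response up to END_OF_RECORD, keeping embedded newlines.

    Returns None only when the stream is exhausted before any data is read.
    """
    chunks = []
    while True:
        line = stream.readline()
        if not line:
            if chunks:
                # Data arrived but the terminator never did.
                raise ValueError("stream ended in the middle of a record")
            return None
        if line.endswith(END_OF_RECORD):
            # Strip the terminator and return everything collected so far.
            chunks.append(line[: -len(END_OF_RECORD)])
            return b"".join(chunks)
        # A bare b"\n" (payload starting with a newline) is kept as data.
        chunks.append(line)


# Two responses; the first payload starts with a newline and a 0xe6 byte.
stream = io.BytesIO(b"\n\xe6abc|END|\nnext|END|\n")
first = read_record(stream)
second = read_record(stream)
third = read_record(stream)
```

      In this sketch, an empty line is never interpreted as end of input; only a true end of stream with nothing buffered yields None.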

        Attachments

        1. PIG-4491.patch
          6 kB
          Jeremy Karn

            People

            • Assignee:
              jeremykarn Jeremy Karn
              Reporter:
              jeremykarn Jeremy Karn
            • Votes:
              0
              Watchers:
              3