  1. Pig
  2. PIG-4821

Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0, 0.15.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      SedesHelper.writeChararray serializes the string with writeUTF, but BinInterSedesTupleRawComparator reads it back with str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8); (see https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964). For some reason this works fine on my Mac (with both JDK 7 and JDK 8) but not on Linux. I am not sure of the actual cause and have not dug into it. I suspect either the charset environment or the specific JDK 8 update, which differs between my Mac and Linux.
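
      The write/read asymmetry described above can be reproduced outside Pig. The following is a minimal sketch, not the Pig code path itself: the class name ModifiedUtf8Mismatch and its helper methods are hypothetical, and only standard java.io and java.nio calls are used. It relies on the documented fact that DataOutput.writeUTF emits Java's modified UTF-8, in which a supplementary character is written as two 3-byte encoded surrogates, whereas new String(..., UTF_8) expects standard UTF-8. Whether this is what causes the Mac/Linux difference is not established in this report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Pig classes.
public class ModifiedUtf8Mismatch {

    /** Serializes s with writeUTF: a 2-byte length prefix followed by modified UTF-8. */
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Decodes the payload as standard UTF-8, mirroring the new String(...) call in the report. */
    static String decodeAsStandardUtf8(byte[] serialized) {
        // The first two bytes are the unsigned big-endian payload length.
        int len = ((serialized[0] & 0xFF) << 8) | (serialized[1] & 0xFF);
        return new String(serialized, 2, len, StandardCharsets.UTF_8);
    }

    /** Decodes with readUTF, the symmetric counterpart of writeUTF. */
    static String decodeWithReadUtf(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ascii = "pig";
        // 'a', U+1F600 (a supplementary character, stored as a surrogate pair), 'b'
        String supplementary = "a\uD83D\uDE00b";

        for (String s : new String[] {ascii, supplementary}) {
            byte[] serialized = writeUtf(s);
            System.out.println("standard UTF-8 decode round-trips: "
                    + s.equals(decodeAsStandardUtf8(serialized)));
            System.out.println("readUTF round-trips: "
                    + s.equals(decodeWithReadUtf(serialized)));
        }
    }
}
```

      In this sketch the ASCII string survives both decoders, while the string containing a supplementary character survives only readUTF. A comparator that decodes the raw bytes as standard UTF-8 could therefore see a different string than the one that was written, which is consistent with join keys comparing incorrectly.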

      Attachments

        1. PIG-4821-1.patch
          2 kB
          Rohini Palaniswamy

            People

              Assignee: Rohini Palaniswamy
              Reporter: Rohini Palaniswamy