  Pig / PIG-4821

Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0, 0.15.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      SedesHelper.writeChararray serializes the string with writeUTF, but BinInterSedesTupleRawComparator reads it back with str1 = new String(bb1.array(), bb1.position(), casz1, BinInterSedes.UTF8); (https://github.com/apache/pig/blob/e0c5f265c68491395d8303c86195445be3d8aecf/src/org/apache/pig/data/BinInterSedes.java#L959-L964). For some reason, this works fine on my Mac (with both JDK 7 and JDK 8) but not on Linux. I have not dug into the actual cause yet; I suspect either the charset environment or the specific JDK 8 update, which differs between my Mac and the Linux machines.
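
      The description contrasts a writeUTF write path with a standard UTF-8 read path. The sketch below (a hypothetical class ModifiedUtf8Mismatch, not Pig code) writes strings with java.io.DataOutputStream.writeUTF and reads them back the way the quoted comparator line does, with new String(..., UTF-8). It relies only on the documented writeUTF format: a 2-byte unsigned length followed by "modified UTF-8", which encodes U+0000 as the two bytes 0xC0 0x80 and encodes each half of a surrogate pair as its own 3-byte sequence, whereas standard UTF-8 uses a single 0x00 byte and one 4-byte sequence. It only illustrates where the bytes can differ; it is not the change in PIG-4821-1.patch and not a confirmed root cause.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Mismatch {

    // Serialize the way DataOutput.writeUTF does: a 2-byte unsigned length
    // followed by the string in Java's "modified UTF-8" encoding.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Drop writeUTF's 2-byte length prefix, keeping only the encoded payload.
    static byte[] payload(byte[] serialized) {
        return Arrays.copyOfRange(serialized, 2, serialized.length);
    }

    // Mirror the read path quoted in the issue: read the length, then decode
    // the payload bytes with the standard UTF-8 charset via new String(...).
    static String readAsStandardUtf8(byte[] serialized) {
        ByteBuffer bb = ByteBuffer.wrap(serialized); // big-endian, like DataOutput
        int len = bb.getShort() & 0xFFFF;            // the length is unsigned
        return new String(bb.array(), bb.position(), len, StandardCharsets.UTF_8);
    }

    // True if writeUTF's payload is byte-identical to standard UTF-8 for s.
    static boolean payloadMatchesStandardUtf8(String s) {
        return Arrays.equals(payload(writeUtf(s)), s.getBytes(StandardCharsets.UTF_8));
    }

    // True if writing with writeUTF and decoding as standard UTF-8 gives s back.
    static boolean roundTripsThroughStandardUtf8(String s) {
        return s.equals(readAsStandardUtf8(writeUtf(s)));
    }

    public static void main(String[] args) {
        String[] labels = {"ascii", "e-acute", "embedded NUL", "emoji U+1F600"};
        String[] samples = {"plain ascii", "caf\u00e9", "nul\u0000char", "emoji \uD83D\uDE00"};
        for (int i = 0; i < samples.length; i++) {
            System.out.printf("%-14s payloadMatches=%b roundTrips=%b%n",
                    labels[i],
                    payloadMatchesStandardUtf8(samples[i]),
                    roundTripsThroughStandardUtf8(samples[i]));
        }
    }
}
```

      For ASCII and for characters such as é, the two encodings produce identical bytes. For an embedded NUL or a supplementary character such as an emoji, the payloads differ by construction; whether the decoded string then differs depends on how the running JDK's UTF-8 decoder treats those byte sequences (recent JDKs substitute U+FFFD for them).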

        Attachments

        1. PIG-4821-1.patch
          2 kB
          Rohini Palaniswamy


              People

              • Assignee:
                Rohini Palaniswamy (rohini)
                Reporter:
                Rohini Palaniswamy (rohini)
              • Votes:
                0
                Watchers:
                3
