HashFNV inconsistent/non-deterministic due to default platform encoding

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.11
    • Component/s: piggybank
    • Labels: None
    • Patch Info: Patch Available
    • Hadoop Flags: Reviewed

    Description

      HashFNV (org/apache/pig/piggybank/evaluation/string/HashFNV) bases its computation on String.getBytes(), which uses the platform default encoding. As a result, the same input string can hash to different values on different platforms. Worse, if a character cannot be encoded in the default charset, the behavior of getBytes() is unspecified. We have observed non-deterministic behavior that seems to be caused by this.

      The suggested fix is to use String.getBytes("UTF-8") instead, which is well-defined and consistent on every platform.
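
      To illustrate the problem, here is a minimal, self-contained sketch. It is not the piggybank HashFNV code: the class name FnvEncodingDemo, the helper names, and the choice of the 32-bit FNV-1 variant are assumptions made for the example. It hashes the same string after encoding it with two different charsets, standing in for two platforms with different default encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Pig or piggybank.
class FnvEncodingDemo {
    // Standard 32-bit FNV parameters (assumed variant: FNV-1).
    private static final int FNV_OFFSET_BASIS = 0x811c9dc5;
    private static final int FNV_PRIME = 0x01000193;

    // 32-bit FNV-1: multiply by the prime, then XOR in each byte.
    static int fnv1(byte[] data) {
        int hash = FNV_OFFSET_BASIS;
        for (byte b : data) {
            hash *= FNV_PRIME;
            hash ^= (b & 0xff);
        }
        return hash;
    }

    // Hash a string using an explicitly chosen charset, so the result
    // does not depend on the JVM's platform default encoding.
    static int hash(String s, Charset charset) {
        return fnv1(s.getBytes(charset));
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // contains one non-ASCII character

        // The byte sequences differ between charsets, so the hash input differs.
        int utf8 = hash(input, StandardCharsets.UTF_8);
        int latin1 = hash(input, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8:      " + Integer.toHexString(utf8));
        System.out.println("ISO-8859-1: " + Integer.toHexString(latin1));
    }
}
```

      Pure ASCII input encodes to the same bytes under both charsets, which may be why the inconsistency only shows up for some data. Passing the charset explicitly, as the suggested fix does with "UTF-8", removes the dependency on the platform default.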

      Attachments

        1. PIG-2581.patch
          0.6 kB
          Prashant Kommireddi
        2. PIG-2581-2.patch
          0.8 kB
          Daniel Dai

          People

            Assignee: prkommireddi Prashant Kommireddi
            Reporter: koda Daniel Andersson