Uploaded image for project: 'Apache Avro'
  1. Apache Avro
  2. AVRO-2905

org.apache.avro.util.Utf8 bug in `hashCode`

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.10.0
    • 1.10.1
    • java
    • None

    Description

      Bug : utf8 hash reset is not correctly done, as a result, returns different hash when the values are logically equal.

       

      cause commit : 

       https://github.com/apache/avro/commit/d5d1cd70e920f45468bf696d3668b496a9adbab3#diff-7001731cdafe4fc24f86e71c7e8a0b61

      https://github.com/apache/avro/commit/8e345c48e8f9708c27af63754965777314dac85c#diff-7001731cdafe4fc24f86e71c7e8a0b61

       

      reproduce : 

       org.apache.avro.util.Utf8 utf8 = new org.apache.avro.util.Utf8();
       utf8.set("hello");
       System.out.println(utf8.hashCode()); // 99162322
       utf8.set("hello");
       System.out.println(utf8.hashCode()); // -1538739392
      

       

      Suggested fix.

      1. get rid of `hasHash` field. In my opinion, it leads to more bug-prone by adding additional variable to take care of. `hash != 0` should be enough to check the presence of hash. Redundant computation for `0 hash` elements should be negligible.
      2. reset `hash = 0` on every mutation call.
      3. not needed, but I want to see some minor optimization like avoiding memory access by creating local variable (int hash = 0; .... this.hash = hash).

       

      Attachments

        Issue Links

          Activity

            People

              lukemin89 Kyoungha Min
              lukemin89 Kyoungha Min
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: