Uploaded image for project: 'Apache Avro'
  1. Apache Avro
  2. AVRO-1428

Schema.computeHash() to add if check to avoid unnecessary hashcode computation

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • java
    • None

    Description

      In current Schma.java we have following implementation:
      public final int hashCode()

      { if (hashCode == NO_HASHCODE) hashCode = computeHash(); return hashCode; }

      int computeHash()

      { return getType().hashCode() + props.hashCode(); }

      While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the computeHash method is not. But the computeHash method is being called from Schema$Field.hashCode and the subclasses hashCode implementation like following:
      public int hashCode()

      { return name.hashCode() + schema.computeHash(); }

      //this is from Schema$Field class

      This is causing the the calculation of hashCode getting called unnecessarily extensively. The proposed changed is to add the "if" check inside the computeHash method instead:

      int computeHash()
      {
      if (hashCode == NO_HASHCODE)

      { hashCode = getType().hashCode() + props.hashCode(); }

      return hashCode;
      }

      We did a simple test to compare the performance difference, below is a summary of the heap snapshot of comparing the difference:

      As a test I wrote a small program that creates a HashMap<Schema.Field, Integer>() and enters a loop simply identifying whether various Schema.Field instances are keys in the map. Obviously this is a pathological test case, but when running with the current implementation of Schema.Field it has (in about 30 seconds) used up nearly 8 GBytes of heap in instantiating intermediate objects associated with calling Schema.computeHash():

      Heap
      PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
      eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000)
      from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
      to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
      ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
      object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
      PSPermGen total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
      object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000)

      When running with the modified implementation (and no other change) all the object allocation vanishes:

      Heap
      PSYoungGen total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
      eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000)
      from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
      to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
      ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
      object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
      PSPermGen total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
      object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000)

      As a side-effect the test runs x3 faster with the modified hashCode() implementation.

      Attachments

        1. AVRO-1428.patch
          0.3 kB
          Tie Liu
        2. AVRO-1428.patch-V2
          0.9 kB
          Tie Liu
        3. TestAvro.java
          4 kB
          Tie Liu

        Activity

          People

            Unassigned Unassigned
            tonyliu73 Tie Liu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: