Kafka / KAFKA-6168

Connect Schema comparison is slow for large schemas

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: connect
    • Labels: None

    Description

      The ConnectSchema implementation recomputes its hash code every time it's needed, and equals(Object) performs a deep equality check. This extra work can be expensive for large schemas, especially in code like the AvroConverter (or rather AvroData in the converter), which uses schema instances as keys in a hash map and therefore calls hashCode and equals heavily.

      The ConnectSchema is an immutable object and should, at a minimum, precompute the hash code. Also, equals(...) should compare fields in order of cost, cheapest first (e.g., the name field is currently one of the last fields to be checked). Finally, it might be worth having each instance precompute and cache a string or byte[] representation of all fields that can be used for faster equality checking.
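
      The first two suggestions can be sketched roughly as follows. This is a minimal illustration, not the actual ConnectSchema code or the attached patch: the SketchSchema class and its five fields are hypothetical stand-ins, and the hash-mismatch shortcut in equals is an extra assumption that is only safe because the hash is derived from exactly the fields equals compares.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified immutable schema; NOT the real ConnectSchema. */
final class SketchSchema {
    private final String type;
    private final String name;              // may be null
    private final Integer version;          // may be null
    private final boolean optional;
    private final List<String> fieldNames;  // stands in for the (potentially large) field list
    private final int hash;                 // computed once in the constructor

    SketchSchema(String type, String name, Integer version,
                 boolean optional, List<String> fieldNames) {
        this.type = Objects.requireNonNull(type, "type");
        this.name = name;
        this.version = version;
        this.optional = optional;
        // Defensive copy so the instance really is immutable.
        this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
        // Safe to precompute because no field can change after construction.
        this.hash = Objects.hash(type, name, version, optional, this.fieldNames);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation on each call
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchSchema)) return false;
        SketchSchema other = (SketchSchema) o;
        // Different cached hashes guarantee inequality, so bail out early.
        if (hash != other.hash) return false;
        // Cheapest comparisons first; the deep list comparison runs last.
        return optional == other.optional
                && type.equals(other.type)
                && Objects.equals(name, other.name)
                && Objects.equals(version, other.version)
                && fieldNames.equals(other.fieldNames);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "email");
        SketchSchema a = new SketchSchema("STRUCT", "com.example.User", 1, false, fields);
        SketchSchema b = new SketchSchema("STRUCT", "com.example.User", 1, false, fields);

        // Schemas used as hash map keys, as in the converter scenario described above.
        Map<SketchSchema, String> cache = new HashMap<>();
        cache.put(a, "converted schema");
        System.out.println(cache.containsKey(b)); // b equals a, so the lookup hits
    }
}
```

      With this shape, a hash map lookup costs one field read for hashCode, and most unequal keys are rejected before the deep comparison of the field list is reached.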

      Attachments

        1. 6168.v1.txt
          2 kB
          Ted Yu

            People

              Assignee: Zhihong Yu (tedyu)
              Reporter: Randall Hauch (rhauch)
              Votes: 0
              Watchers: 5