  Kafka / KAFKA-6168

Connect Schema comparison is slow for large schemas

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: KafkaConnect
    • Labels:
      None

      Description

      The ConnectSchema implementation computes the hash code every time it's needed, and equals(Object) is a deep equality check. This extra work can be expensive for large schemas, especially in code like the AvroConverter (or rather AvroData in the converter) that uses schema instances as keys in a hash map, which in turn makes heavy use of hashCode and equals.
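
      To illustrate the cost described above, here is a minimal sketch that counts how often a hash map calls hashCode() and equals() on its keys. DeepSchema, DeepSchemaDemo, and the call counters are hypothetical stand-ins for this illustration, not the actual ConnectSchema or AvroData code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DeepSchemaDemo {
    public static void main(String[] args) {
        Map<DeepSchema, String> cache = new HashMap<>();
        cache.put(new DeepSchema("record", List.of("id", "ts", "payload")), "converted");

        // Each lookup uses a fresh but equal key, as a converter might when
        // it sees the same schema again on a later record.
        for (int i = 0; i < 3; i++) {
            cache.get(new DeepSchema("record", List.of("id", "ts", "payload")));
        }

        System.out.println(DeepSchema.hashCalls + " hashCode calls, "
                + DeepSchema.equalsCalls + " equals calls");
    }
}

// Hypothetical schema-like key whose hashCode() and equals() walk every field.
final class DeepSchema {
    static int hashCalls = 0;   // instrumentation for this demo only
    static int equalsCalls = 0; // instrumentation for this demo only

    private final String name;
    private final List<String> fieldNames;

    DeepSchema(String name, List<String> fieldNames) {
        this.name = name;
        this.fieldNames = List.copyOf(fieldNames);
    }

    @Override
    public int hashCode() {
        hashCalls++;
        // Recomputed over all fields on every call; nothing is cached.
        return Objects.hash(name, fieldNames);
    }

    @Override
    public boolean equals(Object o) {
        equalsCalls++;
        if (!(o instanceof DeepSchema)) return false;
        DeepSchema other = (DeepSchema) o;
        // Deep comparison: the field list is checked before the cheap name check.
        return fieldNames.equals(other.fieldNames) && Objects.equals(name, other.name);
    }
}
```

      Every put and get hashes the key again, and every lookup with an equal but distinct instance also runs the deep equals, so the per-call cost grows with the size of the schema.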

      ConnectSchema is an immutable object and should, at a minimum, precompute the hash code. Also, equals(...) should perform the cheapest comparisons first (e.g., the name field is currently one of the last fields to be checked). Finally, it might be worth considering having each instance precompute and cache a string or byte[] representation of all fields that can be used for faster equality checking.
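
      The first two suggestions could look roughly like the following sketch. CachedHashSchema and CachedHashDemo are hypothetical names, and the fields are simplified to a name plus a list of field names; this is not the actual ConnectSchema code or the attached patch. The early exit on mismatched cached hashes is an extra assumption of this sketch and is not something the description asks for.

```java
import java.util.List;
import java.util.Objects;

class CachedHashDemo {
    public static void main(String[] args) {
        CachedHashSchema a = new CachedHashSchema("record", List.of("id", "ts"));
        CachedHashSchema b = new CachedHashSchema("record", List.of("id", "ts"));
        CachedHashSchema c = new CachedHashSchema("other", List.of("id", "ts"));

        System.out.println(a.equals(b)); // equal name and fields
        System.out.println(a.equals(c)); // different name
    }
}

// Hypothetical immutable schema-like class with a precomputed hash code.
final class CachedHashSchema {
    private final String name;
    private final List<String> fieldNames;
    private final int hash; // computed once; safe because the object is immutable

    CachedHashSchema(String name, List<String> fieldNames) {
        this.name = name;
        this.fieldNames = List.copyOf(fieldNames); // defensive, unmodifiable copy
        this.hash = Objects.hash(this.name, this.fieldNames);
    }

    @Override
    public int hashCode() {
        return hash; // constant time, regardless of schema size
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashSchema)) return false;
        CachedHashSchema other = (CachedHashSchema) o;
        // Assumption of this sketch: unequal cached hashes imply unequal objects,
        // so the deep comparison can be skipped.
        if (hash != other.hash) return false;
        // Cheapest comparison first (name), expensive field walk last.
        return Objects.equals(name, other.name) && fieldNames.equals(other.fieldNames);
    }
}
```

      Caching is only safe because the instance never changes after construction; a mutable field would make the stored hash stale.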

        Attachments

        1. 6168.v1.txt
          2 kB
          Ted Yu

              People

              • Assignee:
                tedyu Ted Yu
                Reporter:
                rhauch Randall Hauch
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: