  1. Apache Avro
  2. AVRO-3694

Correlate messages with locations in reader/writer schema compatibility check results


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: python
    • Labels: None

    Description

      There is an issue with the class SchemaCompatibilityResult, defined in compatibility.py:

      class SchemaCompatibilityResult:
          def __init__(
              self,
              compatibility: SchemaCompatibilityType = SchemaCompatibilityType.recursion_in_progress,
              incompatibilities: Optional[List[SchemaIncompatibilityType]] = None,
              messages: Optional[Set[str]] = None,
              locations: Optional[Set[str]] = None,
          ):
              self.locations = locations or {"/"}
              self.messages = messages or set()
              self.compatibility = compatibility
              self.incompatibilities = incompatibilities or []

      Here, locations and messages are defined as Python sets and are therefore unordered. A compatibility check between a reader and a writer schema runs recursively, and results of the above type are merged for each incompatibility found. The problem is that each message belongs to a specific location, yet the two are stored as separate attributes and are currently merged independently of each other (see compatibility.py):

      def merge(this: SchemaCompatibilityResult, that: SchemaCompatibilityResult) -> SchemaCompatibilityResult:
          ...
              messages = this.messages.union(that.messages)
              locations = this.locations.union(that.locations)
          ...

      Since Python sets are unordered, the merged result gives no way to tell which message belongs to which location, so messages can end up out of sync with their locations.
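
      The loss of pairing can be reproduced without Avro itself. The sketch below uses a stripped-down stand-in for the result class and merges two results by set union, in the same way as the excerpt above. The class name Result, the message text, and the location strings are illustrative assumptions, not taken from the library.

```python
from typing import Optional, Set


class Result:
    """Hypothetical stand-in for SchemaCompatibilityResult (messages and locations only)."""

    def __init__(self, messages: Optional[Set[str]] = None, locations: Optional[Set[str]] = None):
        self.messages = messages or set()
        self.locations = locations or {"/"}


def merge(this: Result, that: Result) -> Result:
    # Union-based merge, mirroring the excerpt from compatibility.py.
    return Result(
        messages=this.messages.union(that.messages),
        locations=this.locations.union(that.locations),
    )


# Two incompatibilities with the same (made-up) message at different locations.
text = "reader type: int not compatible with writer type: string"
first = Result({text}, {"/fields/0/type"})
second = Result({text}, {"/fields/1/type"})

merged = merge(first, second)

# The duplicate message collapses into one entry while both locations survive,
# so the two collections differ in size and cannot be zipped back into pairs.
print(len(merged.messages), len(merged.locations))
```

      Even when the two sets happen to have the same size, iterating over them side by side does not guarantee that a message lines up with the location it was reported for.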

      Proposed solution

      Encapsulate location and message into a simple data class (or named tuple) to keep these two pieces of information together.
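
      A minimal sketch of what this could look like, assuming a frozen data class so that instances are hashable and can still be stored in a set. The names Incompatibility, PairedResult, and details are hypothetical and not part of the existing API; the compatibility and incompatibilities attributes are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Incompatibility:
    """Hypothetical pair type: one message tied to the location it refers to."""

    location: str
    message: str


class PairedResult:
    """Hypothetical result variant that stores pairs instead of two separate sets."""

    def __init__(self, details: Optional[Set[Incompatibility]] = None):
        self.details: Set[Incompatibility] = details or set()


def merge(this: PairedResult, that: PairedResult) -> PairedResult:
    # A single union keeps every message attached to its own location.
    return PairedResult(this.details | that.details)


# Same (made-up) message reported at two different locations.
text = "reader type: int not compatible with writer type: string"
first = PairedResult({Incompatibility("/fields/0/type", text)})
second = PairedResult({Incompatibility("/fields/1/type", text)})

merged = merge(first, second)

# Both pairs survive the merge; sort by location for a stable listing.
for item in sorted(merged.details, key=lambda d: d.location):
    print(item.location, "->", item.message)
```

      Because equality and hashing cover both fields, two findings with identical text at different locations stay distinct, while exact duplicates are still collapsed by the set.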

          People

            Assignee: Unassigned
            Reporter: Oleksii Shevtsov (oshevtsov)
