  1. Apache Avro
  2. AVRO-2128

Schema parsing in the Java library is more permissive than the C implementation or the JSON specification

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: java
    • Labels: None

    Description

      When parsing schemas, the Java library accepts C-style comments (which are forbidden in JSON) and is unaffected by trailing garbage (parsing stops as soon as it reaches the end of the JSON structure).

      In the C library, however, comments and trailing whitespace cause an error.
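
      As a rough illustration of the stricter behavior, the sketch below checks a schema string for the problems named above. It uses Python's standard json module as a stand-in for a strict parser and does not call any Avro library; the function name find_schema_problems is hypothetical. Trailing whitespace is legal JSON, so it is flagged separately and only because this report says it trips the C library.

```python
import json


def find_schema_problems(text):
    """Return reasons why `text` may be rejected by a strict schema parser.

    Hypothetical helper: Python's json module stands in for a strict
    parser here; no Avro API is involved.
    """
    problems = []
    stripped = text.lstrip()
    try:
        # raw_decode parses exactly one JSON value and reports where it ended.
        _, end = json.JSONDecoder().raw_decode(stripped)
    except json.JSONDecodeError as exc:
        # C-style comments end up here because JSON has no comment syntax.
        problems.append(f"not valid JSON: {exc.msg}")
        return problems
    rest = stripped[end:]
    if rest.strip():
        problems.append("trailing content after the schema")
    elif rest:
        # Legal in JSON, but reported in this issue as an error in the C library.
        problems.append("trailing whitespace after the schema")
    return problems
```

      A clean schema yields an empty list, while a schema with a comment, trailing garbage, or trailing whitespace yields one entry describing the problem.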

      If a schema is accepted by one language binding, it should be accepted by the other as well. The schema should also be valid JSON. The Java library is the one that fails to enforce this, being more permissive than it should be, so it seems that the Java implementation should be changed. However, we must also consider whether making the Java library stricter at this point would make any existing data unreadable.

      Fortunately, the schema that is written in the data files themselves is always valid JSON, even if it is based on a non-JSON-conformant schema. The reason for this is that the Java library parses the schema, builds an in-memory representation, and then reserializes it, thereby removing comments and trailing garbage. So existing data files are not affected, only user-supplied schemas. These can be manually updated (unlike existing data files).
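
      The parse-then-reserialize effect described above can be imitated with a short sketch, which could also help when cleaning up a user-supplied schema by hand. This is not the Java library's code: it uses Python's standard json module, and the helper names strip_c_comments and normalize_schema are hypothetical. Treating // line comments the same way as /* */ comments is an assumption.

```python
import json


def strip_c_comments(text):
    """Remove /* ... */ and // ... comments that appear outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            # Leave a space so the tokens around the comment do not merge.
            out.append(" ")
        elif text.startswith("//", i):
            # Assumption: line comments are skipped up to the end of the line.
            end = text.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def normalize_schema(text):
    """Parse leniently, then reserialize the in-memory value as strict JSON."""
    cleaned = strip_c_comments(text).lstrip()
    # raw_decode stops at the end of the first JSON value, so trailing text is dropped.
    value, _ = json.JSONDecoder().raw_decode(cleaned)
    return json.dumps(value)
```

      Passing a schema with a leading comment and trailing garbage through normalize_schema returns only the JSON structure, which a strict parser should then accept.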

      The real-world use-case where this discrepancy causes problems is Hive-Impala interaction. Users can create tables in Hive by supplying an Avro schema. That schema will be associated with the whole table by getting saved in the Hive metastore. Impala also consults this metadata when accessing the table and that causes an error in the Avro C library that Impala uses. This is detailed in IMPALA-1024. In particular, this comment contains a lot of relevant information.

          People

            Assignee: Unassigned
            Reporter: Zoltan Ivanfi
