IMPALA / IMPALA-3722

Avro codegen can be unnecessarily disabled

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      We use avro_schema_equal() from the Avro C library to determine if a file's schema matches the table schema, and if they don't match we disable codegen for that file (https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-avro-scanner.cc#L153). However, avro_schema_equal() is unnecessarily restrictive, because it compares the records' names and namespaces, which don't have to be the same to enable codegen. There are probably other checks we don't need as well, e.g. default values. We should write our own schema comparison function that is tailored to what must match for codegen specifically.
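
      A minimal sketch of what such a codegen-specific comparison could look like. The schema model below (AvroType, Field, SchemaNode) and the helpers MakePrimitive, MakeRecord and CodegenCompatible are hypothetical and exist only for illustration; they are neither the Avro C library's types nor Impala's actual code. The sketch assumes that codegen needs the field count, field order, field names and field types to match, and that record names, namespaces and default values can be ignored. Whether field names really have to match is itself an assumption, and unions, arrays and other complex types are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified schema model for illustration only. It is NOT the
// Avro C library's avro_schema_t and NOT Impala's internal representation.
enum class AvroType { kNull, kBoolean, kInt, kLong, kFloat, kDouble, kString, kRecord };

struct SchemaNode;

struct Field {
  std::string name;
  std::shared_ptr<SchemaNode> type;
  std::string default_json;  // Deliberately ignored by CodegenCompatible().
};

struct SchemaNode {
  AvroType type = AvroType::kNull;
  std::string name;          // Record name; ignored by CodegenCompatible().
  std::string name_space;    // Record namespace; ignored by CodegenCompatible().
  std::vector<Field> fields; // Populated only for records.
};

// Hypothetical helper: builds a primitive (non-record) schema node.
std::shared_ptr<SchemaNode> MakePrimitive(AvroType type) {
  auto node = std::make_shared<SchemaNode>();
  node->type = type;
  return node;
}

// Hypothetical helper: builds a record schema node.
std::shared_ptr<SchemaNode> MakeRecord(std::string name, std::string name_space,
                                       std::vector<Field> fields) {
  auto node = std::make_shared<SchemaNode>();
  node->type = AvroType::kRecord;
  node->name = std::move(name);
  node->name_space = std::move(name_space);
  node->fields = std::move(fields);
  return node;
}

// Returns true if the two schemas agree on everything this sketch ASSUMES
// codegen depends on: type structure, field count, field order, field names.
// Record names, namespaces and default values are never inspected.
bool CodegenCompatible(const SchemaNode& a, const SchemaNode& b) {
  if (a.type != b.type) return false;
  if (a.type != AvroType::kRecord) return true;  // Primitives match by type alone.
  if (a.fields.size() != b.fields.size()) return false;
  for (std::size_t i = 0; i < a.fields.size(); ++i) {
    const Field& fa = a.fields[i];
    const Field& fb = b.fields[i];
    assert(fa.type != nullptr && fb.type != nullptr);
    if (fa.name != fb.name) return false;
    if (!CodegenCompatible(*fa.type, *fb.type)) return false;  // Recurse into nested records.
  }
  return true;
}
```

      With this sketch, a file schema and a table schema that differ only in record name, namespace or field defaults compare as compatible, while a differing field type or field count does not.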

          People

            Assignee: Unassigned
            Reporter: Skye Wanderman-Milne (skye)
