Crunch / CRUNCH-480

AvroParquetFileSource doesn't properly configure user-supplied read schema

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: IO
    • Labels:
      None

      Description

      AvroParquetFileSource doesn't appear to set the configuration parameter required to use a user-supplied read schema that differs from the schema in the file.

      Deep in the guts of Parquet (InternalParquetRecordReader#initialize()), I found this:

         this.recordConverter = readSupport.prepareForRead(
             configuration, extraMetadata, fileSchema,
             new ReadSupport.ReadContext(requestedSchema, readSupportMetadata));

      Later, Parquet's AvroReadSupport#prepareForRead() appears to ignore the supplied requestedSchema and instead looks for the key avro.read.schema in the readSupportMetadata map. This is seriously kooky code in Parquet (i.e. wrong), but because Crunch doesn't supply readSupportMetadata, we can never properly supply a read schema. Boooo hisssss.
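
      To make the failure mode concrete, here is a minimal sketch of the lookup as described above. It is a simplified stand-in written against plain java.util types, not Parquet's actual code: the class ReadSchemaLookupSketch, the method resolveReadSchema, and the placeholder schema strings are all hypothetical. Only the key avro.read.schema and the parameter names come from this report.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of the behavior reported in this issue.
 * NOT Parquet code: the class and method names are hypothetical.
 */
public class ReadSchemaLookupSketch {

    // Metadata key named in the report.
    static final String READ_SCHEMA_KEY = "avro.read.schema";

    /**
     * Mirrors the reported logic: requestedSchema is accepted but never
     * consulted. Only an entry in readSupportMetadata can override the
     * schema stored in the file.
     */
    static String resolveReadSchema(String fileSchema,
                                    String requestedSchema,
                                    Map<String, String> readSupportMetadata) {
        if (readSupportMetadata != null
                && readSupportMetadata.containsKey(READ_SCHEMA_KEY)) {
            return readSupportMetadata.get(READ_SCHEMA_KEY);
        }
        // Falls back to the file's own schema; requestedSchema is ignored.
        return fileSchema;
    }

    public static void main(String[] args) {
        // Case 1: the caller passes a requested schema but no metadata,
        // which is the situation this issue attributes to Crunch.
        Map<String, String> noMetadata = new HashMap<>();
        System.out.println(
            resolveReadSchema("file-schema", "user-schema", noMetadata));

        // Case 2: the metadata map carries the key, so the user schema wins.
        Map<String, String> withKey = new HashMap<>();
        withKey.put(READ_SCHEMA_KEY, "user-schema");
        System.out.println(
            resolveReadSchema("file-schema", "user-schema", withKey));
    }
}
```

      In this model the first call yields the file's schema even though a different schema was requested, and the second yields the user's schema only because the map carries the key. If the real code behaves as described, populating readSupportMetadata is the only way for a caller to get its read schema honored.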

        Attachments

        1. CRUNCH-480.3.patch
          14 kB
          Josh Wills
        2. CRUNCH-480.2.patch
          16 kB
          Gabriel Reid
        3. CRUNCH-480.1.patch
          14 kB
          Gabriel Reid
        4. CRUNCH-480.patch
          5 kB
          Josh Wills

            People

            • Assignee: Gabriel Reid (gabriel.reid)
            • Reporter: E. Sammer (esammer)
            • Votes: 0
            • Watchers: 6