  1. Crunch (Retired)
  2. CRUNCH-310

There should be a way to specify projection schema for Parquet files

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0, 0.8.2
    • Component/s: IO
    • Labels: None

    Description

      Currently the projection schema is set based on the ptype:

        private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
          return FormatBundle.forInput(AvroParquetInputFormat.class)
              .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
              // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
              // doesn't work with CombineFileInputFormat
              .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
        }
      

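      Sometimes a user wants to read only a subset of the columns. Because the projection is always taken from the ptype's full schema, there is currently no way to request that subset, so a mechanism is needed to supply the desired projection schema.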
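
      A minimal sketch of the requested behavior, not the attached patch: an overload that accepts an optional projection schema and falls back to the full schema when none is given. Everything here is a hypothetical stand-in so the example runs without Crunch or Parquet on the classpath: a plain Map replaces FormatBundle, schemas are passed as JSON strings, and the two key names are placeholders rather than the real AvroReadSupport.AVRO_REQUESTED_PROJECTION and RuntimeParameters.DISABLE_COMBINE_FILE values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the getBundle logic; a Map replaces Crunch's FormatBundle.
final class ProjectionSketch {
    // Placeholder keys for illustration only, not the real Parquet/Crunch constants.
    static final String REQUESTED_PROJECTION = "sketch.requested.projection";
    static final String DISABLE_COMBINE_FILE = "sketch.disable.combine.file";

    // Mirrors the current behavior: the projection is the ptype's full schema.
    static Map<String, String> getBundle(String fullSchemaJson) {
        return getBundle(fullSchemaJson, null);
    }

    // Proposed behavior: use the caller's projection schema when one is supplied.
    static Map<String, String> getBundle(String fullSchemaJson, String projectionSchemaJson) {
        String requested = (projectionSchemaJson != null) ? projectionSchemaJson : fullSchemaJson;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REQUESTED_PROJECTION, requested);
        // Kept from the original snippet: combine-file splitting stays disabled.
        conf.put(DISABLE_COMBINE_FILE, "true");
        return conf;
    }
}
```

      Keeping the one-argument form as a delegate means existing callers would see no change, while callers that pass a narrower schema get only the columns it names.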

      Attachments

        1. CRUNCH-310b.patch
          16 kB
          Josh Wills
        2. CRUNCH-310.patch
          13 kB
          Josh Wills
        3. 0001-CRUNCH-310-A-fix-for-projected-schemas.txt
          10 kB
          Alex Kozlov

          People

            Assignee: Unassigned
            Reporter: Alex Kozlov (alexvk)
            Votes: 0
            Watchers: 2