  1. Crunch (Retired)
  2. CRUNCH-310

There should be a way to specify projection schema for Parquet files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0, 0.8.2
    • Component/s: IO
    • Labels:
      None

      Description

      Currently the projection schema is set based on the ptype:

      private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
        return FormatBundle.forInput(AvroParquetInputFormat.class)
            .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
            // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
            // doesn't work with CombineFileInputFormat
            .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
      }
      

      Sometimes a user wants to read only a subset of the columns. We need a mechanism for supplying the desired projection schema separately from the ptype's schema.
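
      To illustrate what a projection schema could look like, here is a minimal sketch in plain Java that derives a reduced Avro record schema (as JSON) from a full field list. The class ProjectionSchemaSketch, the method project, and the User record with its fields are all hypothetical and exist only for illustration. Real code would more likely build the schema with Avro's own Schema API than by string concatenation. The resulting string is the kind of value that the snippet above passes as AvroReadSupport.AVRO_REQUESTED_PROJECTION. How it would be handed to Crunch is exactly what this issue asks for, so it is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Crunch or Parquet.
public class ProjectionSchemaSketch {

    // Builds an Avro record schema (as JSON) that keeps only the requested fields.
    // fullFields maps field name -> Avro primitive type name, e.g. "string" or "long".
    static String project(String recordName, Map<String, String> fullFields, List<String> wanted) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (String name : wanted) {
            String type = fullFields.get(name);
            if (type == null) {
                // A projection may only name fields that exist in the full schema.
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            fields.add("{\"name\":\"" + name + "\",\"type\":\"" + type + "\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        // Full schema of a made-up record: three columns.
        Map<String, String> full = new LinkedHashMap<>();
        full.put("id", "long");
        full.put("name", "string");
        full.put("email", "string");

        // Projection that reads only two of the three columns.
        String projection = project("User", full, List.of("id", "email"));
        System.out.println(projection);
    }
}
```

      The projected schema reuses the record name and field types of the full schema and simply omits the unwanted fields, which is what lets a columnar reader skip those columns.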

        Attachments

        1. CRUNCH-310b.patch
          16 kB
          Josh Wills
        2. CRUNCH-310.patch
          13 kB
          Josh Wills
        3. 0001-CRUNCH-310-A-fix-for-projected-schemas.txt
          10 kB
          Alex Kozlov

            People

            • Assignee:
              Unassigned
              Reporter:
              alexvk Alex Kozlov