Flink / FLINK-21389

ParquetInputFormat should not need parquet schema as user input

Details

    Description

      ParquetInputFormat takes the Parquet schema as user input, but after the split it reads the Parquet schema again here: https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170
      It should use the schema provided by the user instead.
      A better option would be to read the schema automatically and not require the user to provide one, as Spark does (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html).
      We could therefore add a ParquetInputFormat constructor with no schema parameter, and allow ParquetTableSource to be created without a schema as well.
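
      A minimal sketch of the proposed behaviour, to make the two constructor styles concrete. This is not Flink's ParquetInputFormat: SchemaOptionalInputFormat, FooterSchemaReader and SchemaFallbackDemo are hypothetical names, schemas are plain strings, and the footer read is stubbed out, so the sketch only illustrates the fallback rule (use the user's schema when one is given, otherwise take the schema from the file).

```java
// Hypothetical stand-in for "read the schema stored in the Parquet file".
// A real implementation would read the file footer; here it is only an interface.
interface FooterSchemaReader {
    String readSchema(String path);
}

// Hypothetical sketch of an input format whose schema parameter is optional.
final class SchemaOptionalInputFormat {
    private final String path;
    private final String userSchema; // null when the caller supplied no schema
    private final FooterSchemaReader footerReader;
    private String effectiveSchema;  // set by open()

    // Current style: the caller passes the schema explicitly.
    SchemaOptionalInputFormat(String path, String userSchema, FooterSchemaReader footerReader) {
        this.path = java.util.Objects.requireNonNull(path, "path");
        this.userSchema = userSchema;
        this.footerReader = java.util.Objects.requireNonNull(footerReader, "footerReader");
    }

    // Proposed style: no schema parameter, the schema is discovered from the file.
    SchemaOptionalInputFormat(String path, FooterSchemaReader footerReader) {
        this(path, null, footerReader);
    }

    // Called once per split: honour the user's schema if present,
    // and only fall back to the file's own schema when none was given.
    void open() {
        effectiveSchema = (userSchema != null) ? userSchema : footerReader.readSchema(path);
    }

    String getEffectiveSchema() {
        return effectiveSchema;
    }
}

final class SchemaFallbackDemo {
    public static void main(String[] args) {
        // Stubbed footer reader: always reports the same file schema.
        FooterSchemaReader reader = p -> "message from_file { required int32 id; }";

        SchemaOptionalInputFormat explicit = new SchemaOptionalInputFormat(
                "data.parquet", "message from_user { required int64 id; }", reader);
        explicit.open();
        System.out.println(explicit.getEffectiveSchema());

        SchemaOptionalInputFormat inferred = new SchemaOptionalInputFormat("data.parquet", reader);
        inferred.open();
        System.out.println(inferred.getEffectiveSchema());
    }
}
```

      With this shape, existing callers that pass a schema keep their behaviour, while new callers can omit it.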

            People

              echauchot Etienne Chauchot
