Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Resolved
Description
ParquetInputFormat takes a Parquet schema as user input, but after the split it reads the Parquet schema from the file again here: https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170
It should use the schema provided by the user instead.
A better option would be to read the schema automatically and not require the user to provide one at all, as Spark does (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html).
We could therefore add a ParquetInputFormat constructor without a schema parameter, and allow a ParquetTableSource to be created without one as well.
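A minimal sketch of the proposed behaviour, in plain Java with no Flink or Parquet dependency. Every name in it (SchemaResolutionSketch, SketchInputFormat, footerReader, resolveSchema) is hypothetical and is not the actual Flink API. The schema is modelled as a map from column name to type name, and reading the file footer is replaced by a stand-in function. It only illustrates the two cases: use the user schema when one is given, otherwise fall back to the schema stored in the file.

```java
import java.util.Map;
import java.util.function.Function;

class SchemaResolutionSketch {

    /** Hypothetical stand-in for an input format; not the real ParquetInputFormat. */
    static final class SketchInputFormat {
        private final String path;
        // null means "no schema provided by the user"
        private final Map<String, String> userSchema;
        // Assumption: stands in for reading the schema from the Parquet file footer
        private final Function<String, Map<String, String>> footerReader;

        /** Existing style: the user supplies the schema. */
        SketchInputFormat(String path,
                          Map<String, String> userSchema,
                          Function<String, Map<String, String>> footerReader) {
            this.path = path;
            this.userSchema = userSchema;
            this.footerReader = footerReader;
        }

        /** Proposed style: no schema parameter, the schema is inferred from the file. */
        SketchInputFormat(String path,
                          Function<String, Map<String, String>> footerReader) {
            this(path, null, footerReader);
        }

        /** Called when a split is opened: prefer the user schema, else read the file. */
        Map<String, String> resolveSchema() {
            return userSchema != null ? userSchema : footerReader.apply(path);
        }
    }

    public static void main(String[] args) {
        // Pretend every file has these two columns.
        Function<String, Map<String, String>> reader =
                p -> Map.of("id", "INT64", "name", "BINARY");

        // User asked for a single column: that projection must be respected.
        SketchInputFormat withSchema =
                new SketchInputFormat("/tmp/data.parquet", Map.of("id", "INT64"), reader);
        System.out.println(withSchema.resolveSchema().keySet());

        // No schema given: fall back to whatever the file declares.
        SketchInputFormat inferred = new SketchInputFormat("/tmp/data.parquet", reader);
        System.out.println(inferred.resolveSchema().size());
    }
}
```

In the real code the fallback would presumably read the footer of one of the input files, which raises the question of what to do when files under the same path carry different schemas. The sketch does not address that.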