Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Occasionally, Hudi bulk insert generates empty parquet files (I cannot reproduce this consistently, however).
The empty parquet files cause subsequent updates to fail, because ParquetUtils tries to read the footer of each file.
Spark has a property, "spark.sql.files.ignoreCorruptFiles", that handles such cases. Could Hudi honor this property?
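
Until this is handled in Hudi itself, one way to locate the offending files is to scan a locally accessible copy of the table directory. The sketch below is not Hudi or Spark code. It relies only on the Parquet file layout, in which a file starts and ends with the 4-byte magic "PAR1" and stores a 4-byte footer length just before the trailing magic, so a zero-byte file can never have a readable footer. The function names looks_like_parquet and find_suspect_files are hypothetical, and the ".parquet" suffix filter is an assumption about how the data files are named.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Smallest possible layout: 4-byte leading magic + 4-byte footer length
# + 4-byte trailing magic. Anything shorter cannot hold a footer.
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path):
    """Return True if the file carries the Parquet magic bytes at both ends.

    This is only a cheap sanity check; it does not parse or validate
    the footer metadata itself.
    """
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def find_suspect_files(directory):
    """Walk a directory tree and list .parquet files that fail the check."""
    suspects = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(".parquet"):
                path = os.path.join(root, name)
                if not looks_like_parquet(path):
                    suspects.append(path)
    return suspects
```

Calling find_suspect_files on a partition directory returns the paths of empty or truncated files, which can then be inspected or removed before the next update. Files on HDFS or object storage would need the equivalent check through the matching filesystem client.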