Apache Hudi / HUDI-1473

Handle empty parquet files

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • storage-management
    • None

    Description

      Occasionally, Hudi bulk insert generates empty parquet files (I cannot reproduce this consistently, however).

      The empty parquet files cause subsequent updates to fail, because ParquetUtils tries to read the footer and an empty file has none.

      Spark has a property, "spark.sql.files.ignoreCorruptFiles", that handles such a case. Could Hudi honor this property as well?
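
      Until such an option exists, the affected files can at least be located before an update is attempted. The following is a minimal sketch, not part of Hudi or Spark: it assumes the table's base files are reachable on a local filesystem path and that footers are unencrypted. It relies only on the Parquet file layout, in which a file starts and ends with the 4-byte magic "PAR1" and the trailing magic is preceded by a 4-byte footer length, so any file shorter than 12 bytes cannot contain a footer. The function names are hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Leading magic (4 bytes) + footer length (4 bytes) + trailing magic (4 bytes).
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path):
    """Return True if the file is long enough and has the Parquet magic at both ends.

    This is a cheap structural check only; it does not parse the footer metadata.
    """
    if os.path.getsize(path) < MIN_PARQUET_SIZE:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def find_suspect_parquet_files(root):
    """Walk root and return the sorted paths of .parquet files that fail the check."""
    suspects = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".parquet"):
                path = os.path.join(dirpath, name)
                if not looks_like_parquet(path):
                    suspects.append(path)
    return sorted(suspects)
```

      Calling find_suspect_parquet_files on a table's base path lists zero-length or truncated files. What to do with them (delete, quarantine, or re-run the bulk insert) is left open, since removing files behind Hudi's timeline may have side effects.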

          People

            Assignee: Unassigned
            Reporter: Clark Chang (clark10chang)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: