Apache Hudi / HUDI-3110

parquet max file size not honored


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.11.0
    • None

    Description

      Setting hoodie.parquet.max.file.size is not honored.

      I still see files of about 120 MB even though I set the max parquet file size to 50 MB.

      This happens in both the row-writer path and the non-row-writer path.
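For scale, the configured value and the file sizes from the ls listing in this report can be compared directly. The short plain-Scala sketch below (no Spark or Hudi needed; the variable names are illustrative only) shows that 52428800 bytes is exactly 50 MiB and that each listed file is roughly 2.2 to 2.3 times that limit.

```scala
// Configured limit from the write in this report: 52428800 bytes.
val maxFileSizeBytes: Long = 52428800L
// File sizes in bytes, copied from the ls -ltr output in this report.
val observedSizes: Seq[Long] = Seq(121847456L, 119741276L, 114652047L)

val mib: Long = 1024L * 1024L
// 52428800 / 1048576 = 50, so the limit is exactly 50 MiB.
val limitMiB: Long = maxFileSizeBytes / mib

// How many times larger than the configured limit each file is.
val ratios: Seq[Double] = observedSizes.map(_.toDouble / maxFileSizeBytes)

println(s"limit = $limitMiB MiB")
observedSizes.zip(ratios).foreach { case (size, r) =>
  println(f"$size%d bytes = ${size.toDouble / mib}%.1f MiB (${r}%.2fx the limit)")
}
```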

       

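       import org.apache.spark.sql.SaveMode._
       import org.apache.hudi.DataSourceWriteOptions._
       import org.apache.hudi.config.HoodieWriteConfig._

       // df, tableName and basePath are defined earlier in the spark-shell session
       df.write.format("hudi").
         option(PRECOMBINE_FIELD_OPT_KEY, "other").
         option(RECORDKEY_FIELD_OPT_KEY, "id").
         option(PARTITIONPATH_FIELD_OPT_KEY, "type").
         option(OPERATION_OPT_KEY, "bulk_insert").
         option("hoodie.bulkinsert.shuffle.parallelism", "4").
         // 52428800 bytes = 50 MiB
         option("hoodie.parquet.max.file.size", "52428800").
         option(TABLE_NAME, tableName).
         option("hoodie.datasource.write.row.writer.enable", "false").
         mode(Overwrite).
         save(basePath)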
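The constants in the snippet above resolve to plain string keys. As a sketch, the same options can be collected in a Map and passed through DataFrameWriter.options, which makes it easy to rerun the write with the row writer switched on and off, since both paths are reported as affected. The helper name hudiWriteOptions is hypothetical (not part of Hudi), and the key strings are assumed to be the values behind the deprecated constants used above.

```scala
// Hypothetical helper: builds the write options from this report as string keys.
// Assumed mapping of the constants used above:
//   PRECOMBINE_FIELD_OPT_KEY    -> hoodie.datasource.write.precombine.field
//   RECORDKEY_FIELD_OPT_KEY     -> hoodie.datasource.write.recordkey.field
//   PARTITIONPATH_FIELD_OPT_KEY -> hoodie.datasource.write.partitionpath.field
//   OPERATION_OPT_KEY           -> hoodie.datasource.write.operation
//   TABLE_NAME                  -> hoodie.table.name
def hudiWriteOptions(tableName: String, rowWriter: Boolean): Map[String, String] = Map(
  "hoodie.table.name" -> tableName,
  "hoodie.datasource.write.precombine.field" -> "other",
  "hoodie.datasource.write.recordkey.field" -> "id",
  "hoodie.datasource.write.partitionpath.field" -> "type",
  "hoodie.datasource.write.operation" -> "bulk_insert",
  "hoodie.bulkinsert.shuffle.parallelism" -> "4",
  // 50 MiB expressed in bytes, i.e. "52428800"
  "hoodie.parquet.max.file.size" -> (50L * 1024 * 1024).toString,
  // "true" selects the row-writer path, "false" the non-row-writer path
  "hoodie.datasource.write.row.writer.enable" -> rowWriter.toString
)

// Usage in spark-shell (not executed here):
// df.write.format("hudi").options(hudiWriteOptions(tableName, rowWriter = false)).
//   mode(org.apache.spark.sql.SaveMode.Overwrite).save(basePath)
```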

       

       ls -ltr /tmp/hudi_trips_cow/PullRequestEvent
       total 754048
       -rw-r--r--  1 nsb  wheel  121847456 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-3_1-34-192_20211227191149448.parquet
       -rw-r--r--  1 nsb  wheel  119741276 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-4_1-34-192_20211227191149448.parquet
       -rw-r--r--  1 nsb  wheel  114652047 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-5_1-34-192_20211227191149448.parquet
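To check a partition for oversized files without reading ls output by eye, a small local-filesystem sketch like the following could be used. It relies only on java.nio.file; the function name oversizedParquetFiles is hypothetical, and it only inspects regular .parquet files directly under the given directory (it does not consult Hudi metadata or handle HDFS/S3 paths).

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper: returns (file name, size in bytes) for every .parquet file
// directly under partitionDir whose size exceeds maxBytes, sorted by name.
// Local filesystem only; subdirectories are not traversed.
def oversizedParquetFiles(partitionDir: Path, maxBytes: Long): Seq[(String, Long)] = {
  val stream = Files.list(partitionDir)
  try {
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p) && p.getFileName.toString.endsWith(".parquet"))
      .map(p => (p.getFileName.toString, Files.size(p)))
      .filter { case (_, size) => size > maxBytes }
      .toList // materialize before the stream is closed
      .sortBy(_._1)
  } finally stream.close()
}

// Example against the partition from the listing above (not executed here):
// oversizedParquetFiles(Paths.get("/tmp/hudi_trips_cow/PullRequestEvent"), 52428800L)
//   .foreach { case (name, size) => println(s"$name: $size bytes") }
```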


          People

            Assignee: sivabalan narayanan (shivnarayan)
            Reporter: sivabalan narayanan (shivnarayan)
            Votes: 0
            Watchers: 1
