  1. Apache Hudi
  2. HUDI-4368

Make sure Hudi always does bulk-insert during the first commit into the table

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: performance, writer-core
    • Labels: None
    • 2

    Description

      As a follow-up to the recent community discussions about out-of-the-box (OOB) configuration (DB blog), I think we should adjust some aspects of our OOB configuration to stay in line with other formats, since people will inevitably compare Hudi's performance against Delta and Iceberg.

      For example, whenever someone creates a table from scratch, we should always use "bulk_insert" instead of "upsert": there is no reason to incur the overhead of upserting when we know the table is empty.

      It could roughly go as follows:

      • If the table is empty, and
      • There's no explicit operation configured, and
      • There's no pre-combining configured

      Then we treat it as a "bulk_insert" case.
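
The three conditions above could be sketched roughly as follows. This is only an illustration of the proposed rule, not Hudi's actual writer code: the function name `choose_write_operation` is hypothetical, and how "the table is empty" gets detected is left to the caller and passed in as a boolean. The two option keys are the existing Spark datasource settings for the write operation and the pre-combine field.

```python
from typing import Mapping

# Existing Hudi Spark datasource option keys; everything else here is hypothetical.
OPERATION_KEY = "hoodie.datasource.write.operation"
PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field"


def choose_write_operation(table_is_empty: bool, options: Mapping[str, str]) -> str:
    """Pick the write operation for a commit per the proposed rule (sketch only)."""
    explicit = options.get(OPERATION_KEY)
    if explicit:
        # An explicitly configured operation is always honored.
        return explicit

    has_precombine = bool(options.get(PRECOMBINE_KEY))
    if table_is_empty and not has_precombine:
        # Nothing to merge against and no pre-combining requested:
        # skip the upsert overhead.
        return "bulk_insert"

    # Otherwise fall back to the current default operation.
    return "upsert"
```

With this rule, a first write with no operation and no pre-combine field resolves to "bulk_insert", while any explicit operation or configured pre-combine field keeps today's behavior.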

          People

            xushiyan Shiyan Xu
            alexey.kudinkin Alexey Kudinkin
            Votes: 0
            Watchers: 1
