Apache Hudi / HUDI-2250

[SQL] Bulk insert support for tables w/ primary key


Details

    • Type: Task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    Description

      We want to support bulk insert for any table. Right now, we have a constraint that only tables without a primary key can be bulk inserted.

       

      spark-sql> set hoodie.sql.bulk.insert.enable = true;
      hoodie.sql.bulk.insert.enable	true
      Time taken: 2.019 seconds, Fetched 1 row(s)
      spark-sql> set hoodie.datasource.write.row.writer.enable = true;
      hoodie.datasource.write.row.writer.enable	true
      Time taken: 0.026 seconds, Fetched 1 row(s)
      spark-sql> create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
               >   type = 'cow',
               >   primaryKey = 'randomId',
               >   preCombineField = 'date_col'
               >  )
               > partitioned by (type) as select * from gh_17Gb_date_col;
      21/07/29 04:26:15 ERROR SparkSQLDriver: Failed in [create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
        type = 'cow',
        primaryKey = 'randomId',
        preCombineField = 'date_col'
       )
      partitioned by (type) as select * from gh_17Gb_date_col]
      java.lang.IllegalArgumentException: Table with primaryKey can not use bulk insert.
          at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.buildHoodieInsertConfig(InsertIntoHoodieTableCommand.scala:219)
          at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:78)
          at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:86)
          at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
          at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
          at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)
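      A simplified sketch of the guard behavior behind this error (hypothetical, not the actual Hudi source): buildHoodieInsertConfig rejects bulk insert whenever the table defines a primaryKey. The sketch below models that check as a standalone method, with an assumed opt-in flag (allowPrimaryKeyBulkInsert) standing in for whatever mechanism the fix uses to let primary-key tables take the bulk-insert path.

      ```java
      // Hypothetical sketch of the validation producing the error above;
      // names and the opt-in flag are illustrative, not Hudi's actual API.
      public class BulkInsertGuard {
          // Returns true when the write should take the bulk-insert path.
          // Throws, as in the stack trace above, when a primary-key table
          // requests bulk insert but primary-key bulk insert is not allowed.
          public static boolean bulkInsertAllowed(boolean bulkInsertEnabled,
                                                  boolean hasPrimaryKey,
                                                  boolean allowPrimaryKeyBulkInsert) {
              if (!bulkInsertEnabled) {
                  return false; // fall back to the regular insert path
              }
              if (hasPrimaryKey && !allowPrimaryKeyBulkInsert) {
                  // Pre-0.9.0 behavior observed in this report.
                  throw new IllegalArgumentException("Table with primaryKey can not use bulk insert.");
              }
              return true;
          }
      }
      ```

      Under this sketch, the fix amounts to making the third argument true (or removing the check), so tables with a primaryKey, like hudi_17Gb_ext1 above, can also be bulk inserted.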

      People

        Assignee: pzw2018 pengzhiwei
        Reporter: shivnarayan sivabalan narayanan
        Votes: 0
        Watchers: 1

