Details
Type: Task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
We want to support bulk insert for any table. Right now, there is a constraint that only tables without a primary key can be bulk inserted, so a create-table-as-select on a table with a primaryKey fails when bulk insert is enabled:
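The constraint can be modeled with a minimal sketch. This is not Hudi's code: the function `resolve_bulk_insert` and the flag `allow_with_primary_key` are hypothetical names used only for illustration, and only the error message is taken from the stack trace in this ticket. The default mirrors the current behavior; passing `allow_with_primary_key=True` mirrors what this ticket asks for.

```python
from typing import Optional


def resolve_bulk_insert(bulk_insert_enabled: bool,
                        primary_key: Optional[str],
                        allow_with_primary_key: bool = False) -> bool:
    """Return True if the write would go through bulk insert.

    Hypothetical model of the guard described in this ticket,
    not the actual InsertIntoHoodieTableCommand logic.
    """
    if not bulk_insert_enabled:
        # Bulk insert was not requested, so the guard does not apply.
        return False
    if primary_key and not allow_with_primary_key:
        # Current behavior: a table with a primary key is rejected.
        raise ValueError("Table with primaryKey can not use bulk insert.")
    # Either the table has no primary key, or the constraint is relaxed.
    return True
```

With the default flag, `resolve_bulk_insert(True, "randomId")` raises, which corresponds to the failure reported here; with `allow_with_primary_key=True` the same call returns `True`.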
spark-sql> set hoodie.sql.bulk.insert.enable = true;
hoodie.sql.bulk.insert.enable true
Time taken: 2.019 seconds, Fetched 1 row(s)
spark-sql> set hoodie.datasource.write.row.writer.enable = true;
hoodie.datasource.write.row.writer.enable true
Time taken: 0.026 seconds, Fetched 1 row(s)
spark-sql> create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
> type = 'cow',
> primaryKey = 'randomId',
> preCombineField = 'date_col'
> )
> partitioned by (type) as select * from gh_17Gb_date_col;
21/07/29 04:26:15 ERROR SparkSQLDriver: Failed in [create table hudi_17Gb_ext1 using hudi location 's3a://siva-test-bucket-june-16/hudi_testing/gh_arch_dump/hudi_5/' options (
type = 'cow',
primaryKey = 'randomId',
preCombineField = 'date_col'
)
partitioned by (type) as select * from gh_17Gb_date_col]
java.lang.IllegalArgumentException: Table with primaryKey can not use bulk insert.
at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.buildHoodieInsertConfig(InsertIntoHoodieTableCommand.scala:219)
at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:78)
at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:86)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:120)