Details
Bug
Status: Resolved
Major
Resolution: Fixed
Spark
Description
It is not clear how to upsert a kuduRdd into an existing Kudu table.
The documentation under "Kudu integration with Spark" lists some possible operations:
***********************************************
// then we can insert data into the kudu table
df.write.options(Map("kudu.master" -> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("append").kudu
// to update existing data change the mode to 'overwrite'
df.write.options(Map("kudu.master" -> "your.kudu.master.here","kudu.table"-> "your.kudu.table.here")).mode("overwrite").kudu
****************************************************************
But there is no way to perform an upsert in the same manner:
kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu
***************************************************************
The current workaround, which is quite slow, is:
Call DataFrame.foreachPartition
- open the table
- create a session
- for each row in this partition:
  - create an upsert operation
  - get the row from the operation
  - add all fields and values to this row
  - apply the operation
----------------------------------
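For reference, the steps listed above could look roughly like the following. This is only a sketch under assumptions: it uses the Kudu Java client under the `org.apache.kudu.client` package (older releases used a different package name), goes through `df.rdd.foreachPartition`, and assumes a hypothetical table with an `id` (long) key column and a `name` (string) column. The object and method names are made up for illustration, and error handling is omitted.

```scala
import org.apache.kudu.client.{KuduClient, SessionConfiguration}
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper illustrating the manual upsert workaround.
object ManualKuduUpsert {

  def upsertPartitions(df: DataFrame, kuduMaster: String, tableName: String): Unit = {
    df.rdd.foreachPartition { (rows: Iterator[Row]) =>
      // One client per partition: the client cannot be shipped from the driver.
      val client = new KuduClient.KuduClientBuilder(kuduMaster).build()
      try {
        // open the table
        val table = client.openTable(tableName)
        // create a session; background flushing batches the writes
        val session = client.newSession()
        session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)
        try {
          rows.foreach { r =>
            // create an upsert operation and get its row
            val upsert = table.newUpsert()
            val kuduRow = upsert.getRow
            // add all fields and values (assumed columns: id, name)
            kuduRow.addLong("id", r.getAs[Long]("id"))
            kuduRow.addString("name", r.getAs[String]("name"))
            // apply the operation
            session.apply(upsert)
          }
          // push any operations still buffered in the session
          session.flush()
        } finally {
          session.close()
        }
      } finally {
        client.close()
      }
      ()
    }
  }
}
```

Even with batched flushing, every partition still builds a client, a session, and one operation per row by hand, which is the overhead this report is about.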
This workaround is quite slow. Adding an upsert mode to the DataFrame write function for Kudu tables would be better than opening sessions and creating operations by hand as in the workaround above:
kuduDataFrame.write.options(Map("kudu.master" -> Kudu_Master,"kudu.table"-> TargetTable)).mode("upsert").kudu